Graph Convolution Network (GCN, 그래프 합성곱 신경망)

2024-08-28 2 분 소요

※ Graph Neural Network

그래프에 직접 적용할 수 있는 신경망
그래프 구조를 활용하여 특징을 추출하여 작업을 수행

■ Graph (그래프)

점(Node)들을 잇는 선(Edge)으로 이루어진 데이터 구조
일반적으로 G=(V, E)로 정의
- V: 점 집합
- E: 두 점을 잇는 선 집합

□ Adjacency Matrix (인접행렬)

각 노드가 주변 노드들과 연결되어 있으면 1, 아니면 0으로 표현된 matrix
점의 개수가 n이면 (n x n) matrix
symmetric matrix
일반적으로 sparse 함 (0이많음) $\begin{align*} A_{adj} \end{align*}$

□ Degree Matrix

각 노드가 주변 노드와 몇 개나 연결되어 있는지를 표현한 matrix
대각행렬임 $\begin{align*} \tilde D \end{align*}$

□ Node-Feature Matrix

각 노드에 정보(hidden representation)들을 모아놓은 matrix
H로 주로 표현 함 $\begin{align*} H \end{align*}$

□ Self-Connecting Edges

인접행렬 A가 있을 때, A+I를 한 것
자기 자신과 연결이 된 것
스스로를 가리키는 self-loop를 가지고 있어야, feature를 취합할 때 자신의 정보도 반영할 수 있기 때문
Adding $I$ is to add self-connecting edges $\begin{align*} A+I \end{align*}$

□ Neighborhood Normalization

노드마다 엣지를 갖고 있는 수가 다르면, 엣지가 많은 노드는 feature representation에서 큰 값을 가지고(exploding gradients), 연결이 적은 노드는 값이 작아진다(vanishing gradients).
모든 노드들이 동등하다고 생각하고, normalization한다.
- Considering neighboring nodes in the normalized weights
- To prevent numerical instabilities and vanishing/exploding gradients in order for the model to converge
Normalized $\tilde A$ $\begin{align*} \tilde A = \tilde D^{-1/2}(A+I)\tilde D^{-1/2} \end{align*}$

■ Basics of GCN

Similar to CNN, GCN updates each node with their adjacent nodes
Unlike CNN, each node of GCN has differnt number of adjacent nodes
- Indicate adjacent nodes of each node by adjacency matrix A

Message
- 각 노드의 feature vector를 의미함
- information passed by neighboring nodes to the central node
Aggregate
- message를 모으는 과정
- collect information from neighboring nodes
Update
- embedding update by combining information from neighboring nodes and from itself

□ Message Aggregation from Local Neighborhood

Message -> Aggregate -> Update를 반복 (1 레이어)

□ Update

Update 1: Linear combination
1. W(오메가)라는 행렬을 곱함 (Trainable parameter)
Update 2: Non-linear function
1. 1에 의해 update된 행렬은 비선형성이 없으니 함수를 취함 (e.g. ReLU, Sigmoid, …)

□ Finally Graph Convolutional Networks

Similar to convolutional neural network
Multi-layer Graph Convolutional Network (GCN)

\[\begin{align*} H^{k+1} &= \sigma((\tilde D^{-1/2}(A+I) \tilde D^{-1/2})H^{k}W^{k}) \\ &= \sigma(\tilde AH^{k}W^{k}) \end{align*}\]

==================================

□ Laplacian Matrix

degree matrix - adjacency matrix
L = D - A

□ Graph Representation learning

어떻게 그래프로 표현할 것인가
기존의 복잡한 그래프를 low dimensional space로 요약하는 것

□ Unstructured Data

□ Node Embedding

고차원을 저차원의 space로 보내는 것
encode
non euclidean » euclidean

Shallow Embedding Method

similarity preserving:
- 원래 노드간 거리가 가까웠다면 embedding 된 space에서도 가까워야 한다
  - S(A,C) ~~ S(E(A), E(C))
  - e.g. Node2Vec, DeepWalk, LINE, …
문제점
- Scalability issue
  - embedding matrix E가 노드의 개수만큼 차원(컬럼)이 커짐
  - 더 큰 NN에는 적용이 힘듬
  - the size of an embedder increases linearly with the graph nodes
- difficulty in defining the similarity metric
- unable to reuse the embedder for an updated graph

Neural network-based methods

Weisfeiler-Lehman GNNs
Message passing GNN
- 우리가 갖고 있는 그래프에서 하나의 노드가 주변의 노드로부터 정보를 받아서 feature가 결정이 된다고 생각
- e.g. A라는 노드는 B, C, D에 의해서 결정이 된다
- aggregate: 정보를 어떻게 합칠것인가… (differentiable functions)

□ Aggregation □ Neighborhood and Adjacency matrix □ (Neighborhood) Attention

■ GCN

neighborhood (adjacency matrix, distance) 를 어떻게 정의할 것인가…
어떻게 합칠것인가… (aggregate)
- attention, edge weights, normalization, ordering of nodes

graph learning on point clouds

Twitter Facebook LinkedIn