The data representations used in deep learning are becoming more complex every day, and Graph Neural Networks (GNNs) have become one of the breakthroughs of recent years. But why exactly are graphs gaining more and more popularity in machine learning?

The goal of this article is to give a general overview of how graphs are used in machine learning. It does not claim to be a scientific work that describes the full power of graphs; it only introduces the reader to this fascinating and complex world. The article should suit both seasoned professionals who are not yet familiar with graph representations in deep learning and newcomers to the field.

Introduction

One of the main reasons for the success of machine learning is its ability to automatically extract the features needed to solve a problem. Traditionally, however, machine learning on graphs relied on user-defined heuristics to extract features that encode the structural information of a graph. In recent years the trend has changed: more and more approaches learn to encode graph structure into low-dimensional embeddings automatically, using deep learning and nonlinear dimensionality reduction.

Machine learning on graphs faces two central problems: incorporating information about the structure of the graph into the model (i.e., finding a simple way to encode this information in a feature vector) and reducing the dimensionality of that feature vector.

To extract structural information from graphs, traditional machine learning approaches often rely on summary graph statistics (e.g., clustering coefficients), kernel functions, or carefully engineered features that measure local neighborhood structure. These approaches are limited: such hand-engineered features cannot adapt during learning, and designing them is a laborious and expensive process.
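As an illustration of such a hand-crafted summary statistic, here is a minimal Python sketch of the local clustering coefficient: the fraction of pairs of a node's neighbors that are themselves connected. The toy graph and the function names `local_clustering` and `average_clustering` are hypothetical, and returning 0 for nodes with fewer than two neighbors is one common convention, assumed here.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are connected to each other."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; treated as 0 by convention (assumption)
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical toy graph: a triangle 1-2-3 plus a pendant node 4 attached to node 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
adj = {}
for u, w in edges:
    adj.setdefault(u, set()).add(w)
    adj.setdefault(w, set()).add(u)

print(local_clustering(adj, 3))  # one of three neighbor pairs is connected
print(average_clustering(adj))
```

A statistic like this is fixed in advance and does not change during training, which is exactly the limitation the paragraph above describes.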

Why graphs?

When thinking of graphs as a way to represent data, one can ask a simple question: why graphs?

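A graph is a data structure that models a set of objects (nodes) and the relationships between them (edges). Thanks to their expressive power, graphs can describe a large number of systems across many domains, including the social sciences (social networks) [1], [2], the natural sciences (physical systems [3], [4] and protein-protein interaction networks [5]), knowledge graphs [6], and many other research areas [7].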

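In CV/ML, graph analysis focuses on tasks such as node classification, link prediction, and clustering. Graphs can also be built over image regions, for example, for semantic object parsing [8].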

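However, graphs are not only useful as structured stores of knowledge. Many machine learning applications make predictions or discover new patterns using graph-structured data as features. For example, one may want to classify the role of a protein in a biological interaction graph, predict a person's role in a collaboration network, recommend new friends to a user in a social network, or predict new therapeutic applications of existing drug molecules.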

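More recently, a wave of approaches has emerged that learn representations (embeddings) encoding structural information about the graph. The idea behind these representation learning approaches is to learn a mapping that embeds nodes, or entire (sub)graphs, as points in a low-dimensional vector space $\mathbb{R}^d$. The mapping is optimized so that geometric relationships in the learned space reflect the structure of the original graph. The learned embeddings can then be used as input features for downstream machine learning tasks.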

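In other words, the key difference from earlier work lies in how the problem of capturing structural information is treated. Earlier approaches handled it as a preprocessing step, using hand-engineered statistics. Representation learning approaches, in contrast, treat it as a machine learning task in its own right, learning embeddings that encode graph structure directly from data.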

, (direct encoding), . . , , -. , , , .

DeepWalk node2vec . , . , .

( ). (DNGR SDNE), , [9].

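Formally, a graph is a pair $G = (V, E)$, where $V$ is the set of vertices, or nodes, and $E$ is the set of edges (links), i.e. pairs of connected vertices.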

Consider the example graph in Fig. 1. Its vertex and edge sets are $V = \{1, 2, 3, 4, 5, 6\}$ and $E = \{\{1, 2\}, \{1, 5\}, \{2, 3\}, \{2, 5\}, \{3, 4\}, \{4, 5\}, \{4, 6\}\}$.
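To make the example concrete, here is a minimal Python sketch that builds the binary adjacency matrix of this graph from $V$ and $E$; row and column $i$ correspond to node $V[i]$. The variable names are illustrative only.

```python
V = [1, 2, 3, 4, 5, 6]
E = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]

# Map node labels to row/column indices 0..|V|-1.
index = {v: i for i, v in enumerate(V)}

# Binary adjacency matrix of the undirected graph: A[i][j] = A[j][i] = 1 for each edge.
A = [[0] * len(V) for _ in V]
for u, w in E:
    A[index[u]][index[w]] = 1
    A[index[w]][index[u]] = 1

for row in A:
    print(row)

# The degree of a node is the sum of its row.
degrees = {v: sum(A[index[v]]) for v in V}
print(degrees)
```

Because the graph is undirected, the matrix is symmetric, and the sum of all entries equals twice the number of edges.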

, , , ( , ). .

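We assume that the input to a representation learning algorithm is an undirected graph $G = (V, E)$ with a binary adjacency matrix $A$. We also assume that the methods can use a real-valued matrix of node attributes $X \in \mathbb{R}^{m \times |V|}$ (for example, text or metadata associated with the nodes). The goal is to use the information contained in $A$ and $X$ to map each node, or a subgraph, to a vector $z \in \mathbb{R}^d$, where $d \ll |V|$.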

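Most methods optimize this mapping in an unsupervised way, using only the information in $A$ and $X$ and no knowledge of a particular downstream task. There are, however, also supervised approaches, in which the model uses classification or regression labels to optimize the embeddings. (These labels may be attached to individual nodes or to entire subgraphs, for example, protein roles or the therapeutic properties of a molecule.)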

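The goal of node embedding is to encode each node $v_i$ as a low-dimensional vector $z_i$ that summarizes the node's position in the graph and/or the structure of its local neighborhood.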

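Such embeddings can be viewed as projecting nodes into a latent space in which geometric relations correspond to interactions (e.g., edges) in the original graph. Fig. 2 shows an example of such an embedding, where two-dimensional node embeddings capture the community structure of a social network.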

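Formally, the encoder is a function that maps nodes $v_i \in V$ to vector embeddings $z_i \in \mathbb{R}^d$ (where $z_i$ is the embedding of node $v_i \in V$):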

$$\textbf{ENC}: V \to \mathbb{R}^d$$

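The decoder is a function that maps a pair of node embeddings to a real-valued, non-negative measure of node similarity: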

$$\textbf{DEC}: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^+$$

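In principle many decoders are possible, but the vast majority of methods use such a basic pairwise decoder. Applying it to a pair of embeddings $(z_i, z_j)$ yields a reconstruction of the similarity between $v_i$ and $v_j$ in the original graph. The encoder and decoder are optimized to minimize the error of this reconstruction, so that: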

$$\mathrm{DEC}(\mathrm{ENC}(v_i), \mathrm{ENC}(v_j)) = \mathrm{DEC}(z_i, z_j) \approx s_G(v_i, v_j) \qquad \textbf{(1)}$$

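Here $s_G$ is a user-defined, graph-based similarity measure between nodes, defined over the graph $G$. For example, one can set $s_G(v_i, v_j) \triangleq A_{i,j}$, so that adjacent nodes have similarity 1 and all other pairs have similarity 0. Alternatively, $s_G$ can be defined via the probability of $v_i$ and $v_j$ co-occurring on a fixed-length random walk over $G$. In practice, the reconstruction objective (Equation 1) is achieved by minimizing an empirical loss $L$ over a set of training node pairs $D$: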

$$L = \sum_{(v_i, v_j) \in D} \ell\big(\mathrm{DEC}(z_i, z_j), s_G(v_i, v_j)\big)$$

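Here $\ell: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is a user-defined loss function that measures the discrepancy between the decoded (i.e., estimated) similarity values $\mathrm{DEC}(z_i, z_j)$ and the true values $s_G(v_i, v_j)$.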

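Once the encoder-decoder system has been optimized, the trained encoder can generate node embeddings, which are then used as input features for downstream machine learning tasks. For example, the learned embeddings can be fed to a logistic regression classifier to predict the community a node belongs to, or distances between embeddings can be used to recommend new friends in a social network.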

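The encoder-decoder scheme will be familiar from seq2seq models, where an encoder compresses the input into a hidden representation and a decoder reconstructs the output from it. Ideas from seq2seq, such as the attention mechanism, have also been carried over to GNNs [10].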

In summary, a node embedding method can be described by four components:

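A pairwise similarity function $s_G: V \times V \to \mathbb{R}^+$, defined over the graph $G$, which measures the similarity between nodes in $G$.
An encoder function, ENC, that generates the node embeddings. This function contains trainable parameters that are optimized during training.
A decoder function, DEC, which reconstructs pairwise similarity values from the generated embeddings.
A loss function, $\ell$, which determines how the quality of the pairwise reconstructions is evaluated, i.e. how $\mathrm{DEC}(z_i, z_j)$ is compared with the true values $s_G(v_i, v_j)$.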
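The four components can be written down directly. The following minimal Python sketch assumes specific choices that are only one possible instance: $s_G = A$ on the Fig. 1 graph (nodes relabelled 0..5), an embedding-lookup encoder with a random table `Z`, a decoder that applies a sigmoid to the inner product (so its output stays in $\mathbb{R}^+$), and a squared-error $\ell$. All names are hypothetical, and no training is performed; the sketch only evaluates the empirical loss $L$.

```python
import math
import random

random.seed(0)

# Fig. 1 graph with nodes relabelled 0..5, and its adjacency matrix.
n, d = 6, 2
E = [(0, 1), (0, 4), (1, 2), (1, 4), (2, 3), (3, 4), (3, 5)]
A = [[0] * n for _ in range(n)]
for u, w in E:
    A[u][w] = A[w][u] = 1


# 1. Similarity function s_G: here simply adjacency, s_G(v_i, v_j) = A[i][j].
def s_G(i, j):
    return A[i][j]


# 2. Encoder ENC: an embedding lookup into a trainable |V| x d table Z.
Z = [[random.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n)]


def ENC(i):
    return Z[i]


# 3. Decoder DEC: sigmoid of the inner product, so outputs lie in (0, 1).
def DEC(z_i, z_j):
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(z_i, z_j))))


# 4. Per-pair loss l: squared error between decoded and true similarity.
def loss_fn(pred, target):
    return (pred - target) ** 2


# Empirical loss L over the training pairs D (here: all ordered pairs i != j).
D = [(i, j) for i in range(n) for j in range(n) if i != j]
L = sum(loss_fn(DEC(ENC(i), ENC(j)), s_G(i, j)) for i, j in D)
print(round(L, 4))
```

With small random embeddings every decoded similarity is close to 0.5, so each of the 30 pairs contributes about 0.25 to $L$; training would adjust `Z` to push decoded values toward 1 for edges and 0 otherwise.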

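Random walk approaches. Many recent successful methods learn node embeddings from random walk statistics. In what follows, $p_G(v_j \mid v_i)$ denotes the probability of visiting $v_j$ on a random walk starting from $v_i$.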

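The key idea is to optimize the embeddings so that nodes have similar embeddings if they tend to co-occur on short random walks over the graph (Fig. 3). Instead of a deterministic measure of graph proximity, these methods thus use a flexible, stochastic one, which has led to superior performance in a number of settings.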

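DeepWalk and node2vec use an embedding lookup as the encoder and a decoder based on the inner product. However, instead of decoding a deterministic node similarity measure, they optimize embeddings to encode the statistics of random walks. The basic idea is to learn embeddings so that, roughly: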

$$\mathrm{DEC}(z_i, z_j) \triangleq \frac{e^{z_i^\top z_j}}{\sum_{v_k \in V} e^{z_i^\top z_k}} \approx p_{G,T}(v_j \mid v_i) \qquad \textbf{(2)}$$

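Here $p_{G,T}(v_j \mid v_i)$ is the probability of visiting $v_j$ on a random walk of length $T$ starting at $v_i$, with $T$ usually chosen in the range $T \in \{2, \ldots, 10\}$. Note that, unlike deterministic similarity measures, $p_{G,T}(v_j \mid v_i)$ is both stochastic and asymmetric. More formally, these approaches minimize the following cross-entropy loss: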

$$L = \sum_{(v_i, v_j) \in D} -\log\big(\mathrm{DEC}(z_i, z_j)\big) \qquad \textbf{(3)}$$

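Here the training set $D$ is generated by sampling random walks starting from each node (i.e., for every node $v_i$, $N$ pairs are sampled from the distribution $(v_i, v_j) \sim p_{G,T}(v_j \mid v_i)$). However, evaluating this loss naively is prohibitively expensive: its cost is $O(|D||V|)$, because computing the denominator of Equation (2) alone takes $O(|V|)$. DeepWalk and node2vec therefore use different optimizations and approximations to compute loss (3). DeepWalk uses a hierarchical softmax, which computes the normalizing factor with the help of a binary tree. In contrast, node2vec approximates (3) with negative sampling: instead of normalizing over the full vertex set, it approximates the normalizing factor using a set of random "negative samples".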
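The following minimal Python sketch illustrates the pieces of this paragraph on the Fig. 1 graph (nodes relabelled 0..5): sampling the training set $D$ from unbiased, DeepWalk-style random walks, the exact softmax decoder of Equation (2), the full loss (3), and a negative-sampling approximation. It is a simplified illustration, not the DeepWalk or node2vec implementation: picking $v_j$ uniformly from the walk, drawing negatives uniformly, and all function names are assumptions made here, and no training is performed.

```python
import math
import random

random.seed(42)

# Fig. 1 graph with nodes relabelled 0..5, as adjacency lists.
adj = {0: [1, 4], 1: [0, 2, 4], 2: [1, 3], 3: [2, 4, 5], 4: [0, 1, 3], 5: [3]}
n, d, T, N = len(adj), 4, 5, 10


def random_walk(start, length):
    """Unbiased random walk of `length` steps (DeepWalk-style)."""
    walk = [start]
    for _ in range(length):
        walk.append(random.choice(adj[walk[-1]]))
    return walk


# Training set D: for each v_i, N pairs (v_i, v_j), v_j taken from a length-T walk from v_i.
D = []
for i in adj:
    for _ in range(N):
        D.append((i, random.choice(random_walk(i, T)[1:])))

Z = [[random.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_dec(i, j):
    """Equation (2): exact softmax over all nodes, O(|V|) per pair."""
    denom = sum(math.exp(dot(Z[i], Z[k])) for k in range(n))
    return math.exp(dot(Z[i], Z[j])) / denom


def full_loss():
    """Equation (3): O(|D||V|) in total."""
    return sum(-math.log(softmax_dec(i, j)) for i, j in D)


def neg_sampling_loss(num_neg=3):
    """Negative-sampling approximation; negatives drawn uniformly (simplification)."""
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))
    total = 0.0
    for i, j in D:
        total -= math.log(sig(dot(Z[i], Z[j])))       # pull the observed pair together
        for _ in range(num_neg):
            k = random.randrange(n)
            total -= math.log(sig(-dot(Z[i], Z[k])))  # push random nodes apart
    return total


print(len(D), round(full_loss(), 3), round(neg_sampling_loss(), 3))
```

The negative-sampling loss touches only `1 + num_neg` nodes per pair instead of all $|V|$, which is what makes it practical on large graphs.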

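Beyond these algorithmic differences, the key distinction between node2vec and DeepWalk is that node2vec allows a flexible definition of random walks, whereas DeepWalk uses simple unbiased walks. In particular, node2vec introduces two random walk hyperparameters, $p$ and $q$, that bias the walk (Fig. 4). The hyperparameter $p$ controls the likelihood of the walk immediately revisiting a node, while $q$ controls the likelihood of the walk revisiting a node's one-hop neighborhood. With these hyperparameters, node2vec can smoothly interpolate between walks that resemble breadth-first search and walks that resemble depth-first search.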

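A: how node2vec biases the random walk using the parameters $p$ and $q$. Assuming the walk has just moved from $v_s$ to $v_*$, the edge labels $\alpha$ are proportional to the probability of the walk taking that edge at the next step.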

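B: the difference between random walks based on breadth-first search (BFS) and depth-first search (DFS). BFS-like walks mostly stay within a node's immediate (one-hop) neighborhood and are generally more effective at capturing structural roles. DFS-like walks move further away from the node and are more effective at capturing community structure.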
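A minimal Python sketch of the biased walk in Fig. 4 on the Fig. 1 graph (nodes relabelled 0..5). It follows the search bias of the original node2vec paper for an unweighted graph: returning to the previous node $v_s$ has weight $1/p$, moving to a node that is also a neighbor of $v_s$ has weight 1, and moving further away has weight $1/q$. The helper names `alpha` and `node2vec_walk`, the uniform first step, and the example values of $p$ and $q$ are illustrative assumptions.

```python
import random

random.seed(7)

# Fig. 1 graph with nodes relabelled 0..5, as adjacency lists.
adj = {0: [1, 4], 1: [0, 2, 4], 2: [1, 3], 3: [2, 4, 5], 4: [0, 1, 3], 5: [3]}


def alpha(prev, x, p, q):
    """Unnormalized bias for stepping to x, given the walk arrived from prev (v_s)."""
    if x == prev:
        return 1.0 / p  # go back to v_s (distance 0 from v_s)
    if x in adj[prev]:
        return 1.0      # x is also a neighbor of v_s (distance 1)
    return 1.0 / q      # move further away from v_s (distance 2)


def node2vec_walk(start, length, p, q):
    """Second-order biased walk returning `length` nodes."""
    walk = [start, random.choice(adj[start])]  # first step: no previous node, so uniform
    while len(walk) < length:
        prev, cur = walk[-2], walk[-1]
        nbrs = adj[cur]
        weights = [alpha(prev, x, p, q) for x in nbrs]
        walk.append(random.choices(nbrs, weights=weights, k=1)[0])
    return walk


bfs_like = node2vec_walk(0, 10, p=1.0, q=4.0)   # large q: stays near the start (BFS-like)
dfs_like = node2vec_walk(0, 10, p=1.0, q=0.25)  # small q: moves outward (DFS-like)
print(bfs_like)
print(dfs_like)
```

Setting $p = q = 1$ makes every weight equal to 1, which recovers the unbiased walk used by DeepWalk.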

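So far, in all the methods considered the encoder is simply an embedding lookup, and a separate embedding vector is trained for each node independently. This leads to several drawbacks: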

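No parameters are shared between nodes in the encoder (i.e., the encoder is just an embedding lookup based on node identifiers). This is statistically inefficient, because parameter sharing acts as a powerful form of regularization, and computationally inefficient, because the number of parameters grows as $O(|V|)$.
Node attributes are not used during encoding. In many large graphs, nodes carry attribute information (e.g., user profiles in a social network) that is highly informative about the node's position and role in the graph.
These methods are inherently transductive: they can only produce embeddings for nodes that were present during training and cannot embed previously unseen nodes without additional rounds of optimization. This is a serious problem for evolving graphs, for massive graphs that do not fit in memory, and for tasks that require generalizing to new graphs after training.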

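Recently, a number of approaches have been proposed to address some or all of these issues. They still use the encoder-decoder framework, but rely on more complex encoders that depend more generally on the structure and attributes of the graph.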

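Deep Neural Graph Representations (DNGR) and Structural Deep Network Embeddings (SDNE) address the first of these problems: unlike embedding-lookup methods, they use deep neural networks to incorporate graph structure into the encoder. The basic idea is to use autoencoders, a well-known deep learning technique, to compress information about a node's local neighborhood (Fig. 5). DNGR and SDNE also differ from the methods above in that they use a unary decoder instead of a pairwise one.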

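In these approaches, each node $v_i$ is associated with a neighborhood vector $s_i \in \mathbb{R}^{|V|}$, which is the row of $v_i$ in the matrix $S$ of pairwise node similarities ($S_{i,j} = s_G(v_i, v_j)$). The vector $s_i$ holds the similarity of $v_i$ to all other nodes and serves as a high-dimensional representation of $v_i$'s neighborhood. The autoencoder objective of DNGR and SDNE is to embed nodes using their $s_i$ vectors so that $s_i$ can be reconstructed from these embeddings: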

$$\mathrm{DEC}(\mathrm{ENC}(s_i)) = \mathrm{DEC}(z_i) \approx s_i$$

In other words, the loss for these methods takes the form:

$$L = \sum_{v_i \in V} \lVert \mathrm{DEC}(z_i) - s_i \rVert_2^2$$
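To see the autoencoder objective in action, here is a minimal Python sketch that uses SDNE-style inputs $s_i = A_i$ for the Fig. 1 graph (nodes relabelled 0..5). As a deliberate simplification, the encoder and decoder are single linear layers (`W1` and `W2`, hypothetical names) trained by plain stochastic gradient descent on the loss above; the real DNGR and SDNE use several stacked nonlinear layers and different optimization details.

```python
import random

random.seed(0)

# s_i = A_i: rows of the adjacency matrix of the Fig. 1 graph (nodes relabelled 0..5).
n, d, lr = 6, 2, 0.02
E = [(0, 1), (0, 4), (1, 2), (1, 4), (2, 3), (3, 4), (3, 5)]
S = [[0.0] * n for _ in range(n)]
for u, w in E:
    S[u][w] = S[w][u] = 1.0

W1 = [[random.gauss(0, 0.3) for _ in range(n)] for _ in range(d)]  # encoder: R^|V| -> R^d
W2 = [[random.gauss(0, 0.3) for _ in range(d)] for _ in range(n)]  # decoder: R^d -> R^|V|


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def loss():
    """Autoencoder loss: sum over nodes of ||DEC(ENC(s_i)) - s_i||^2."""
    total = 0.0
    for s in S:
        r = matvec(W2, matvec(W1, s))
        total += sum((a - b) ** 2 for a, b in zip(r, s))
    return total


start = loss()
for epoch in range(300):
    for s in S:
        z = matvec(W1, s)                          # z_i = ENC(s_i)
        r = matvec(W2, z)                          # DEC(z_i)
        g_r = [2 * (a - b) for a, b in zip(r, s)]  # dL/dr
        g_z = [sum(W2[k][c] * g_r[k] for k in range(n)) for c in range(d)]  # dL/dz
        for k in range(n):                         # update decoder weights
            for c in range(d):
                W2[k][c] -= lr * g_r[k] * z[c]
        for c in range(d):                         # update encoder weights
            for m in range(n):
                W1[c][m] -= lr * g_z[c] * s[m]
print(round(start, 3), round(loss(), 3))
```

The input and output size is fixed at $|V|$, which illustrates why the input dimension of these autoencoders grows with the graph.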

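As with the pairwise decoder, the dimension of the embeddings $z_i$ is much smaller than $|V|$ (the total number of nodes in the graph), so the goal is to compress a node's neighborhood information into a low-dimensional vector. In both SDNE and DNGR, the encoder and decoder consist of several stacked neural network layers: each encoder layer reduces the dimensionality of its input, and each decoder layer increases it (Fig. 5).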

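SDNE and DNGR differ in the similarity functions used to construct the neighborhood vectors $s_i$, as well as in the details of how the autoencoder is optimized. DNGR defines $s_i$ according to the pointwise mutual information of two nodes co-occurring on random walks, similar to DeepWalk and node2vec. SDNE simply sets $s_i \triangleq A_i$, i.e. equal to the adjacency vector of $v_i$.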
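The two choices of $s_i$ can be contrasted in a short Python sketch on the Fig. 1 graph (nodes relabelled 0..5). The SDNE-style vectors are simply adjacency rows. For the DNGR-style vectors, this sketch counts co-occurrences of nodes within a window on unbiased random walks and converts the counts to positive pointwise mutual information (PPMI); the walk counts, window size, and variable names are assumptions, and DNGR's original procedure for collecting co-occurrence statistics differs in its details.

```python
import math
import random

random.seed(1)

# Fig. 1 graph with nodes relabelled 0..5, as adjacency lists.
adj = {0: [1, 4], 1: [0, 2, 4], 2: [1, 3], 3: [2, 4, 5], 4: [0, 1, 3], 5: [3]}
n, walks_per_node, walk_len, window = len(adj), 20, 10, 2

# SDNE-style neighborhood vectors: s_i = A_i (adjacency rows).
sdne_S = [[1.0 if j in adj[i] else 0.0 for j in range(n)] for i in range(n)]

# DNGR-style (simplified): count co-occurrences within a window on random walks ...
C = [[0.0] * n for _ in range(n)]
for start in adj:
    for _ in range(walks_per_node):
        walk = [start]
        for _ in range(walk_len - 1):
            walk.append(random.choice(adj[walk[-1]]))
        for a, u in enumerate(walk):
            for b in range(max(0, a - window), min(len(walk), a + window + 1)):
                if b != a:
                    C[u][walk[b]] += 1

# ... then turn the counts into positive pointwise mutual information (PPMI).
total = sum(map(sum, C))
row = [sum(r) for r in C]
col = [sum(C[i][j] for i in range(n)) for j in range(n)]
dngr_S = [[max(0.0, math.log(C[i][j] * total / (row[i] * col[j]))) if C[i][j] > 0 else 0.0
           for j in range(n)] for i in range(n)]

for i in range(n):
    print(sdne_S[i], [round(x, 2) for x in dngr_S[i]])
```

Unlike the adjacency rows, the PPMI vectors can assign non-zero similarity to nodes that are two or more hops apart, reflecting the wider context captured by random walks.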

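Because the autoencoder objective depends on the input vector $s_i$, SDNE and DNGR can incorporate structural information about a node's local neighborhood directly into the encoder, which is impossible for embedding-lookup methods (whose encoder depends only on the node ID). Despite this improvement, autoencoder approaches still have serious limitations. Most importantly, the input dimension of the autoencoder is fixed at $|V|$, which can be extremely costly or even intractable for graphs with millions of nodes.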

, . . , , , [9], . , .

. , , , , , , .

, , , .

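Scalability. Although most of the approaches reviewed are highly scalable in theory (i.e., $O(|E|)$ training time), much work remains in scaling node embeddings to truly massive datasets (e.g., billions of nodes and edges). For example, most methods train and store a unique embedding for every node. Moreover, most evaluation setups assume that the attributes, embeddings, and edge lists of all nodes used for training and testing fit in main memory, an assumption at odds with most real applications, where graphs are massive, evolving, and often stored in a distributed fashion. Developing representation learning frameworks that scale to realistic production settings is necessary to keep research and practice from drifting apart.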

. , , - , , . , , , , .

. , , , . , , , . , .

. , , . , , . , , , . , , , , .

, — . , , .

[1] W. L. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in NIPS 2017, 2017, pp. 1024–1034.
[2] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in ICLR 2017, 2017.
[3] A. Sanchez-Gonzalez, N. Heess, J. T. Springenberg, J. Merel, M. Riedmiller, R. Hadsell, and P. Battaglia, "Graph networks as learnable physics engines for inference and control," arXiv preprint arXiv:1806.01242, 2018.
[4] P. Battaglia, R. Pascanu, M. Lai, D. J. Rezende et al., "Interaction networks for learning about objects, relations and physics," in NIPS 2016, 2016, pp. 4502–4510.
[5] A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur, "Protein interface prediction using graph convolutional networks," in NIPS 2017, 2017, pp. 6530–6539.
[6] T. Hamaguchi, H. Oiwa, M. Shimbo, and Y. Matsumoto, "Knowledge transfer for out-of-knowledge-base entities: A graph neural network approach," in IJCAI 2017, 2017, pp. 1802–1808.
[7] H. Dai, E. B. Khalil, Y. Zhang, B. Dilkina, and L. Song, "Learning combinatorial optimization algorithms over graphs," arXiv preprint arXiv:1704.01665, 2017.
[8] X. Liang, X. Shen, J. Feng, F. Lin, and S. Yan, "Semantic Object Parsing with Graph LSTM," arXiv:1603.07063v1 [cs.CV], 23 Mar 2016.
[9] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, "A Comprehensive Survey on Graph Neural Networks," arXiv:1901.00596v4 [cs.LG], 4 Dec 2019.
[10] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph Attention Networks," arXiv:1710.10903v3 [stat.ML], 4 Feb 2018.

References

Graph Neural Networks: A Review of Methods and Applications
Representation Learning on Graphs: Methods and Applications

Graph theory in machine learning for the smallest
