Verifying Isomorphism of Two Graphs and Searching for Isomorphic Subgraphs: An Approach Based on NB-Paths Analysis

Hello everyone.

There is such a task - to check whether two graphs are isomorphic to each other. That is, to put it simply, to find out whether both of these graphs are “the same” graph, but with different vertex numbers and, if graphs are specified graphically, with different spatial locations. The solution to this problem is not so obvious as it might seem to someone at first glance: even for small graphs, a look at their graphical representation will not always give a definite answer. See, for example, a drawing on the same Wiki: en.wikipedia.org/wiki/Graph_isomorphism#Example .

Well, obviously?

But there is a more difficult task: to search in a “large” graph for all subgraphs isomorphic to some other “smaller” graph. This is an even more "dark forest." That is, of course, not entirely dark, but the task, you see, is not the easiest.

So what do we have?

1. Why is all this necessary?


And although the problem of isomorphic (sub) graphs, as we have already mentioned, is complicated, it is quite necessary and useful. What for? And then, for example, to search the database for chemical compounds similar to a given molecule by its structural formula. After all, it can be imagined as a graph, right? Chemoinformatics does this - there is such a science. But there are also tasks of comparison with the sample, various bioinformatic tasks, and many other interesting things.

2. How to solve this problem?


The classical approach to solving this problem was laid down by J. Ullman in 1976 [3], later on he substantially improved his algorithm [4]. Also, this approach was further developed in the works of many other authors, for example, Cordella and co-authors [2] (VF2 algorithm), Bonnitsi and co-authors [1] (RI algorithm), etc.

In short, the essence of this approach is “smart enumerating ”the possible correspondences of the vertices of the sample graph B and graph A in which it is being searched. This search makes it "smart" to enter various conditions and restrictions that allow you to cut off unsuitable options as early as possible. Also, for these purposes, various preliminary processing of the source data can be carried out.

We will not be engaged in retelling these works here, but we will invite the reader to get acquainted with them independently. However, “a show is better than an order”, and it should be noted that there are good graphic examples and explanations of these algorithms on the Web. See, for example, the following:
coderoad.ru/17480142/Is-or-simple-example-for-explanation-algorithm-Ulman - a very good and clear explanation with an example,
issue.life/questions/8176298 - steps of the VF2 algorithm with an example .

However, the question arises - are there any other possible ways and approaches for solving this problem, albeit for some special cases? And I would like to introduce you to one of the possible answers to this question.

2. Isomorphism of graphs and NB-paths


Let's immediately agree: under NB-paths (and not from English, but simply because it is so shorter) we will call maximal non-branching paths, i.e. maximally long unbranched paths of some graph.

So, if we have two graphs that are isomorphic to each other , then for any representation of the first graph as a sequence of maximally extended non-branching paths (i.e., these of our NB-paths) there always exists a corresponding representation of the second graph corresponding to it, and at the same time:

  • for directed graphs, the paths corresponding to each other will be aligned,
  • the degrees of the vertices corresponding to each other for undirected graphs (and for oriented graphs, the numbers of incoming and outgoing edges, respectively) will be equal to
  • when "combining" such representations as a result, we will have a correspondence of the vertices of the first and second graphs.

An example . Graph A = {1-> 2, 1-> 6, 4-> 5, 5-> 1, 3-> 3}. Graph B = {3-> 4, 3-> 5, 1-> 2, 2-> 3, 6-> 6}
PathsA ( maximal non-branching paths of A ): 1-> 2, 1-> 6, 4-> 5-> 1, 3-> 3
PathsB ( maximal non-branching paths of B ): 3-> 4, 3-> 5, 1-> 2-> 3, 6-> 6 Vertex matching
: 1 ( A): 3 (B), 2 (A): 4 (B), 6 (A): 5 (B), 4 (A): 1 (B), 5 (A): 2 (B), 3 ( A): 6 (B).

Thus, the task of checking the isomorphism of two graphs can be solved by finding such paths corresponding to each other and then checking the equality of the adjacency matrices constructed based on their conservation of the order of the vertices in the obtained path sequences (each vertex is added to the sequence once, first “mention”). In our example, the following vertex orders will be used to construct adjacency matrices:

A : 1, 2, 6, 4, 5, 3
B : 3, 4, 5, 1, 2, 6

Adjacency matrix for A for a given order of vertices:

image

Adjacency matrix for B for a given order of vertices:

image

The matrices are equal; therefore, the graphs A and B are isomorphic.

Moreover, this approach (1) is applicable for both oriented and non-oriented graphs, (2) it is applicable for graphs containing more than one connected component / strong connection, (3) it is applicable for graphs containing multiple (multiple) edges and loops, however (4) does not take into account vertices for which there are no edges incident to it.

3. Well, checked the isomorphism of graphs. But what about the search for subgraphs?


And here, frankly, everything is much more complicated. Here we will have the following restriction: based on the very essence of the method, you can find not any, but only “inscribed” subgraphs . And we mean a “subgraph” of a graph A such a subgraph that can be “glued” to other parts of A only due to edges incident only to the boundary vertices of its (subgraph) non-branching paths of maximum length (in this case, A can contain other connected components). Do not worry, below will be an example, and everything will be clearer.

In addition, when solving this problem, in addition to searching for correspondence of NB paths of graphs A and B in length (as was the case with the isomorphism test above), it is also necessary to separately search for the following possible correspondences between them:

  • Correspondence of NB paths - simple chains from graph B to NB paths - simple chains / simple cycles of graph A. Moreover, initially such paths in graph A can be longer - in this case, their segments are taken that are equal in length to the desired path from B.
  • Correspondence of the NB-“end” paths of graph B to any NB-paths of graph A (in this case, the paths from graph A can also be longer — in this case, their segments are taken that are equal in length to the desired path from graph B).

Let's look at an example:

image

When searching for an “inscribed” isomorphic subgraph of graph B in column A (see figure above), the following correspondences will be found:

  • inner path 2-> 3-> 4 of column B: inner path 2-> 3-> 4 of column A,
  • end paths 1-> 2 and 10-> 2 of column B: end path 0-> 2 of column A and a fragment of the end path 7-> 1-> 2 of column A, namely 1-> 2,
  • 7->8 B: 9->10->11 A, 9->10 10->11, 12->13->14->12 A, 12->13, 13->14, 14->12.

Thus, as “inscribed” subgraphs of graph A that are isomorphic to graph B, the following 5 options can be found:

A1 = {0-> 2, 1-> 2, 2-> 3, 3-> 4, 4-> 5, 4-> 6, 9-> 10}
A2 = {0-> 2, 1-> 2, 2-> 3, 3-> 4, 4-> 5, 4-> 6, 10-> 11}
A3 = {0-> 2, 1-> 2, 2-> 3, 3-> 4, 4-> 5, 4-> 6, 12-> 13}
A4 = {0-> 2, 1-> 2, 2 -> 3, 3-> 4, 4-> 5, 4-> 6, 13-> 14}
A5 = {0-> 2, 1-> 2, 2-> 3, 3-> 4, 4-> 5, 4-> 6, 14-> 12}

However, if we add an additional edge 3-> 8 to the graph A, we get the graph A '(below in the same figure). And in A 'there will no longer be any “inscribed” subgraphs isomorphic to the graph B. Indeed: the edge 3-> 8 “splits” the path 2-> 3-> 4 of the graph A into two and, therefore, candidate paths for the inner path 2 ->3-> 4 columns B will not be found.

4. Now the algorithm itself


Now we can move on to a more detailed consideration of the search algorithm for the subgraphs “inscribed” in column A that are isomorphic to some column B.

So, the algorithm will consist of 4 stages:

  • Preprocessing
  • Matching
  • Thinning,
  • Conclusion

Stage I. Preprocessing


At this stage, we find all NB-paths for each of the graphs, as well as evaluate the factors that limit the choice space during enumeration. Those. we do the following:

  1. We find all NB-paths in column A and put them into a dynamic array (in C ++ - into a vector) PathsA
  2. We find all NB-paths in column B and put them in a dynamic array (vector) PathsB
  3. A B. II-IV , 1. A- B B: - , B.
  4. A B ( ).
  5. A B: DA DB .
  6. – B00 – B , , .

So, we have NB-paths on both graphs and limit parameters from p.p. 3-5.

Stage II. Mapping


At this stage, we will select candidate NB paths (hereinafter simply referred to as “candidate paths”) from column A for each of the NB paths of column B. Marking each candidate path from PathsA for each ith path PathsB [i ] will be written to a two-dimensional dynamic array (in C ++ - to a vector of vectors) NPaths - respectively to the i-th vector (i-th line) - in the form of an ordered triple of numbers: the number of the corresponding path in PathsA - the number of the starting position in it - the length of the path .

For example, PathsB [2] = {1, 0, 3, 3, 1, 3} means that PathsB [2] paths are associated with 2 candidate paths from A: from PathsA [1] - the first 3 path elements starting from zero ( initial), and from PathsA [3] - also 3 elements starting from the first (next to the initial).

At the same time, we will search (select) candidate paths in 4 directions:

  1. Search for candidate paths for all internal NB paths of graph B from PathsB, i.e. those whose both boundary vertices are connected in the graph B with at least 2 other vertices (regardless of the direction of such a connection) and at the same time this path is not a simple cycle (oriented - for oriented graphs).
  2. Search for candidate paths for trailing NB paths from PathsB.
  3. Search for candidate paths for NB paths - simple loops from PathsB.
  4. Search for candidate paths for NB paths - simple paths from PathsB.

When selecting candidate paths for each ith path from PathsB, they are compared (this is where some of the previously calculated limiter parameters come in handy):

  • its length and the length of the candidate path (should be equal for cases 1 and 3, and for cases 2 and 4 the candidate path from PathsA can also be longer),
  • the degrees of its boundary vertices and the corresponding vertices of the candidate path (for vertices from the candidate path, these values ​​must be at least the path from PathsB).

If, according to the results of stage II, not a single candidate path from PathsA was found for any of the paths in PathsB, then it is considered that A does not contain “inscribed” subgraphs isomorphic to column B.

Thinning


Now let's try to “squeeze” graph A. We leave in it only those edges that went into the candidate paths we found. If at the same time, graph A has lost at least one edge in comparison with its initial state, we return to stage I “Preprocessing”: the degrees of the vertices of graph A and, accordingly, the list of candidate paths can be reduced and, accordingly, the search space can be reduced.

Stage III. Thinning the list of candidate paths


The purpose of this step is to further eliminate as many candidates as possible inappropriate. To do this, we carry out the following steps.

  1. The exclusion of obviously inappropriate candidate paths based on the minimum distances between the vertices in column B. The candidate path for each PathsB [i] for which at least one PathsB [j] no candidate path was found for the shortest the distance between their boundary vertices in column A was less than or equal to the shortest distance between the corresponding boundary vertices of the paths PathsB [i] and PathsB [j] from graph B.
  2. An exception for all cycles from PathsB to non-cyclic candidate paths associated with it and vice versa.
  3. The exception of candidate paths similar to Section 1, but not by the criterion of the shortest distance between boundary vertices, but by their (boundary vertices) to mutual equality or inequality.

For example, if:

  • the starting element of PathsB [i] is not equal to the starting element of PathsB [j], and
  • the final element of PathsB [i] is not equal to the final element of PathsB [j], and
  • the starting element of PathsB [i] is equal to the ending element of PathsB [j], and
  • the starting element of PathsB [j] is not equal to the ending element of PathsB [i],

the candidate path along the PathsB [i] path for which at least one PathsB [j] did not find any candidate paths satisfying the above condition regarding the equality / inequality of their (candidate paths) initial and final peaks.

That is, roughly speaking, we discard those candidate paths that obviously will not be included in the desired subgraphs - based on the distance to all other candidate paths (these paths are terribly far from all others), based on their cyclicity / non-cyclicity, and also based on due equality / inequality of their boundary vertices with boundary vertices of other candidate paths.

If, according to the results of stage III, at least 1 candidate path was deleted from PathsA, then - again - column A is updated - only those edges that are included in the remaining candidate paths remain in it. And, again, if at the same time, graph A “lost weight” by at least one edge, we again return to stage I “Preprocessing”: the degrees of the vertices of graph A and, accordingly, the list of candidate paths can be reduced and, accordingly, can be reduced search space.

Stage IV. Conclusion


Each possible combination of all remaining candidate paths defines a subgraph of column A. For each such subgraph, an adjacency matrix is ​​constructed taking into account the order of the candidate paths in the same way as the adjacency matrix B00 was constructed for graph B, and then these adjacency matrices are compared.

If they are equal, then the multiplicities of the edges are compared (see Sec. 3 of Stage I Preprocessing). If all these conditions are fulfilled, the found subgraph is considered isomorphic to graph B and is added to the set of results found.

During stage IV “Conclusion”, various additional checks may be used to speed up the identification and rejection of an inappropriate subgraph.

5. How everything is confused ... Consider an example of the algorithm


I don’t know about you, but it would be incomprehensible to me without an example. Let's look at an example.

In fig. the graphs below are A = {1-> 2, 2-> 5, 5-> 15, 16-> 15, 5-> 5, 5-> 5, 5-> 4, 5-> 6, 7-> 6 , 6-> 8, 6-> 6, 6-> 9, 9-> 10, 9-> 11, 12-> 0, 0-> 12, 4-> 13, 13-> 14, 14-> 4 } and B = {1-> 1, 4-> 5, 5-> 1, 1-> 3, 3-> 3, 3-> 2, 2-> 7, 2-> 8, 9-> 10, 10-> 9, 1-> 6, 6-> 11, 11-> 12, 12-> 6}. Our task is to find in graph A all the “inscribed” subgraphs isomorphic to graph B. The result is also shown in the figure: the found vertices and edges of graph A are highlighted by a thick line, and the numbers of the corresponding vertices of graph B are indicated in brackets (if there are several options, they are indicated by fraction).

image

When solving the search problem in column A of “inscribed” subgraphs isomorphic to column B,we have the following results of the algorithm.

Stage I: All actions and calculations were performed according to p.p. 1-5 of this step, the following are the resulting PathsA and PathsB.

PathsA:

1-> 2-> 5
4-> 13-> 14 4
5-> 4
5-> 5
5-> 6
5-> 15
6-> 6
6-> 8
6-> 9
7-> 6
9 -> 10
9-> 11
16-> 15
0-> 12-> 0

PathsB:

1-> 1
1-> 3
1-> 6
2-> 7
2-> 8
3-> 2
3-> 3
4- > 5-> 1
6-> 11-> 12-> 6
9-> 10-> 9

Stages II-III. The comparison showed that for each path from PathsA there is at least one candidate path from PathsB, and PathsA was shortened by the results of preliminary thinning. At the stage of main thinning (III), further reduction of PathsA did not occur. As a result, PathsA and PathsB took the form:

PathsA:

1-> 2-> 5
4-> 13-> 14-> 4
5-> 4
5-> 5
5-> 6
6-> 6
6-> 9
9-> 10
9-> 11
0-> 12-> 0

PathsB:

1-> 1
1-> 3
1-> 6
2-> 7
2-> 8
3-> 2
3-> 3
4-> 5-> 1
6 -> 11-> 12-> 6
9-> 10->9

The final comparison of NPaths is:
NPaths:

i = 0: 3 0 2
i = 1: 4 0 2
i = 2: 2 0 2
i = 3: 7 0 2 8 0 2
i = 4: 7 0 2 8 0 2
i = 5: 6 0 2
i = 6: 5 0 2
i = 7: 0 0 3
i = 8: 1 0 4
i = 9: 9 0 3

Here is the time to recall that each ordered triplet of numbers in NPaths [i] defines the corresponding PathsB [i] candidate path from A in the format: the number of the corresponding path from PathsA - the starting position of the segment from this path - the length of the segment. Thus, it is easy to verify that the pathsB [0] = {1-> 1} (i = 0: 3 0 2) corresponds to the only path from A, namely, the segment from PathsA [3], starting from the very beginning and having a length of 2.

Stage IV. As a result, a unique subgraph A1 was found in column A, isomorphic to B. A1 = {0-> 12, 1-> 2, 2-> 5, 4-> 13, 5-> 4, 5-> 5, 5- > 6, 6-> 6, 6-> 9, 9-> 10, 9-> 11, 12-> 0, 13-> 14, 14-> 4}.

5. What next?


And then, in principle, that’s all. Although not quite: the author must admit that the algorithm can still be finalized and finalized. What is there to add?

  1. When additional features of the vertices of columns A and B are introduced (for example, when the compounds are given by the graphs of chemical compounds, a numerical code corresponding to one and only one atom (isotope) can be associated with each vertex), the search process can be accelerated due to the greater accuracy of the comparisons according to Stage II: additional the condition for selecting candidate paths will be the correspondence of vertex labels.
  2. . , «», «» - B.
  3. , , , «» :
    (1) «» , ,
    (2) .
    «» « - », , , .
  4. , - , .


General on the problems:
1. Bonnici, V., Giugno, R., Pulvirenti, A. et al. A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinformatics 14, S13 (2013). doi.org/10.1186/1471-2105-14-S7-S13 .
2. Cordella L, Foggia P, Sansone C, Vento M: A (Sub) Graph Isomorphism Algorithm for Matching Large Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2004, 26 (10): 1367-1372. 10.1109 / TPAMI.2004.75.
3. Ullmann Julian R .: An algorithm for Subgraph Isomorphism. Journal of the Association for Computing Machinery. 1976, 23: 31-42. 10.1145 / 321921.321925.
4. Ullmann Julian R .: Bit-vector algorithms for binary constraint satisfaction and subgraph isomorphism. J Exp Algorithmics. 2011, 15 (1.6): 1.1-1.6. 1.64

The author’s preprint written in a more official language about this is the most “NB-Paths-based algorithm”, which also contains information about an attempt to implement it in C ++:
5. Chernoukhov SA 2020. Preprints.RU. Checking the isomorphism of two graphs and searching for isomorphic "inscribed" subgraphs: an approach based on the analysis of maximally extended non-branching paths approach to solve (sub) graph isomorphism problem. DOI: dx.doi.org/10.24108/preprints-3111977

Useful Internet sources on the topic:
6. coderoad.ru/17480142/Is-li- simple-example-for-explanation-algorithm-Ulman
7. issue.life/questions / 8176298 .

All Articles