Noah: Neural-optimized A* Search Algorithm for
Graph Edit Distance Computation
Lei Yang
†
, Lei Zou
†,∗
†
Peking University, China;
∗
National Engineering Laboratory for Big Data Analysis Technology and Application (PKU), China;
{yang_lei,zoulei}@pku.edu.cn
Abstract—Graph Edit Distance (GED) is a classical graph sim-
ilarity metric that can be tailored to a wide range of applications.
However, the exact GED computation is NP-complete, which
means it is only feasible for small graphs only. And therefore,
approximate GED computation methods are used in most real-
world applications. However, traditional practices and end-to-end
learning-based methods have their shortcomings when applied for
approximate GED computation. The former relies on experience
and usually performs not well. The latter is only capable of
computing similarity scores between graphs without an actual
edit path, which is crucial in specific problems (e.g., Graph
Alignment, Semantic Role Labeling). This paper proposes a novel
approach Noah, which combines A* search algorithm and graph
neural networks to compute approximate GED in a more effective
and intelligent way. The combination is mainly reflected in two
aspects. First, we learn the estimated cost function h(·) by Graph
Path Networks. Pre-training GEDs and corresponding edit paths
are also incorporated for training the model, therefore helping
optimize the search direction of A* search algorithm. Second, we
learn an elastic beam size that can help reduce search size and
satisfy various user settings. Experimental results demonstrate
the practical effectiveness of our approach on several tasks and
suggest that our approach significantly outperforms the state-of-
the-art methods.
Index Terms—graph edit distance, A* search algorithm, neural
networks
I. INTRODUCTION
Recently, graphs are ubiquitous and have attracted increas-
ing research interest because many data in a wide range of
applications can be represented by graphs, such as chemical
compounds [1], social networks [2], road networks [3] and
semantic web [4]. One of the fundamental problems in such
graph-represented applications is graph similarity search (i.e.,
given a query graph q, finding a set of similar graphs g in a
graph database D, such that q is approximately matched with g
under some similarity metric). Two classical graph similarity
metrics are Graph Edit Distance (GED) [5] and Maximum
Common Subgraph (MCS) [6]. Note that the two metrics are
inter-related [7], and they are both NP-complete [8]. In this
paper, we focus on GED computation.
A. Existing solutions and motivations
The most widely used method for exact GED computation
is based on the A* search algorithm [9], which casts it
as a path-finding problem and focuses on how to expand
the existing search paths. In detail, the algorithm explores
the space of all possible mappings between two graphs by
means of an ordered tree. Such a search tree is constructed
dynamically by iteratively creating successor nodes linked by
edges to the currently considered node in the search tree.
The further expansion of the search path is determined by
a cost function f(·), which can be divided into two parts
(i.e., f(·) = g(·) + h(·), where g(·) is the observable cost
and h(·) is the estimated cost). Specifically, in each iteration,
the search path of minimum cost is selected from the heap
of all currently possible paths. In order to guarantee the final
result of A* search algorithm to be optimal, the estimated cost
h(·) should be lower than, or equal to, the real cost. Based on
this, early studies focus on the heuristic functions which can
better estimate h(·) [10]–[13], which is a major task in A*
search algorithm.
However, since the search space grows exponentially along
with more nodes, the exact GED cannot be reliably computed
within reasonable time between graphs with more than 16
nodes [14]. To avoid colossal computation costs and satisfy
high real-time requirements in many applications, two main
categories are proposed in early works. We briefly introduce
them and their defects. First, modifications are based on A*
search algorithm, such as A*-Beamsearch [15]. Specifically, it
limits the size of the heap of A* search algorithm to obtain
approximate GEDs in a short time. However, lower bounds
based on heuristic functions are not close enough to the
ground-truth value of h(·), and parameters of modifications
are almost fixed and based on experience. Specifically, the
above two problems would bring extra search costs (e.g., the
beam size is set high) or miss the optimal results (e.g., the
beam size is set low).
Second, end-to-end learning-based methods are applied for
graph similarity search [16], [17]. Specifically, they design
a network-based function that maps the graph pairs into
similarity scores, which turns the GED computation into
a learning problem. However, this kind of methods might
achieve incorrect approximate GED (i.e., the obtained GED
is smaller than the exact GED), and therefore could not
find an actual edit path from the source graph to the target
graph, which is crucial in specific problems such as Graph
Alignment [18], Semantic Role Labeling [19], etc. Meanwhile,
such methods focus on the evaluation of graph query task (i.e.,
for each graph in the testing set, we treat it as a query graph,
and let the model compute the similarity between the query
评论