ICDE 2021
Noah: Neural-optimized A* Search Algorithm for
Graph Edit Distance Computation
Lei Yang, Lei Zou
Peking University, China;
National Engineering Laboratory for Big Data Analysis Technology and Application (PKU), China;
{yang_lei,zoulei}@pku.edu.cn
Abstract—Graph Edit Distance (GED) is a classical graph similarity metric that can be tailored to a wide range of applications. However, exact GED computation is NP-complete, which means it is only feasible for small graphs. Therefore, approximate GED computation methods are used in most real-world applications. However, both traditional practices and end-to-end learning-based methods have shortcomings when applied to approximate GED computation. The former rely on experience and often perform poorly. The latter can only compute similarity scores between graphs, without producing an actual edit path, which is crucial in specific problems (e.g., Graph Alignment, Semantic Role Labeling). This paper proposes a novel approach, Noah, which combines the A* search algorithm and graph neural networks to compute approximate GED in a more effective and intelligent way. The combination is mainly reflected in two aspects. First, we learn the estimated cost function h(·) with Graph Path Networks; pre-training GEDs and their corresponding edit paths are also incorporated into model training, thereby helping optimize the search direction of the A* search algorithm. Second, we learn an elastic beam size that helps reduce the search space and satisfy various user settings. Experimental results demonstrate the practical effectiveness of our approach on several tasks and suggest that it significantly outperforms state-of-the-art methods.
Index Terms—graph edit distance, A* search algorithm, neural
networks
I. INTRODUCTION
Graphs are ubiquitous and have attracted increasing research interest, since data in a wide range of applications can be represented as graphs, such as chemical compounds [1], social networks [2], road networks [3], and the semantic web [4]. One of the fundamental problems in such
graph-represented applications is graph similarity search (i.e.,
given a query graph q, finding a set of similar graphs g in a
graph database D, such that q is approximately matched with g
under some similarity metric). Two classical graph similarity
metrics are Graph Edit Distance (GED) [5] and Maximum
Common Subgraph (MCS) [6]. Note that the two metrics are
inter-related [7], and they are both NP-complete [8]. In this
paper, we focus on GED computation.
A. Existing solutions and motivations
The most widely used method for exact GED computation
is based on the A* search algorithm [9], which casts GED computation as a path-finding problem and focuses on how to expand
the existing search paths. In detail, the algorithm explores
the space of all possible mappings between two graphs by
means of an ordered tree. Such a search tree is constructed
dynamically by iteratively creating successor nodes linked by
edges to the currently considered node in the search tree.
The further expansion of the search path is determined by
a cost function f(·), which can be divided into two parts
(i.e., f(·) = g(·) + h(·), where g(·) is the observable cost
and h(·) is the estimated cost). Specifically, in each iteration,
the search path of minimum cost is selected from the heap
of all currently possible paths. To guarantee that the final result of the A* search algorithm is optimal, the estimated cost h(·) must be less than or equal to the real cost. Based on this, early studies focus on heuristic functions that better estimate h(·) [10]–[13], which is a major task in the A* search algorithm.
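To make the path-finding view concrete, the following is a minimal sketch of A* search for unit-cost GED over partial node mappings. The graph encoding (label dicts plus frozenset edges) and the default h = 0 (which degenerates to uniform-cost search) are our own illustrative assumptions, not the paper's implementation; a learned estimate such as Noah's would be passed in place of the default h.

```python
import heapq
import itertools

def astar_ged(nodes1, edges1, nodes2, edges2, h=lambda mapping: 0):
    """A* search for unit-cost GED over partial node mappings.

    nodes*: dict vertex -> label; edges*: set of frozenset({u, v}).
    h estimates the remaining cost; the default h = 0 degenerates to
    uniform-cost search and keeps the result exact but unguided.
    """
    order1 = list(nodes1)      # fixed order in which g1's vertices are mapped
    EPS = None                 # epsilon image: "this vertex is deleted"
    tick = itertools.count()   # tie-breaker so the heap never compares states

    def partial_cost(mapping):
        # g(.): cost already incurred by the first len(mapping) assignments
        m = dict(zip(order1, mapping))
        cost = 0
        for u, v in m.items():
            if v is EPS:
                cost += 1                              # vertex deletion
            elif nodes1[u] != nodes2[v]:
                cost += 1                              # label substitution
        for u1, u2 in itertools.combinations(m, 2):
            in1 = frozenset((u1, u2)) in edges1
            v1, v2 = m[u1], m[u2]
            in2 = (v1 is not EPS and v2 is not EPS
                   and frozenset((v1, v2)) in edges2)
            cost += (in1 != in2)                       # implied edge edit
        return cost

    def completion_cost(mapping):
        # insertions for g2 vertices/edges with no preimage in the mapping
        used = set(mapping) - {EPS}
        return (len(nodes2) - len(used)
                + sum(1 for e in edges2 if not e <= used))

    heap = [(0, next(tick), ())]
    while heap:
        f, _, mapping = heapq.heappop(heap)
        if len(mapping) == len(order1):
            return f               # first complete mapping popped is optimal
        used = set(mapping) - {EPS}
        for v in [w for w in nodes2 if w not in used] + [EPS]:
            child = mapping + (v,)
            g = partial_cost(child)
            if len(child) == len(order1):
                heapq.heappush(heap, (g + completion_cost(child),
                                      next(tick), child))
            else:
                heapq.heappush(heap, (g + h(child), next(tick), child))
```

Because the default h never overestimates the remaining cost, the first complete mapping popped from the heap is optimal; the price is that every child state must be kept in the heap, which is exactly the exponential blow-up discussed next.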
However, since the search space grows exponentially with the number of nodes, the exact GED cannot be reliably computed within reasonable time between graphs with more than 16
nodes [14]. To avoid colossal computation costs and satisfy the strict real-time requirements of many applications, two main categories of approximate methods have been proposed in earlier works. We briefly introduce them and their defects. The first category modifies the A* search algorithm itself, e.g., A*-Beamsearch [15], which limits the size of the heap used by the A* search algorithm to obtain approximate GEDs in a short time. However, lower bounds based on heuristic functions are not close enough to the ground-truth value of h(·), and the parameters of these modifications are usually fixed and chosen based on experience. These two problems bring either extra search costs (e.g., when the beam size is set too high) or missed optimal results (e.g., when the beam size is set too low).
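The heap-limiting step of A*-Beamsearch can be sketched as a one-shot truncation of the open list after each expansion; the (f, tiebreak, state) entry layout below is an assumption of this sketch, not the algorithm's original code.

```python
import heapq

def beam_prune(open_list, beam_size):
    """Truncate an A* open list in place, keeping only the beam_size
    entries with the smallest f = g + h, as A*-Beamsearch does after
    each expansion step."""
    if len(open_list) > beam_size:
        best = heapq.nsmallest(beam_size, open_list)
        open_list[:] = best
        heapq.heapify(open_list)
    return open_list
```

Once a partial mapping is pruned, it is never revisited, which is why a beam size set too low can discard the path that leads to the optimal result.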
Second, end-to-end learning-based methods are applied for
graph similarity search [16], [17]. Specifically, they design
a network-based function that maps the graph pairs into
similarity scores, which turns the GED computation into
a learning problem. However, this kind of method may produce an incorrect approximate GED (i.e., the obtained GED is smaller than the exact GED) and therefore cannot find an actual edit path from the source graph to the target graph, which is crucial in specific problems such as Graph Alignment [18] and Semantic Role Labeling [19]. Meanwhile, such methods focus on the evaluation of the graph query task (i.e., each graph in the testing set is treated as a query graph, and the model computes the similarity between the query graph and every graph in the database), which might not fit GED computation well, because in most cases neither graph of the pair has been seen before, rather than one of them residing in the database.
B. Our approach
We propose a novel approach in this paper to compute
approximate GED and address additional tasks (e.g., graph
similarity search, graph classification) without the influence
of the above disadvantages. Our approach, called Noah (short for Neural-optimized A* search algorithm), optimizes the A* search algorithm through graph neural networks in two
aspects. First, the estimated cost function h(·) learned by
graph neural networks helps A* search algorithm optimize
the search direction. Second, an elastic beam size can help
optimize search space and satisfy various user settings. Table I
summarizes the characteristics of existing solutions and our
approach.
TABLE I
SUMMARY OF EXISTING SOLUTIONS AND NOAH.

Method     | Type        | Node size | Acc    | Edit path
-----------|-------------|-----------|--------|----------
A*         | exact       | 16        | -      | able
A*-Beam    | approximate | tens      | medium | able
End-to-end | approximate | 100       | low    | disable
Noah       | approximate | hundreds  | high   | able
We further propose an extension to GNNs, which we call Graph Path Networks (GPN), for the above neural optimizations. Specifically, we incorporate pre-training GEDs and attention-based edit paths into model training. Such pre-training information not only enriches the training data but also significantly improves the performance of the A* search algorithm. On the other hand, we introduce graph alignment information (i.e., node substitutions) as cross-graph information into the training process, which effectively reduces the model size compared to earlier works. In addition to the above design, we also encode user settings (e.g., the permissible error and running time for GED computation) to learn an elastic beam size (i.e., a beam size tailored to each pair of input graphs) for further optimization of the A* search algorithm. Through this design, Noah optimizes the A* search algorithm in both search direction and search space, computing GED more effectively and intelligently. Note that Noah is always able to find an edit path from the source graph to the target graph, just as the A* search algorithm does.
Our contributions are summarized as follows:
• To the best of our knowledge, we are the first to use neural networks to reinforce the A* search algorithm in GED computation. Our approach, called Noah, optimizes the search direction and the search space by predicting the estimated cost function and the beam size, respectively.
• We further propose Graph Path Networks for these optimizations. Specifically, GPN incorporates attention-based pre-training edit path information for training the model, introduces graph alignment information as cross-graph information into the training process, and encodes user settings for learning an elastic beam size.
• We have evaluated our proposal against state-of-the-art baselines on three real-world datasets and a synthetic dataset. Experimental results suggest that our approach significantly outperforms other methods in GED computation metrics and graph similarity metrics across a range of tasks.
The remainder of this paper is organized as follows. Section 2 introduces the preliminaries of our work and reviews the A* search algorithm for GED computation. Section 3 presents the workflow of Noah, and Section 4 describes the detailed design of GPN. We evaluate our proposal in Section 5 and discuss related work in Section 6. Finally, we conclude in Section 7.
II. PRELIMINARIES AND BACKGROUND
Graph edit distance (GED) defines the dissimilarity of two graphs as the minimum number of primitive operations needed to transform one graph into the other. In this section, we first review the terminology used in this paper, and then review the A* search algorithm.
A. Graph Edit Distance
Definition 1. Graph. A graph is denoted by a 6-tuple g = (V, E, L_V, L_E, Σ_V, Σ_E), where V denotes a set of vertices, E ⊆ V × V a set of (un)directed edges, Σ_V and Σ_E are the label sets of V and E, respectively, and L_V and L_E are label functions that assign labels to vertices and edges, respectively.
There are six primitive edit operations on graphs: insert/delete a vertex with a label, substitute a vertex/edge label, and insert/delete an edge between two vertices. The edit path is defined as follows.
Definition 2. Edit path. Given two graphs g_1 and g_2, there exists a sequence of primitive edit operations that transforms g_1 to g_2, such as g_1 = g_1^0 → g_1^1 → ··· → g_1^k = g_2.
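Such a sequence of primitive operations can be replayed mechanically. The sketch below uses an illustrative tuple encoding of the six operations (our own convention for this example, not a standard format), with labeled vertices and labeled undirected edges.

```python
def apply_edit_path(nodes, edges, ops):
    """Replay a sequence of the six primitive edit operations.

    nodes: dict vertex -> label; edges: dict frozenset({u, v}) -> label.
    Returns the transformed graph without mutating the input.
    """
    nodes, edges = dict(nodes), dict(edges)
    for op, *args in ops:
        if op == "insert_vertex":
            v, label = args
            nodes[v] = label
        elif op == "delete_vertex":
            (v,) = args
            assert not any(v in e for e in edges), "delete incident edges first"
            del nodes[v]
        elif op == "substitute_vertex_label":
            v, label = args
            nodes[v] = label
        elif op == "insert_edge":
            u, v, label = args
            edges[frozenset((u, v))] = label
        elif op == "delete_edge":
            u, v = args
            del edges[frozenset((u, v))]
        elif op == "substitute_edge_label":
            u, v, label = args
            edges[frozenset((u, v))] = label
        else:
            raise ValueError("unknown operation: %s" % op)
    return nodes, edges
```

The length of any such operation sequence is an upper bound on the GED of the two graphs; the GED is the length of a shortest one.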
We may have different operation sequences that transform g_1 to g_2, and each operation sequence corresponds to a node substitution, which is useful information in Graph Alignment.
Definition 3. Node substitution. Given two graphs g_1 and g_2, with vertex sets V_1 = {u_1, ···, u_|V_1|} and V_2 = {v_1, ···, v_|V_2|}, a node substitution is denoted by p = {u_i → v_j, ···, u_{i0} → ε, ···, ε → v_{j0}, ···}, where u_i → v_j denotes the substitution of a vertex u_i by a vertex v_j, u_{i0} → ε denotes the deletion of u_{i0}, and ε → v_{j0} denotes the insertion of v_{j0}.
Note that a node substitution only consists of edit operations
on vertices, while edit operations on edges can be implied by
edit operations on their adjacent vertices.
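The cost induced by one node substitution, including the edge operations it implies, can be sketched as follows; unit costs and the dict/frozenset graph encoding are assumptions of this illustration.

```python
import itertools

def substitution_cost(p, nodes1, edges1, nodes2, edges2):
    """Unit cost of one node substitution p (vertex operations plus the
    edge operations implied by the mapped endpoints).

    p maps each g1 vertex to a g2 vertex or to None (epsilon).
    nodes*: dict vertex -> label; edges*: set of frozenset({u, v}).
    """
    EPS = None
    cost = 0
    for u, v in p.items():
        if v is EPS:
            cost += 1                       # delete u
        elif nodes1[u] != nodes2[v]:
            cost += 1                       # substitute u's label
    used = set(p.values()) - {EPS}
    cost += len(nodes2) - len(used)         # insert g2 vertices with no preimage
    for u1, u2 in itertools.combinations(p, 2):
        in1 = frozenset((u1, u2)) in edges1
        v1, v2 = p[u1], p[u2]
        in2 = (v1 is not EPS and v2 is not EPS
               and frozenset((v1, v2)) in edges2)
        cost += (in1 != in2)                # implied edge delete/insert
    cost += sum(1 for e in edges2 if not e <= used)  # edges to new vertices
    return cost
```

Minimizing this cost over all node substitutions yields the GED of the two graphs, which is what the search procedures above explore incrementally.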
Definition 4. Graph Edit Distance (GED). Given two graphs g_1 and g_2, their GED is defined as the minimum number of primitive operations needed to transform g_1 to g_2, denoted by GED(g_1, g_2). Note that there might be several edit paths that realize the GED.
We present an example of an edit path and its corresponding node substitution in Figure 1. It is also a minimum-cost edit path.