ICDE 2021


Noah: Neural-optimized A* Search Algorithm for Graph Edit Distance Computation

Lei Yang†, Lei Zou†,∗

†Peking University, China; ∗National Engineering Laboratory for Big Data Analysis Technology and Application (PKU), China

{yang_lei,zoulei}@pku.edu.cn

Abstract: Graph Edit Distance (GED) is a classical graph similarity metric that can be tailored to a wide range of applications. However, exact GED computation is NP-hard, so it is feasible only for small graphs. Approximate GED computation methods are therefore used in most real-world applications. However, both traditional practices and end-to-end learning-based methods have shortcomings when applied to approximate GED computation. The former rely on experience and usually perform poorly. The latter can only compute similarity scores between graphs without an actual edit path, which is crucial in specific problems (e.g., Graph Alignment, Semantic Role Labeling). This paper proposes a novel approach, Noah, which combines the A* search algorithm and graph neural networks to compute approximate GED more effectively and intelligently. The combination is reflected mainly in two aspects. First, we learn the estimated cost function h(·) with Graph Path Networks. Pre-training GEDs and the corresponding edit paths are also incorporated for training the model, which helps optimize the search direction of the A* search algorithm. Second, we learn an elastic beam size that helps reduce the search size and satisfy various user settings. Experimental results demonstrate the practical effectiveness of our approach on several tasks and suggest that it significantly outperforms state-of-the-art methods.

Index Terms: graph edit distance, A* search algorithm, neural networks

I. INTRODUCTION

Graphs are ubiquitous and have attracted increasing research interest because data in a wide range of applications can be represented as graphs, such as chemical compounds [1], social networks [2], road networks [3], and the semantic web [4]. One of the fundamental problems in such applications is graph similarity search (i.e., given a query graph q, finding a set of similar graphs g in a graph database D such that q approximately matches g under some similarity metric). Two classical graph similarity metrics are Graph Edit Distance (GED) [5] and Maximum Common Subgraph (MCS) [6]. The two metrics are inter-related [7], and both are NP-hard to compute [8]. In this paper, we focus on GED computation.

A. Existing solutions and motivations

The most widely used method for exact GED computation is based on the A* search algorithm [9], which casts the computation as a path-finding problem and focuses on how to expand the existing search paths. In detail, the algorithm explores the space of all possible mappings between two graphs by means of an ordered tree. The search tree is constructed dynamically by iteratively creating successor nodes linked by edges to the currently considered node. The further expansion of a search path is determined by a cost function f(·), which is the sum of two parts: f(·) = g(·) + h(·), where g(·) is the observable cost and h(·) is the estimated cost. Specifically, in each iteration, the search path of minimum cost is selected from the heap of all currently possible paths. To guarantee that the final result of the A* search algorithm is optimal, the estimated cost h(·) must be lower than or equal to the real cost. For this reason, early studies focus on heuristic functions that better estimate h(·) [10]–[13], which is a major task in the A* search algorithm.
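The search just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of [9]: graphs are undirected with labeled vertices and unlabeled edges, every edit costs 1, and the names `ged_astar`, `g_cost`, and `h_cost` are hypothetical. The estimated cost used here is a simple label-multiset lower bound, chosen because it never exceeds the real remaining cost.

```python
import heapq
from collections import Counter
from itertools import count

def ged_astar(labels1, edges1, labels2, edges2):
    """Exact GED by A* over partial vertex mappings (unit costs, unlabeled edges).

    labels*: dict vertex -> label; edges*: set of frozenset({u, v}).
    Returns (distance, mapping); mapping sends each g1 vertex to a g2 vertex
    or to None (deletion).
    """
    order = list(labels1)              # g1 vertices are mapped in this fixed order
    n1 = len(order)

    def g_cost(m):
        """Observable cost g: edits already fixed by the partial mapping m."""
        pos = {u: i for i, u in enumerate(order[:len(m)])}
        used = {v for v in m if v is not None}
        cost = sum(1 for i, v in enumerate(m)
                   if v is None or labels1[order[i]] != labels2[v])
        kept = 0                       # g1 edges that land on a g2 edge
        for a, b in edges1:
            if a in pos and b in pos:
                va, vb = m[pos[a]], m[pos[b]]
                if va is not None and vb is not None and frozenset((va, vb)) in edges2:
                    kept += 1
                else:
                    cost += 1          # edge deletion
        if len(m) == n1:               # complete: the rest of g2 must be inserted
            return cost + (len(labels2) - len(used)) + (len(edges2) - kept)
        inner = sum(1 for e in edges2 if e <= used)
        return cost + (inner - kept)   # edge insertions among mapped vertices

    def h_cost(m):
        """Estimated cost h: lower bound on the remaining vertex edits."""
        if len(m) == n1:
            return 0
        used = {v for v in m if v is not None}
        rest1 = Counter(labels1[u] for u in order[len(m):])
        rest2 = Counter(lab for v, lab in labels2.items() if v not in used)
        common = sum((rest1 & rest2).values())
        return max(sum(rest1.values()), sum(rest2.values())) - common

    tie = count()                      # tie-breaker so mappings are never compared
    heap = [(g_cost(()) + h_cost(()), next(tie), ())]
    while heap:
        f, _, m = heapq.heappop(heap)  # path of minimum f = g + h
        if len(m) == n1:
            return f, dict(zip(order, m))
        used = {v for v in m if v is not None}
        for v in [w for w in labels2 if w not in used] + [None]:
            child = m + (v,)
            heapq.heappush(heap, (g_cost(child) + h_cost(child), next(tie), child))

# Path A-B-C versus the single edge A-B: delete vertex C and its edge.
distance, mapping = ged_astar(
    {0: "A", 1: "B", 2: "C"}, {frozenset((0, 1)), frozenset((1, 2))},
    {0: "A", 1: "B"}, {frozenset((0, 1))},
)
```

Because `h_cost` never overestimates, the first complete mapping popped from the heap is optimal. The sketch recomputes `g_cost` from scratch for clarity; a practical implementation would update it incrementally.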

However, since the search space grows exponentially with the number of nodes, the exact GED cannot be reliably computed within reasonable time for graphs with more than 16 nodes [14]. To avoid colossal computation costs and satisfy the strict real-time requirements of many applications, early works propose two main categories of approximate methods. We briefly introduce them and their defects. The first category modifies the A* search algorithm, as in A*-Beamsearch [15], which limits the size of the heap of the A* search algorithm to obtain approximate GEDs in a short time. However, the lower bounds given by heuristic functions are not close enough to the ground-truth value of h(·), and the parameters of these modifications are mostly fixed and chosen by experience. These two problems either bring extra search costs (e.g., when the beam size is set too high) or miss the optimal results (e.g., when the beam size is set too low).
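The effect of the beam size can be illustrated with a small sketch. It is not the A*-Beamsearch of [15]; it is a generic best-first search whose open list is truncated after every expansion, applied to a toy assignment problem that stands in for vertex mapping. All names (`beam_astar`, `COST`, `solve`) and the cost matrix are hypothetical.

```python
import heapq
from itertools import count

def beam_astar(start, is_goal, expand, f, beam_size):
    """Best-first search whose open list is truncated to `beam_size` entries."""
    tie = count()                       # tie-breaker so states are never compared
    open_list = [(f(start), next(tie), start)]
    while open_list:
        _, _, state = heapq.heappop(open_list)
        if is_goal(state):
            return state
        for child in expand(state):
            heapq.heappush(open_list, (f(child), next(tie), child))
        if len(open_list) > beam_size:
            # nsmallest returns a sorted list, which is also a valid heap
            open_list = heapq.nsmallest(beam_size, open_list)
    return None

# Toy problem: row i must receive a distinct column j at price COST[i][j].
COST = [[1, 2],
        [1, 100]]
N = len(COST)

def g(state):
    """Cost of the assignments made so far."""
    return sum(COST[i][j] for i, j in enumerate(state))

def h(state):
    """Lower bound on the rest: cheapest entry of every unassigned row."""
    return sum(min(COST[i]) for i in range(len(state), N))

def expand(state):
    return [state + (j,) for j in range(N) if j not in state]

def solve(beam_size):
    best = beam_astar((), lambda s: len(s) == N, expand,
                      lambda s: g(s) + h(s), beam_size)
    return best, g(best)

narrow = solve(1)   # ((0, 1), 101): the optimal branch was pruned
wide = solve(2)     # ((1, 0), 3): the optimum survives in the beam
```

With a beam of 1 the search commits to the locally cheapest first choice and ends at cost 101, while a beam of 2 keeps the alternative alive and reaches the optimum of 3, at the price of a larger open list.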

Second, end-to-end learning-based methods are applied to graph similarity search [16], [17]. Specifically, they design a network-based function that maps graph pairs to similarity scores, which turns GED computation into a learning problem. However, methods of this kind may produce an incorrect approximate GED (i.e., the obtained GED is smaller than the exact GED) and therefore cannot find an actual edit path from the source graph to the target graph, which is crucial in specific problems such as Graph Alignment [18] and Semantic Role Labeling [19]. Meanwhile, such methods are evaluated on the graph query task (i.e., each graph in the testing set is treated as a query graph, and the model computes the similarity between the query graph and every graph in the database), which might not fit GED computation well, because in most cases neither graph in a pair has been seen before, rather than one of them being in the database.

B. Our approach

We propose a novel approach in this paper to compute approximate GED and address additional tasks (e.g., graph similarity search, graph classification) without the above disadvantages. Our approach, called Noah (short for Neural-optimized A* search algorithm), optimizes the A* search algorithm through graph neural networks in two aspects. First, the estimated cost function h(·) learned by graph neural networks helps the A* search algorithm optimize the search direction. Second, an elastic beam size helps optimize the search space and satisfy various user settings. Table I summarizes the characteristics of existing solutions and our approach.

TABLE I
SUMMARY OF EXISTING SOLUTIONS AND NOAH.

Method      Type         Node size  Accuracy  Edit path
A*          exact        ≤16        -         yes
A*-Beam     approximate  tens       medium    yes
End-to-end  approximate  ≈100       low       no
Noah        approximate  hundreds   high      yes

We further propose an extension to GNNs, which we call Graph Path Networks (GPN), for the above neural optimizations. Specifically, we incorporate pre-training GEDs and attention-based edit paths for training the model. Such pre-training information not only enriches the training data but also significantly improves the performance of the A* search algorithm. In addition, we introduce graph alignment information (i.e., node substitution) as cross-graph information into the training process, which effectively reduces the model size compared to early works. We also encode user settings (e.g., permissible error and running time for GED computation) to learn an elastic beam size (i.e., a beam size for each pair of input graphs) for further optimization of the A* search algorithm. Through this design, Noah optimizes the A* search algorithm in both search direction and search space to compute GED more effectively and intelligently. Note that Noah always finds an edit path from the source graph to the target graph, as the A* search algorithm does.

Our contributions are summarized as follows:

• To the best of our knowledge, we are the first to use neural networks to reinforce the A* search algorithm in GED computation. Our approach, called Noah, optimizes the search direction and the search space by predicting the estimated cost function and the beam size, respectively.

• We further propose Graph Path Networks for the optimizations. Specifically, GPN incorporates attention-based pre-training edit path information for training the model, introduces graph alignment information as cross-graph information into the training process, and encodes user settings for learning an elastic beam size.

• We have evaluated our proposal by comparing it with state-of-the-art baselines on three real-world datasets and a synthetic dataset. Experimental results suggest that our approach significantly outperforms other methods in GED computation metrics and graph similarity metrics across a range of tasks.

The remainder of this paper is organized as follows. Section II introduces the preliminaries of our work and reviews the A* search algorithm for GED computation. Section III presents the workflow of Noah, and Section IV describes the detailed design of GPN. We evaluate our proposal in Section V and discuss related work in Section VI. Finally, we conclude in Section VII.

II. PRELIMINARIES AND BACKGROUND

Graph edit distance (GED) defines the dissimilarity of two graphs as the minimum number of primitive operations needed to transform one graph into the other. In this section, we first review the terminology used in this paper and then review the A* search algorithm.

A. Graph Edit Distance

Definition 1. Graph. A graph is denoted by a 6-tuple $g = (V, E, L_V, L_E, \Sigma_V, \Sigma_E)$, where $V$ denotes a set of vertices, $E \subseteq V \times V$ a set of (un)directed edges, $\Sigma_V$ and $\Sigma_E$ are the label sets of $V$ and $E$, respectively, and $L_V$ and $L_E$ are label functions that assign labels to vertices and edges, respectively.

There are six primitive edit operations on graphs: insert/delete a vertex with a label, substitute a vertex/edge label, and insert/delete an edge between two vertices. The edit path is defined as follows.

Definition 2. Edit path. Given two graphs $g_1$ and $g_2$, there exists a sequence of primitive edit operations that transforms $g_1$ into $g_2$, such as $g_1 = g_1^{0} \to g_1^{1} \to \cdots \to g_1^{k} = g_2$.

Different operation sequences may transform $g_1$ into $g_2$, and each operation sequence corresponds to a node substitution, which is useful information in Graph Alignment.
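To make the six primitive operations and the notion of an edit path concrete, the following minimal Python sketch applies a sequence of operations to a graph. The representation (a dict of vertex labels plus a dict of edge labels keyed by `frozenset` pairs), the operation names, and the rule that a vertex can be deleted only after its incident edges are assumptions made for illustration and are not taken from the paper.

```python
def apply_edit_path(vertices, edges, path):
    """Apply a sequence of primitive edit operations and return the new graph.

    vertices: dict vertex -> label; edges: dict frozenset({u, v}) -> label.
    """
    vertices, edges = dict(vertices), dict(edges)   # work on copies
    for op, *args in path:
        if op == "insert_vertex":
            v, label = args
            vertices[v] = label
        elif op == "delete_vertex":
            (v,) = args
            if any(v in e for e in edges):
                raise ValueError(f"delete the edges incident to {v!r} first")
            del vertices[v]
        elif op == "relabel_vertex":
            v, label = args
            vertices[v] = label
        elif op == "insert_edge":
            u, v, label = args
            edges[frozenset((u, v))] = label
        elif op == "delete_edge":
            u, v = args
            del edges[frozenset((u, v))]
        elif op == "relabel_edge":
            u, v, label = args
            edges[frozenset((u, v))] = label
        else:
            raise ValueError(f"unknown operation: {op}")
    return vertices, edges

# Source graph: path 1-2-3 with labels A, B, C. Target graph: edge 1-2 with labels A, D.
g1 = ({1: "A", 2: "B", 3: "C"},
      {frozenset((1, 2)): "x", frozenset((2, 3)): "x"})
g2 = ({1: "A", 2: "D"},
      {frozenset((1, 2)): "x"})

# One edit path of length 3 from g1 to g2.
path = [("delete_edge", 2, 3), ("delete_vertex", 3), ("relabel_vertex", 2, "D")]
result = apply_edit_path(*g1, path)
```

Here `result` equals `g2`, and the corresponding node substitution maps vertex 1 to 1 and vertex 2 to 2, and deletes vertex 3.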

Definition 3. Node substitution. Given two graphs $g_1$ and $g_2$ with vertex sets $V_1 = \{u_1, \cdots, u_{|V_1|}\}$ and $V_2 = \{v_1, \cdots, v_{|V_2|}\}$, a node substitution is denoted by $p = \{u_i \to v_j, \cdots, u_{i'} \to \varepsilon, \cdots, \varepsilon \to v_{j'}, \cdots\}$, where $u_i \to v_j$ denotes the substitution of a vertex $u_i$ by a vertex $v_j$, $u_{i'} \to \varepsilon$ denotes the deletion of $u_{i'}$, and $\varepsilon \to v_{j'}$ denotes the insertion of $v_{j'}$.

Note that a node substitution consists only of edit operations on vertices, while the edit operations on edges are implied by the edit operations on their endpoint vertices.

Definition 4. Graph Edit Distance (GED). Given two graphs $g_1$ and $g_2$, their GED is defined as the minimum number of primitive operations needed to transform $g_1$ into $g_2$, denoted by $GED(g_1, g_2)$. Note that several edit paths may attain the GED.
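Definitions 3 and 4 can be checked on tiny graphs with a brute-force sketch: enumerate every node substitution, count the vertex operations it contains and the edge operations it implies, and keep the minimum. The sketch assumes unit costs, undirected graphs, and unlabeled edges; `induced_cost` and `ged_brute_force` are hypothetical names, and the enumeration is exponential, so it is suitable only for illustration.

```python
from itertools import permutations

def induced_cost(labels1, edges1, labels2, edges2, p):
    """Number of primitive operations implied by node substitution p.

    p maps every vertex of g1 to a vertex of g2 or to None (deletion);
    vertices of g2 that are not an image are insertions.
    """
    # vertex deletions and label substitutions
    cost = sum(1 for u, v in p.items() if v is None or labels1[u] != labels2[v])
    images = {v for v in p.values() if v is not None}
    cost += len(labels2) - len(images)          # vertex insertions
    kept = 0
    for a, b in edges1:
        if p[a] is not None and p[b] is not None and frozenset((p[a], p[b])) in edges2:
            kept += 1                           # edge preserved by the mapping
        else:
            cost += 1                           # implied edge deletion
    return cost + (len(edges2) - kept)          # implied edge insertions

def ged_brute_force(labels1, edges1, labels2, edges2):
    """Return (GED, list of all node substitutions that attain it)."""
    us = list(labels1)
    targets = list(labels2) + [None] * len(us)  # None stands for deletion
    best, optimal = None, []
    for images in set(permutations(targets, len(us))):
        p = dict(zip(us, images))
        c = induced_cost(labels1, edges1, labels2, edges2, p)
        if best is None or c < best:
            best, optimal = c, [p]
        elif c == best:
            optimal.append(p)
    return best, optimal

# Two A-labeled vertices joined by an edge versus a single A-labeled vertex.
best, optimal = ged_brute_force(
    {"u1": "A", "u2": "A"}, {frozenset(("u1", "u2"))},
    {"v1": "A"}, set(),
)
```

In this example the distance is 2 (delete one vertex and the edge), and two different node substitutions attain it, one keeping `u1` and one keeping `u2`.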

We give an example of an edit path and its corresponding node substitution in Figure 1. It is also the minimum cost edit
