
log are recursive, namely containing Kleene stars. For these re-
cursive queries,
Δ
trees built in S-PATH have no depth constraint
and can be very large. For example, according to our experiments
with StackOverow, a real-world social network dataset, when an-
swering RPQ
𝑎
∗
𝑏
∗
in a sliding window with 180
𝑘
edges and 59
𝑘
vertices, there are 3
.
6
𝑘 Δ
trees with size above 21
𝑘
in S-PATH. The
total number of tree nodes is 96
𝑀
, and the memory cost of S-PATH
is more than 600 times larger than the original streaming graph.
In real-world applications, we usually need to monitor multiple
persistent queries at the same time, and the total memory cost will
be multiple times higher. Such high memory consumption limits
the scalability of the algorithm, especially when the application
runs in embedded devices like hubs and routers.
In this paper, we propose landmark-based streaming RPQ (LM-
SRPQ) to decrease the size of the
Δ
tree forests while keeping a high
update speed. We notice that there are common subtrees with the
same root in dierent
Δ
trees, resulting in redundant storage. The
subtree with root
𝑣
𝑖
in a
Δ
tree
𝑇
stores paths from
𝑣
𝑖
to a subset of
its successors, which are fragments of paths from the root of
𝑇
to
these successors. Common subtrees in dierent
Δ
trees are induced
by common path fragments. Therefore, we can decrease memory
usage by merging these common path fragments, which leads to
the merging of common subtrees. We propose to select a group of
nodes as landmarks. Each path in the product graph can be split into
a sequence of local paths, which are path fragments containing
no landmarks. We build a
Δ
tree for each landmark
𝑣
𝑖
to maintain
local paths starting from
𝑣
𝑖
. Such a
Δ
tree is called a landmark tree,
or LM tree for short. We also build
Δ
trees for the original tree roots
in S-PATH, and they maintain local paths starting from these tree
roots. Then we compute paths in the product graph and answer
persistent RPQ by continuously computing concatenations of local
paths. Each local path from a landmark
𝑣
𝑖
to its successor
𝑣
𝑗
may
participate in the computation of multiple product graph paths,
corresponding to common fragments in these paths. In prior art,
this local path will be stored multiple times in the common subtrees
of
𝑣
𝑖
. But in our method, we only need to store it once in the LM
tree of 𝑣
𝑖
. Therefore, redundant storage is eliminated.
However, there are two major challenges in LM-SRPQ:
First, computing local path concatenations raises a performance
issue. In LM-SRPQ, we build a dependency graph
𝐺
𝑑
to aid us in
the local path concatenation. Each node in
𝐺
𝑑
represents a
Δ
tree
and each edge represents the local path connecting the roots of
two
Δ
trees. Concatenated local path sequences can be represented
as paths in
𝐺
𝑑
. When a new tuple arrives in the streaming graph,
we rst update the Δ trees, producing new local paths and adding
new edges in
𝐺
𝑑
. Then we need to traverse in
𝐺
𝑑
to nd new
paths containing the new edge, which represents new local path
sequences. The cost of such traversal is high and may slow down
the entire algorithm. We need additional acceleration techniques.
Second, how to select a proper landmark set raises another chal-
lenge. Dierent landmark sets result in dierent
Δ
tree forest sizes.
Moreover, as discussed above, the dependency graph traversal is the
bottleneck of the algorithm, and the landmark number inuences
the number of LM trees, as well as the dependency graph size. We
need to bound the landmark number to control the dependency
graph traversal cost. Therefore, we hope to select a landmark set
with a bounded size and try to minimize the
Δ
tree forest size. The
Table 1: Notation Table
Notation Meaning
𝑊 Sliding window
𝛽 Sliding interval
𝐺
𝜏
Snapshot graph at time 𝜏
𝑒.𝑡𝑠 / 𝑝.𝑡𝑠 Timestamp of edge 𝑒 / path 𝑝
𝜙 (𝑒 ) / 𝜙 (𝑝 ) Label of edge 𝑒 / path 𝑝
𝐴
𝑅
DFA built for regular expression 𝑅
𝐴
𝑅
.𝐹 Final state set in 𝐴
𝑅
𝑠
0
/ 𝑠
𝑓
Initial state / nal state in DFA
𝑃 (𝐺, 𝐴) Product graph of graph 𝐺 and DFA 𝐴
⟨𝑣
𝑖
, 𝑠
𝑖
⟩ Product graph node with vertex 𝑣
𝑖
and state 𝑠
𝑖
.
𝑇
𝑣
𝑖
,𝑠
𝑖
Δ tree with root node ⟨𝑣
𝑖
, 𝑠
𝑖
⟩.
𝑇
𝑣
𝑖
,𝑠
𝑖
.⟨𝑣
𝑗
, 𝑠
𝑗
⟩.𝑡𝑠 Timestamp of node ⟨𝑣
𝑗
, 𝑠
𝑗
⟩ in tree 𝑇
𝑣
𝑖
,𝑠
𝑖
search space of such landmark selection is exponential, making it
computationally impossible to nd an optimal solution. We have
to resort to a greedy algorithm. Besides, we need to continuously
update the landmark set to keep up with the streaming graph.
In LM-SRPQ, we use a greedy algorithm in landmark selection,
which tries to minimize the size of the
Δ
tree forest while bounding
the number of
Δ
trees. Besides, we also continuously maintain an
additional index called TI-map in each LM tree, which records the
timestamps of paths starting from the tree root. Based on these
timestamps we propose a set of rules for pruning in dependency
graph traversal. We carry out extensive experiments in two real-
world datasets and one synthetic dataset to evaluate the perfor-
mance of our algorithm. The result shows that LM-SRPQ reduces
the memory usage of prior art S-PATH by at most 30 times. More-
over, as common subtree merging reduces the
Δ
tree update cost
and ecient pruning reduces the dependency graph traversal cost,
LM-SRPQ is at most 4.5 times faster than S-PATH.
In summary, we made the following contributions in this paper:
(1)
We propose a novel algorithm for persistent RPQ in stream-
ing graphs, named LM-SRPQ. It decreases the memory cost
as well as the update cost of the prior art by eliminating
redundant storage and computation.
(2)
To balance the time and memory cost of our algorithm, we
propose a greedy landmark selection algorithm to balance
the number of
Δ
trees and the size of the
Δ
tree forest, as
well as an additional index that helps to prune the recursive
traversal in the dependency graph.
(3)
We carry out extensive experiments to evaluate our algo-
rithm, which conrms its superiority against prior art.
2 PRELIMINARIES
We formally dene our problem in Section 2.1 and list frequently-
used notations in Table 1. To better understand the motivation of
our method, we briey introduce S-PATH [
26
] in Section 2.2, which
is the prior art for RPQ over streaming graphs in the literature.
2.1 Problem Denition
Definition 2.1. Graph. A directed, edge-labeled graph is dened
as
𝐺 = (𝑉, 𝐸, Σ, 𝜙)
, where
𝑉
is a vertex set,
𝐸 ⊂ 𝑉 × 𝑉
is an edge set,
Σ is a set of labels, and 𝜙 : 𝐸 → Σ is an edge labeling function.
1048
文档被以下合辑收录
评论