
[Figure: two bar charts, "PageRank (sec)" and "Connected Component (sec)", showing per-iteration gather and scatter times on four machines M1–M4, annotated with each machine's vertex (|V|) and edge (|E|) counts.]
Figure 1: Elapsed time of gather and scatter operations on PowerGraph using the NE partitioner.
excessively large partitions; (II) the sizes of these partitions are highly skewed, indicating the coexistence of partitions with a significantly large number of vertices and partitions with very few vertices. Focusing on only a single balance while minimizing the replication factor can lead to two troublesome problems, as follows.
(1) Terrible Workload Imbalance. In Bulk Synchronous Parallel (BSP) based systems, a global synchronization barrier is set between two consecutive iterations. Thus, the machine with the largest workload becomes the performance bottleneck. Figure 1 illustrates the elapsed time of gather and scatter operations on PowerGraph [11] for the first two iterations of the PageRank and Connected Component tasks on four machines (M1, M2, M3, and M4), along with the numbers of involved vertices and edges. While the number of involved edges is roughly consistent across the four machines in the first iteration, there is a notable discrepancy in elapsed time. Discrepancies in the number of vertices can lead to differences in cache hit rates, as well as disparities in the numbers of involved edges in the subsequent iteration. For example, in the second iteration of PageRank, the discrepancies in the involved edges become significant in the scatter phase, resulting in variations in machine elapsed time.
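The straggler effect under BSP can be made concrete with a small sketch (the machine names and timings below are illustrative, not the measurements from Figure 1):

```python
# Hypothetical per-machine gather/scatter times (seconds) for one
# iteration; values are illustrative, not Figure 1's measurements.
gather = {"M1": 1.2, "M2": 1.1, "M3": 1.3, "M4": 2.6}
scatter = {"M1": 0.8, "M2": 0.9, "M3": 0.8, "M4": 1.5}

# Under BSP, a global barrier follows each phase, so iteration time
# is determined by the slowest machine in each phase.
iteration_time = max(gather.values()) + max(scatter.values())

# Time the other machines spend idle waiting at the barriers.
idle = sum(max(gather.values()) - t for t in gather.values()) \
     + sum(max(scatter.values()) - t for t in scatter.values())

print(round(iteration_time, 1), round(idle, 1))
```

Even though the aggregate work is fixed, shifting load off the slowest machine directly shortens the iteration; balance, not just total cost, governs BSP performance.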
(2) Excessive Memory Consumption. In distributed graph computing frameworks deployed over homogeneous and memory-constrained clusters, each machine needs to maintain necessary information, such as neighbor sets and PageRank values, for its vertices. Excessively large partitions due to vertex imbalance significantly increase memory consumption. In PowerGraph, the partition with the largest number of vertices becomes the memory bottleneck. As depicted in Table 1, for the graph hollywood, the largest partition delivered by METIS reaches 8 times the average size. The machine hosting such a large partition is highly susceptible to memory bottlenecks, which can result in downstream task failures.
Therefore, it is crucial to minimize the replication factor while guaranteeing both vertex and edge balance. However, despite its importance, dual-balanced graph partitioning has not yet received sufficient attention and research.
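For concreteness, the quantities involved — the replication factor and the two balance factors — can be computed for an edge partitioning as follows (a hedged sketch in our own notation, not the paper's code):

```python
from collections import defaultdict

def partition_metrics(edge_assignment, num_parts):
    """Given an edge partitioning {(u, v): partition_id}, return the
    replication factor and the vertex/edge balance factors.
    Illustrative sketch, not the paper's implementation."""
    part_edges = defaultdict(int)   # partition -> number of edges
    part_verts = defaultdict(set)   # partition -> hosted vertices
    replicas = defaultdict(set)     # vertex -> partitions hosting it
    for (u, v), p in edge_assignment.items():
        part_edges[p] += 1
        for x in (u, v):
            part_verts[p].add(x)
            replicas[x].add(p)
    n, m = len(replicas), len(edge_assignment)
    rf = sum(len(ps) for ps in replicas.values()) / n  # avg replicas/vertex
    vb = max(len(vs) for vs in part_verts.values()) / (n / num_parts)
    eb = max(part_edges.values()) / (m / num_parts)
    return rf, vb, eb

# A 4-cycle split into two partitions: edge balance is perfect (1.0),
# but replication inflates the vertex load of each partition.
print(partition_metrics({(0, 1): 0, (1, 2): 0, (2, 3): 1, (3, 0): 1}, 2))
```

Note how vertex imbalance can arise even under perfect edge balance, which is exactly the gap a dual-balanced partitioner must close.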
1.2 Challenges and Contributions
Challenge 1: Intractable Complexity. A previous study [39] has established that the graph partitioning problem of minimizing the replication factor such that the edge imbalance is bounded by a constant is NP-hard. This inherent complexity, coupled with large graphs containing billions of vertices and edges, already poses significant challenges in terms of time and memory. The dual-balanced graph partitioning problem generalizes single-balanced graph partitioning by introducing an additional constraint, making it even more difficult to find an optimal solution that satisfies both balances simultaneously. Therefore, the partitioning algorithm itself is expected to be lightweight and efficient. These requirements force us to avoid overly complex algorithms and redundant data structures.
Challenge 2: Skewed Degrees. Most real-world graphs follow a power-law distribution, exhibiting a skewed degree distribution in which the majority of vertices have low degrees, while only a small subset of vertices, known as hub vertices, are highly connected. Hub vertices play a crucial role in influencing the density of partitions, posing a significant challenge to achieving dual balance.
Existing dual-balanced partitioners [1, 24, 40] have made efforts to address these challenges. EBV achieves dual balance by incorporating a scoring function that considers both vertex and edge loads. BPart modifies FENNEL [36] and requires simultaneous optimization of both vertex and edge balance. MDBGP [1] transforms the multi-dimensional balanced partitioning problem into a mathematical optimization problem and solves it using gradient descent, but it constrains the number of partitions to powers of two. Beyond that, the optimization process of MDBGP involves multiple rounds of O(n²) intersection-point calculations, which is not feasible for partitioning large graphs. To summarize, these partitioners (I) suffer from poor replication factors, resulting in a substantial communication cost that becomes a bottleneck again, and (II) do not support constraints on vertex-and-edge balance parameters or fail to achieve the desired constraints, leaving a notable imbalance in both the vertex and edge dimensions for many graphs.
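A greedy, dual-load-aware score in the spirit of such approaches can be sketched as follows (the weights and exact form are our illustrative assumptions, not EBV's published formula):

```python
def pick_partition(edge, v_load, e_load, placed, alpha=1.0, beta=1.0):
    """Choose a partition for an edge: reward partitions that already
    host the edge's endpoints (curbing replication) while penalizing
    relative vertex and edge load. Hedged sketch; alpha and beta are
    assumed trade-off weights, not a published formula."""
    u, v = edge
    k = len(e_load)
    avg_e = max(sum(e_load) / k, 1)
    avg_v = max(sum(v_load) / k, 1)

    def score(p):
        # How many endpoints partition p already hosts.
        locality = sum(1 for x in (u, v) if p in placed.get(x, set()))
        return -locality + alpha * e_load[p] / avg_e + beta * v_load[p] / avg_v

    return min(range(k), key=score)

# With both partitions equally loaded, locality breaks the tie:
# vertex 1 already lives on partition 0, so the edge goes there.
print(pick_partition((1, 2), [2, 2], [2, 2], {1: {0}}))
```

The tension is visible in the score itself: when loads diverge, the load terms override locality, which is how such heuristics can end up trading replication factor against balance.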
Contributions. To realize dual-balanced graph partitioning, in this paper, we propose the Fine-grained Splitting and Merging (FSM) framework. The underlying principle behind FSM is to deal with vertex balance and edge balance incrementally, ensuring them one by one. The framework consists of two phases, i.e., fine-grained splitting and subgraph merging.
Specifically, in the fine-grained splitting phase, FSM performs a primary exploration of the graph by decomposing it into small subgraphs. The aim is to group highly correlated vertices and edges together, producing a family of fundamental subgraphs. Moreover, in this phase, we can concentrate on just one balance at a time while minimizing the replication factor. As a result, the framework becomes highly versatile and powerful, making it possible to integrate various state-of-the-art single-balance techniques.
In the subgraph merging phase, we assemble the subgraphs produced above into larger partitions. This lets us consider the partitioning problem from a broader perspective, enabling us to neutralize the imbalances effectively and enhance the overall performance of the partitioning process. In particular, we formulate a subgraph allocation problem and develop two lightweight and effective merging algorithms, Fast Merging and Precise Merging.
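To illustrate the flavor of the merging phase, here is a hedged sketch of a greedy subgraph allocation that tracks both dimensions at once (our simplification for exposition, not the paper's actual Fast or Precise Merging algorithms):

```python
import heapq

def merge_subgraphs(subgraphs, k):
    """Assign subgraphs, given as (num_vertices, num_edges) pairs, to
    k partitions: largest subgraph first, always onto the partition
    with the smallest combined normalized load. Illustrative sketch,
    not the paper's Fast/Precise Merging."""
    total_v = sum(v for v, _ in subgraphs)
    total_e = sum(e for _, e in subgraphs)

    def load(v, e):
        # Normalize so vertices and edges contribute comparably.
        return v / total_v + e / total_e

    heap = [(0.0, p, 0, 0) for p in range(k)]  # (load, part, verts, edges)
    heapq.heapify(heap)
    assignment = [None] * len(subgraphs)
    # Largest-first placement (LPT heuristic) reduces final imbalance.
    order = sorted(range(len(subgraphs)),
                   key=lambda i: load(*subgraphs[i]), reverse=True)
    for i in order:
        _, p, pv, pe = heapq.heappop(heap)
        sv, se = subgraphs[i]
        assignment[i] = p
        heapq.heappush(heap, (load(pv + sv, pe + se), p, pv + sv, pe + se))
    return assignment

print(merge_subgraphs([(4, 10), (3, 8), (2, 5), (1, 2)], 2))
```

Because each subgraph is small relative to the final partitions, even a simple largest-first heuristic has room to even out both dimensions; the merging phase exploits exactly this late-stage freedom to trade the two balances against each other.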
Our contributions can be summarized as follows:
(1) We formulate the problem of dual-balanced graph partitioning, which minimizes the replication factor while guaranteeing both vertex balance and edge balance.