SIGMOD 2025_Revisiting the Design of In-Memory Dynamic Graph Storage_华为.pdf

迹部景吾

125

27页

0次

2025-06-30

免费下载

Revisiting the Design of In-Memory Dynamic Graph Storage

JIXIAN SU, Shanghai Jiao Tong University, China

CHIYU HAO, Shanghai Jiao Tong University, China

SHIXUAN SUN, Shanghai Jiao Tong University, China

HAO ZHANG, Huawei Cloud, China

SEN GAO, National University of Singapore, Singapore

JIAXIN JIANG, National University of Singapore, Singapore

YAO CHEN, National University of Singapore, Singapore

CHENYI ZHANG, Huawei Cloud, China

BINGSHENG HE, National University of Singapore, Singapore

MINYI GUO, Shanghai Jiao Tong University, China

The eectiveness of in-memory dynamic graph storage (DGS) for supporting concurrent graph read and

write queries is crucial for real-time graph analytics and updates. Various methods have been proposed, for

example, LLAMA, Aspen, LiveGraph, Teseo, and Sortledton. These approaches dier signicantly in their

support for read and write operations, space overhead, and concurrency control. However, there has been no

systematic study to explore the trade-os among these dimensions. In this paper, we evaluate the eectiveness

of individual techniques and identify the performance factors aecting these storage methods by proposing a

common abstraction for DGS design and implementing a generic test framework based on this abstraction.

Our ndings highlight several key insights: 1) Existing DGS methods exhibit substantial space overhead.

For example, Aspen consumes 3.3-10.8x more memory than CSR, while the optimal ne-grained methods

consume 4.1-8.9x more memory than CSR, indicating a signicant memory overhead. 2) Existing methods

often overlook memory access impact of modern architectures, leading to performance degradation compared

to continuous storage methods. 3) Fine-grained concurrency control methods, in particular, suer from severe

eciency and space issues due to maintaining versions and performing checks for each neighbor. These

methods also experience signicant contention on high-degree vertices. Our systematic study reveals these

performance bottlenecks and outlines future directions to improve DGS for real-time graph analytics.

CCS Concepts: • Information systems

→

Graph-based database models; Data structures; Storage man-

agement.

Additional Key Words and Phrases: dynamic graph storage; graph concurrency control; graph neighbor index;

benchmark framework.

ACM Reference Format:

Jixian Su, Chiyu Hao, Shixuan Sun, Hao Zhang, Sen Gao, Jiaxin Jiang, Yao Chen, Chenyi Zhang, Bingsheng

He, and Minyi Guo. 2025. Revisiting the Design of In-Memory Dynamic Graph Storage. Proc. ACM Manag.

Data 3, 1 (SIGMOD), Article 70 (February 2025), 27 pages. https://doi.org/10.1145/3709720

Authors’ Contact Information: Jixian Su, Shanghai Jiao Tong University, Shanghai, China, sjx13623816973@sjtu.edu.cn;

Chiyu Hao, Shanghai Jiao Tong University, Shanghai, China, hcahoi11@sjtu.edu.cn; Shixuan Sun, Shanghai Jiao Tong

University, Shanghai, China, sunshixuan@sjtu.edu.cn; Hao Zhang, Huawei Cloud, Beijing, China, zhanghao687@huawei.

com; Sen Gao, National University of Singapore, Singapore, sen@u.nus.edu; Jiaxin Jiang, National University of Singapore,

Singapore, jxjiang@nus.edu.sg; Yao Chen, National University of Singapore, Singapore, yaochen@nus.edu.sg; Chenyi

Zhang, Huawei Cloud, Hangzhou, China, zhangchenyi@huawei.com; Bingsheng He, National University of Singapore,

Singapore, hebs@comp.nus.edu.sg; Minyi Guo, Shanghai Jiao Tong University, Shanghai, China, guo-my@cs.sjtu.edu.cn.

This work is licensed under a Creative Commons Attribution International 4.0 License.

ACM 2836-6573/2025/2-ART70

https://doi.org/10.1145/3709720

Proc. ACM Manag. Data, Vol. 3, No. 1 (SIGMOD), Article 70. Publication date: February 2025.

70:2 Jixian Su et al.

STINGER

[HPEC’2012]

LLAMA

[ICDE’2015]

Aspen

[PLDI’2019]

GraphOne

[FAST’2019]

Terrace

[SIGMOD’2021]

Teseo

[VLDB’2021]

LiveGraph

[VLDB’2020]

Sortledton

[VLDB’2022]

Fig. 1. Comparison of DGS methods from previous experiments. An edge from

𝑥

𝑦

indicates that

𝑥

’s

experiments include 𝑦. Shaded methods are transactional approaches.

1 Introduction

Graph storage is the foundation for ecient in-memory graph data processing. As graph data

often require frequent updates [

], in-memory dynamic graph storage (DGS) is essential

for supporting concurrent graph read and write queries, enabling real-time graph analytics and

updates. To address this need, a variety of DGS methods have been proposed [

], as shown in Figure 1. STINGER [

], GraphOne [

], and Terrace [

] do not provide

concurrency control, meaning read and write queries can only execute alternately. In contrast, the

other approaches support concurrent read and write queries with serializability [

], i.e., the results

of executing concurrent queries are equivalent to some serial execution order. LLAMA [

] and

Aspen [

] primarily focus on batch updates with a single-writer model, while recent methods have

shifted toward optimizing individual updates to support a broader range of applications, which

has attracted signicant research interest [

]. Despite their varied update strategies, these

approaches all enable concurrent read and write queries, thus called transactional approaches in

this paper.

These approaches dier signicantly in their support for read and write operations, space

overhead, and concurrency control. First, they employ dierent graph container designs to support

ecient read and write operations. For instance, LLAMA [

] and LiveGraph [

] store a vertex’s

neighbor set

𝑁 (𝑢)

in a dynamic array, enabling fast scan operations with continuous storage and

simple append inserts, but at the cost of expensive search operations. Recent approaches like

Aspen [

], Teseo [

], and Sortledton [

] use a segmented strategy, dividing

𝑁 (𝑢)

into small

blocks (e.g.,

|𝐵| =

256), storing each block as a sorted array, and linking them with a block index

(e.g., a skip list). This design balances insert, scan, and search eciency. Moreover, these approaches

propose additional optimizations, such as adaptive indexing to use dierent data structures for

varying neighbor set sizes to further accelerate graph operations.

Second, although existing methods all use multi-version concurrency control (MVCC) to maxi-

mize parallelism among queries [

], they dier in version granularity. Coarse-grained methods

like LLAMA and Aspen use the typical single-writer-multiple-reader scheme with copy-on-write

(CoW) [

]. For each update, they create a new graph snapshot, while read queries work on the

snapshot at their start. Recent works such as LiveGraph, Teseo, and Sortledton use a ne-grained

strategy, maintaining version information for each neighbor and synchronizing read and write

Proc. ACM Manag. Data, Vol. 3, No. 1 (SIGMOD), Article 70. Publication date: February 2025.

of 27

免费下载

gaussdb paper 华为 sigmod2025

文档被以下合辑收录

数据库顶会 SIGMOD 2025 论文下载（共12篇）

数据库顶会 SIGMOD 2025 论文下载

关注

文档被以下合辑收录

评论