
is the dual support for Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP). Our usage statistics show
that 70% of tasks involve Hybrid Transaction/Analytical Processing
(HTAP) [
33
], 20% are dedicated to OLTP, and the remaining 10%
to OLAP. Among existing systems, G-Tran [
14
] is notably adept at
OLTP tasks and prioritizes transactional integrity, while Grasper
[
13
] excels in managing OLAP transactions. However, using sepa-
rate systems for distinct OLTP and OLAP tasks can double costs in
terms of development, deployment, and maintenance.
Faced with the unique challenges of processing graph queries
and transactions, we developed Galaxybase
1
, a new native dis-
tributed graph database. Galaxybase features two distinct storage
structures, optimized for read and write performance. The rst is a
Log-Structured Adjacency List, which employs adjacency lists for
sequential data scanning and batch writing to reduce read/write
amplications. The second structure, Edge Page, co-locates edges
for the same vertex and maintains local order within each page
by type and direction while ensuring global order across all edges.
This design supports ecient graph traversal in various directions
and types, as well as quick and accurate single edge queries.
As a distributed graph database deployed in production-grade
environments, Galaxybase is designed to handle a variety of sce-
narios and data scales eectively. It supports transactions using
Two-Phase Commit (2PC) [
26
,
38
] and Raft [
30
] protocols to en-
sure atomicity and durability. The system maintains isolation levels
from read-committed to serializable for OLTP workloads using
Two-Phase Locking (2PL) [
22
]. Galaxybase integrates bidirectional
and interactive transactions, aligning with the unique storage struc-
tures and user demands of graph databases. For OLAP workloads, it
employs Multi-Version Concurrency Control (MVCC) [
35
] visibility
checks with lock-free mechanisms to maintain serializable snapshot
isolation.
Our experiments with OLTP and OLAP workloads demonstrate
that Galaxybase delivers strong performance in both single-machine
and distributed setups. It achieves throughput of up to 50,000
queries per second (q/s) in single-machine mode and 85,000 q/s in
distributed mode, signicantly surpassing baseline graph databases.
In terms of scalability, Galaxybase achieves throughput that is
up to an order of magnitude higher than that of baseline graph
databases. It also shows eciency in edge queries, operating three
times faster than its closest competitor. Furthermore, Galaxybase
handles queries eectively in low-memory environments, enabling
large graph loading and complex query execution without out-of-
memory issues. Additionally, we processed a trillion-scale dataset
that includes 5 billion account vertices and 5 trillion transaction
edges using only 50 machines, each equipped with 12 CPUs and
128GB of memory, achieving multi-hop query results in seconds.
In tracing these endeavors, our paper consolidates the following
contributions:
•
We introduce Galaxybase, a high-performance, native distributed
graph database designed specically for HTAP scenarios. It pro-
vides an ecient, robust, and scalable solution for managing
complex graph data.
1
https://www.createlink.com
locatedIn
follows locatedIn
TIME: 20200315
NAME: UK
NAME: Cindy
AGE: 7
NAME: Alice
AGE: 18
NAME: Bob
AGE: 25
NAME: David
AGE: 20
NAME: China
follows follows
follows
follows
TIME: 20160820
locatedIn
TIME: 20190728
locatedIn
TIME: 20201102
person_1 person_2
person_3 person_4
country_1
country_2
Figure 1: An example of property graph
•
On the storage front, we propose the Log-Structured Adjacency
List, an approach for sequential disk reads and writes that dra-
matically reduces read/write amplications. Complementing this,
our Edge Page design enhances graph traversal eciency, allow-
ing for the eective handling of edges in various directions and
types, while also enabling quick and accurate single edge queries.
•
On the transaction front, we implement distributed transactions
for OLTP workloads using bidirectional and interactive methods.
Additionally, we manage OLAP workloads with lock-free meth-
ods, allowing OLTP and OLAP workloads to run concurrently
without causing blocks.
•
In the distributed mode, Galaxybase achieves a throughput of
up to 85,000 queries per second in OLTP workloads, and its
performance in OLAP workloads exceeds competitors by an
order of magnitude. This high eciency is sustained even under
restricted memory resources, enabling the execution of complex
queries in environments with limited capacity.
2 BACKGROUND AND DESIGN PRINCIPLE
Reecting on the challenges and limitations of current graph databases
outlined in Section 1, this section delves into the motivation and
key factors in crafting Galaxybase. Our primary objective is to
build a unied system that demonstrates exceptional performance,
availability, scalability, and robust transaction capabilities.
Galaxybase utilizes the property graph model [
6
], where vertices
and edges can possess a variety of properties. Based on this model,
we develop a Ming Dynasty literature knowledge graph for univer-
sities to enhance literary research and teaching, build a power grid
knowledge graph for the State Grid to ensure accurate and stable
power dispatch strategies, and implement a nancial fraud detection
graph for banks to enhance security and more eectively identify
fraudulent activities. As illustrated in Figure 1, in a social network
using the property graph model, each vertex/edge is assigned a
type (e.g.,
person
,
country
,
follows
,
locatedIn
), alongside a set
of properties (e.g., NAME:Alice and TIME:20201102).
Graph databases organize data through edges, oering the signif-
icant advantage of native and ecient support for graph traversal
queries. These queries navigate the graph from a specied vertex
to a predetermined depth or target vertex. For example, as depicted
in Figure 1, a graph traversal query starting from vertex
person_1
with a depth of 1 and a relational constraint of
follows
would
identify all followers of
person_1
. Relational databases depend
3894
文档被以下合辑收录
评论