
ASPLOS ’23, March 25–29, 2023, Vancouver, BC, Canada C. Ruan, Y. Zhang, C. Bi, X. Ma, H. Chen, F. Li, X. Yang, C. Li, A. Aboulnaga, and Y. Xu
such services allow the reuse of database software infrastructures
as well as the consolidation/sharing of hardware resources.
Figure 1: Sample disaggregated cloud-native DB architecture spanning three layers: CPU, memory, and storage. [Figure components: SQL/TXN engine, compute node (CN), memory node (MN) with a DRAM buffer pool, storage node (SN) with log and data pages.]
Though such a new architecture enables the independent scaling
of resources, major constraints still impede its adoption.
First, disaggregated DRAM suffers from limited per-machine density,
a high (and fluctuating) price [47], and volatility, making it a more
costly and less reliable layer for hosting a cloud-native database’s
working set. Second, writes remain slow, especially in transactions,
as changes must be persisted to the storage layer in a timely manner.
In this work, we argue that Persistent Memory (PM), also known
as non-volatile memory (NVM), driven by diverse technologies
such as 3D XPoint [20], BiCS Flash [21], and PCM [55], emerges
as an appealing layer for resource disaggregation. Compared with
DRAM, PM offers higher provisioning density (e.g., one DIMM slot
can hold 512 GB of Optane PM, but only 128 GB of DDR4 DRAM). It
also offers persistence, enabling fast writes and recovery. In
addition, PM supports ultra-low-latency remote access via RDMA, an
advantage over fast SSDs. This combination of capabilities makes PM
an ideal candidate for disaggregated databases, as we can
simultaneously cache hot pages and persist log data on a shared,
distributed PM layer. This brings on-demand, cost-effective memory
buffer expansion, fast data persistence, and enhanced availability.
However, existing PM disaggregation work has not fully considered
redesigning databases to utilize versatile PM units, focusing
instead on supporting native data structures [45] or simple
applications like KV stores [63]. Applying these solutions to
cloud-native databases could easily create new bottlenecks on the
shared, remote PM nodes (PMNs). The first is the tension between the
limited PM write bandwidth [26, 32, 75] and the heavy bandwidth
consumption of existing solutions. The latter stems largely from
write redundancy/amplification caused by logging and dirty-page
flushing.
Offloading log management to the PM side would reduce the PM
bandwidth pressure (compute nodes no longer send dirty pages; the
PM side instead reproduces them by applying the log). On the other
hand, this comes at the price of heavy CPU involvement on the PM
nodes, which must handle the offloaded data (especially its updates)
and coordinate concurrent data accesses. Finally, complex management
logic on the PM side would complicate critical-path reads and writes.
Both of these PM-side bottlenecks (write bandwidth and CPU),
unfortunately, conflict directly with the main selling point of PM
disaggregation for the cloud: a shared PM node pool supporting many
compute nodes running database instances.
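The bandwidth argument above can be made concrete with a back-of-the-envelope model. The sketch below is our own illustration, not the paper's cost model: it assumes a 16 KiB page (InnoDB's default page size), a hypothetical 32-byte per-record log header, and hypothetical 100-byte updates, and it counts the bytes written to PM for one dirty page that receives `k` small updates before eviction.

```python
# Back-of-the-envelope model (our illustration, not the paper's cost model):
# bytes written to remote PM for one dirty page that receives k small updates
# before it is evicted.

PAGE_SIZE = 16 * 1024  # bytes per data page (InnoDB's default page size)
LOG_HEADER = 32        # hypothetical per-record metadata overhead in bytes


def pm_bytes_page_flush(k: int, update_size: int) -> int:
    """Conventional: append k log records, then flush the whole dirty page."""
    return k * (LOG_HEADER + update_size) + PAGE_SIZE


def pm_bytes_log_apply(k: int, update_size: int) -> int:
    """Offloaded: append k log records; the PM side then applies each record
    in place, rewriting only the changed bytes rather than the whole page."""
    return k * (LOG_HEADER + update_size) + k * update_size


if __name__ == "__main__":
    for k in (1, 10, 100):
        flushed = pm_bytes_page_flush(k, update_size=100)
        applied = pm_bytes_log_apply(k, update_size=100)
        print(f"k={k:3d}: page flush {flushed:6d} B, "
              f"log apply {applied:6d} B, ratio {flushed / applied:.1f}x")
```

Under these assumptions, a single 100-byte update costs roughly 71× more PM writes when the whole page is flushed, but the gap narrows as more updates to the same page share one flush; the real PM internal write granularity would also shift the numbers.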
To address these challenges, we propose PilotDB, a novel PM-disaggregated
cloud-native database architecture featuring the following innovations.
First, PilotDB embodies CDLog (Compute-node-Driven Logging),
a central logging mechanism that efficiently offloads bulk data to
the PM layer, which serves as a large, fast page buffer, while
requiring only light computation there to support speedy logging
and update handling. While retaining the page-based data
organization of relational databases, it discards the conventional
page-based WAL organization and instead adopts fine-grained,
physical logging, where log entries directly embed changes at
mini-page granularity along with the target remote PM memory
addresses. This allows compute nodes to flush only CDLog entries
to remote PM via one-sided RDMA and enables lightweight, DMA-based
log application on the PM nodes, reducing both the PM nodes’ CPU
and write bandwidth consumption.
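To make the CDLog entry format more concrete, here is a minimal Python sketch under stated assumptions. The mini-page size, the `CDLogEntry` fields, and the helpers `build_entries` and `apply_entries` are hypothetical names chosen for illustration; the one-sided RDMA transfer is not modeled, and the PM page buffer is simulated with a `bytearray`. What it illustrates is that each entry carries its own target PM address, so the PM side can replay entries with plain memory copies without parsing page formats.

```python
from dataclasses import dataclass
from typing import List

MINI_PAGE = 256  # hypothetical mini-page size in bytes (an assumption)


@dataclass(frozen=True)
class CDLogEntry:
    lsn: int      # log sequence number assigned by the compute node
    pm_addr: int  # absolute byte address of the target range in the PM buffer
    data: bytes   # new contents; never crosses a mini-page boundary


def build_entries(first_lsn: int, pm_addr: int, new_bytes: bytes) -> List[CDLogEntry]:
    """Compute-node side: turn one in-page modification into physical log
    entries, split so that no entry spans two mini-pages."""
    entries: List[CDLogEntry] = []
    lsn, pos = first_lsn, 0
    while pos < len(new_bytes):
        addr = pm_addr + pos
        room = MINI_PAGE - addr % MINI_PAGE      # bytes left in this mini-page
        chunk = new_bytes[pos:pos + room]
        entries.append(CDLogEntry(lsn, addr, chunk))
        lsn += 1
        pos += len(chunk)
    return entries


def apply_entries(pm_buffer: bytearray, entries: List[CDLogEntry]) -> None:
    """PM-node side: replay entries in LSN order with plain memory copies.
    No page format is parsed, so the work per entry is a bounds check plus a
    copy (a real system could hand such copies to a DMA engine)."""
    for entry in sorted(entries, key=lambda x: x.lsn):
        end = entry.pm_addr + len(entry.data)
        if entry.pm_addr < 0 or end > len(pm_buffer):
            raise IndexError(f"entry {entry.lsn} falls outside the PM buffer")
        pm_buffer[entry.pm_addr:end] = entry.data


if __name__ == "__main__":
    pm_buffer = bytearray(1024)                  # simulated PM page buffer
    log = build_entries(first_lsn=100, pm_addr=250, new_bytes=b"A" * 20)
    apply_entries(pm_buffer, log)
    print([(e.lsn, e.pm_addr, len(e.data)) for e in log])
```

In this sketch a 20-byte change starting at address 250 becomes two entries (6 bytes up to the 256-byte boundary, then 14 bytes), and the PM side never needs to know which database page or record the bytes belong to.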
Second, PilotDB is designed to be coordination-free, even in the
presence of concurrent reads/writes to the PM log and buffers,
further reducing CPU consumption on the PM nodes. This is enabled
by (1) lock-free data structures that manage the PM log area
with lightweight conflict-check mechanisms and (2) a novel log-pull
mechanism that allows compute nodes’ query processing to
perform remote reads optimistically, with logs “read back” from
the PM side on the rare occasions when a retrieved PM-cached page
is found to be stale, again enabled by the CDLog organization.
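The optimistic read path with log pull can be sketched as follows. This is a simplified, single-process illustration, not PilotDB's implementation: `PMNode`, `optimistic_read`, the per-page LSN stamp, and the assumption that the compute node knows the latest LSN it wrote for each page (`required_lsn`) are all hypothetical, and the torn-read detection and lock-free conflict checks mentioned above are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical layout: page_id -> (page_lsn, page image) for PM-cached pages,
# and page_id -> [(lsn, offset_in_page, new_bytes)] for CDLog entries in PM.
PageTable = Dict[int, Tuple[int, bytes]]
LogTable = Dict[int, List[Tuple[int, int, bytes]]]


@dataclass
class PMNode:
    """Toy stand-in for a PM node. In a real deployment the compute node would
    fetch these structures with one-sided RDMA reads."""
    pages: PageTable = field(default_factory=dict)
    log: LogTable = field(default_factory=dict)


def optimistic_read(pm: PMNode, page_id: int, required_lsn: int) -> bytes:
    """Read a page without taking any lock; if the PM-cached copy lags behind
    `required_lsn`, pull the missing log entries and replay them locally."""
    page_lsn, image = pm.pages[page_id]      # optimistic remote read
    if page_lsn >= required_lsn:
        return image                         # common case: page is fresh
    # Rare case: PM-side log application has not caught up with the writer.
    buf = bytearray(image)
    for lsn, offset, data in sorted(pm.log.get(page_id, [])):
        if page_lsn < lsn <= required_lsn:  # only entries missing from the image
            buf[offset:offset + len(data)] = data
    return bytes(buf)


if __name__ == "__main__":
    pm = PMNode()
    pm.pages[7] = (10, b"hello world")              # PM copy reflects LSN 10
    pm.log[7] = [(11, 0, b"HELLO"), (12, 6, b"WORLD")]
    print(optimistic_read(pm, 7, required_lsn=12))
```

The design choice this mirrors is that staleness costs an extra remote read of a few log entries only when it actually occurs, instead of forcing every read to wait for, or coordinate with, PM-side log application.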
We implemented a PilotDB prototype atop MySQL [23] and
evaluated it using both industry-standard benchmarks and a
production workload. The results show that PilotDB achieves up to
98.0% of the throughput of a monolithic configuration (given
sufficient local DRAM and PM-based storage), even with the
vast majority of its data placed on remote, disaggregated PM. On
most workloads, PilotDB significantly outperforms LegoBase [74],
a state-of-the-art DRAM-disaggregated cloud-native database, and
LegoPM, a solution incorporating PM disaggregation. In addition,
we made a best-effort attempt to compare PilotDB with Aurora
and PolarDB, two mainstream cloud-native database services on
the market that adopt storage disaggregation, by allocating
Aurora/PolarDB instances with sufficient local DRAM (and careful
hardware alignment in other resource dimensions). The results show
that PilotDB achieves significantly better or comparable performance.
In addition to the above performance results, our multi-tenant
tests show that PilotDB has strong service scalability, with a 4-node
PM pool serving 32 concurrent DB instances at only a 10.8%
performance loss compared with running each instance exclusively.
Moreover, PilotDB brings instant failure recovery, up to 15.27×
faster than the baselines, regardless of the crash site. Finally, our
cost analysis further confirms the cost-effectiveness of PilotDB.
Compared with its closest competitor in cost-effectiveness, the
PilotDB configuration has 38.3% lower hardware ownership cost,
uses only 9.1% of the DRAM across CN and PMN and 12.5% of the
PMN CPU core resources, while delivering 91.5% higher throughput
per dollar.
To our knowledge, PilotDB is the only database design that
leverages all major features of PM for disaggregation: capacity,
persistence, and RDMA-based low-latency remote accesses. Our
research contributions are as follows:
• We advocate a flexible 3-level cloud-native database architecture
with aggressively disaggregated resources. It makes CNs