
ScaleStore: A Fast and Cost-Efficient Storage Engine
using DRAM, NVMe, and RDMA
Tobias Ziegler
Technische Universität
Darmstadt
Carsten Binnig
Technische Universität
Darmstadt
Viktor Leis
Friedrich-Alexander-Universität
Erlangen-Nürnberg
ABSTRACT
In this paper, we propose ScaleStore, a novel distributed storage engine that exploits DRAM caching, NVMe storage, and RDMA networking to achieve high performance, cost-efficiency, and scalability at the same time. Using low-latency RDMA messages, ScaleStore implements a transparent memory abstraction that provides access to the aggregated DRAM memory and NVMe storage of all nodes. In contrast to existing distributed RDMA designs such as NAM-DB or FaRM, ScaleStore stores cold data on NVMe SSDs (flash), lowering the overall hardware cost significantly. The core of ScaleStore is a distributed caching strategy that dynamically decides which data to keep in memory (and which on SSDs) based on the workload. The caching protocol also provides strong consistency in the presence of concurrent data modifications. Our evaluation shows that ScaleStore achieves high performance for various types of workloads (read/write-dominated, uniform/skewed) even when the data size is larger than the aggregated memory of all nodes. We further show that ScaleStore can efficiently handle dynamic workload changes and supports elasticity.
CCS CONCEPTS
• Information systems → Parallel and distributed DBMSs; DBMS engine architectures.
KEYWORDS
Distributed Storage Engine, Transaction Processing, Flash, RDMA
ACM Reference Format:
Tobias Ziegler, Carsten Binnig, and Viktor Leis. 2022. ScaleStore: A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA. In Proceedings of the 2022 International Conference on Management of Data (SIGMOD ’22), June 12–17, 2022, Philadelphia, PA, USA. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3514221.3526187
1 INTRODUCTION
In-memory DBMSs. Decades of decreasing main memory prices have led to the era of in-memory DBMSs. This is reflected by the vast number of academic projects such as MonetDB [8], H-Store [34], and HyPer [37] as well as commercially available in-memory DBMSs such as SAP HANA [24], Oracle Exalytics [26],
and Microsoft Hekaton [19]. However, while in-memory DBMSs are certainly efficient, they also suffer from significant downsides.

Table 1: Hardware landscape in terms of cost, latency, and bandwidth.

                     Price [$/TB]   Read Latency [µs/4 KB]   Bandwidth [GB/s]
  DRAM                   5000                0.1                  92.0
  Flash SSDs              200               78.0                  12.5
  RDMA (IB EDR 4x)          -                5.0                  11.2
Downsides of in-memory DBMSs. An inherent issue of in-memory DBMSs is that all data must be memory-resident. In turn, this means that growing data sets require ever-larger memory capacities. Unfortunately, DRAM module prices do not increase linearly with capacity: for instance, a 64 GB DRAM module is 7 times more expensive than a 16 GB module [1]. Therefore, scaling data beyond a certain size results in an “explosion” of the hardware cost. More importantly, since 2012 main memory prices have started to stagnate [27] while data set sizes are constantly growing. This is why research has proposed two directions to handle very large data sets.
NVMe storage engines. As a first direction, a new class of storage engines [38, 51] has been presented that can leverage NVMe SSDs (flash) to store (cold) data. As Table 1 (second row) shows, the price per terabyte of SSD storage is about 25 times cheaper than the price of main memory. The key idea behind such high-performance storage engines is to redesign buffer managers to cause only minimal overhead on modern hardware in case pages are cached in memory. This is in stark contrast to a classical buffer manager that suffers from high overhead even if data is cache-resident [28]. Recent papers [38, 51] have shown that when the entire working set (aka hot set) fits into memory, the performance of such storage engines is comparable to pure in-memory DBMSs. Unfortunately, when the working set is considerably larger than the memory capacity, system performance degrades significantly. This is because the latency of SSDs is still at least two orders of magnitude higher than that of DRAM (see Table 1, second row). This latency cliff mainly affects latency-critical workloads such as OLTP.
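To make the hot path concrete, the following C++ sketch illustrates the kind of low-overhead page access such engines rely on, loosely following the pointer-swizzling idea popularized by these designs [38, 51]. It is a minimal sketch under stated assumptions; all names (Swip, BufferManager, loadFromSSD) are illustrative and not the actual API of any of the cited systems.

#include <cstdint>

struct Page { char data[4096]; };  // 4 KB page payload

// A "swip" is a 64-bit word that either holds a direct pointer to an
// in-memory page (hot path) or a tagged on-SSD page identifier.
struct Swip {
    static constexpr uint64_t kEvictedBit = 1ull << 63;
    uint64_t val;
    bool isHot() const { return (val & kEvictedBit) == 0; }
    Page* ptr() const { return reinterpret_cast<Page*>(val); }
    uint64_t pageId() const { return val & ~kEvictedBit; }
};

struct BufferManager {
    // Stub: a real engine would issue a ~78 us NVMe read and possibly evict.
    Page* loadFromSSD(uint64_t /*pageId*/) { return new Page(); }

    // Hot path: a cached page costs one branch and one pointer chase,
    // with no hash-table lookup on a global mapping table.
    Page* fix(Swip& swip) {
        if (swip.isHot())
            return swip.ptr();                     // in memory: DRAM speed
        Page* p = loadFromSSD(swip.pageId());      // cold: pay the SSD read
        swip.val = reinterpret_cast<uint64_t>(p);  // swizzle for next time
        return p;
    }
};

On a hit, fix() avoids exactly the indirection that makes a classical buffer manager slow even for cache-resident data; the buffering overhead is only paid when a page actually has to be fetched from flash.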
In-memory scale-out systems. A second (alternative) direction to accommodate large data sets is to use scale-out (distributed) in-memory DBMS designs on top of fast RDMA-capable networks [7, 20, 33, 43, 53]. The main intuition is to scale in-memory DBMSs beyond the capacities of a single machine by leveraging the aggregated memory capacity of multiple machines. This avoids the cost explosion that typically arises in scale-up designs. The main observation is that scale-out systems execute latency-critical transactions efficiently via RDMA. In fact, as shown in Table 1 (third row), the latency of remote memory access using a recent InfiniBand network (EDR 4×) is one order of magnitude lower than NVMe latency.
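To illustrate the mechanism such systems build on, the following sketch issues a one-sided RDMA READ via the standard libibverbs API, fetching a 4 KB page from a remote node’s DRAM without involving the remote CPU. Queue-pair setup, memory registration (ibv_reg_mr), and error handling are omitted; qp, cq, mr, remote_addr, and rkey are assumed to have been established out of band.

#include <infiniband/verbs.h>
#include <cstdint>

// Read one 4 KB page from remote memory into local_buf (one-sided:
// the remote CPU does not participate in serving this request).
void rdma_read_page(ibv_qp* qp, ibv_cq* cq, ibv_mr* mr,
                    void* local_buf, uint64_t remote_addr, uint32_t rkey) {
    ibv_sge sge{};
    sge.addr   = reinterpret_cast<uint64_t>(local_buf);
    sge.length = 4096;
    sge.lkey   = mr->lkey;

    ibv_send_wr wr{};
    ibv_send_wr* bad_wr = nullptr;
    wr.opcode              = IBV_WR_RDMA_READ;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;  // request a completion entry
    wr.wr.rdma.remote_addr = remote_addr;        // virtual address on remote node
    wr.wr.rdma.rkey        = rkey;               // remote memory region key

    ibv_post_send(qp, &wr, &bad_wr);             // issue the read (~5 us on EDR 4x)

    ibv_wc wc{};
    while (ibv_poll_cq(cq, 1, &wc) == 0) {}      // busy-poll for completion
}

Because such a read completes in a few microseconds (Table 1, third row), remote DRAM sits an order of magnitude below NVMe in the latency hierarchy, which is precisely what these designs exploit.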
As a result, systems such as FaRM [20, 21, 56] and NAM-DB [67]