X-SSD: A Storage System with Native Support for
Database Logging and Replication
Sangjin Lee
Hanyang University
Republic of Korea
Alberto Lerner
University of Fribourg
Switzerland
André Ryser
University of Fribourg
Switzerland
Kibin Park
Hanyang University
Republic of Korea
Chanyoung Jeon
Hanyang University
Republic of Korea
Jinsub Park
Hanyang University
Republic of Korea
Yong Ho Song
Hanyang University &
Samsung Electronics
Republic of Korea
Philippe Cudré-Mauroux
University of Fribourg
Switzerland
ABSTRACT
Transaction logging and log shipping are standard techniques to
provide recoverability and high availability in data management
systems. They entail an update to a local log file and a remote site at
every transaction. Modern databases have leveraged technologies
such as Persistent Memory (PM) and RDMA-enabled networking
to perform these updates as fast as possible. This mix of technologies,
however, presents several drawbacks: lack of portability, the
complexity of the data path, and interoperability problems.
To address these issues, this paper introduces the X-SSD, a new
SSD architecture that mixes NAND Flash and PM memory classes.
A X-SSD device can take transaction log writes on a fast, PM-backed
data path and be responsible for propagating the operation to remote
sites and eventually to NAND Flash storage. We design and
implement an actual reference X-SSD device called Villars to validate
this new architecture. Our experiments show that the Villars
device can offer a more straightforward and robust way to manage
PM on behalf of the database and achieve equally fast results.
CCS CONCEPTS
• Information systems → Database management system engines; Storage architectures.
KEYWORDS
database-storage codesign, write-ahead log, database replication
ACM Reference Format:
Sangjin Lee, Alberto Lerner, André Ryser, Kibin Park, Chanyoung Jeon,
Jinsub Park, Yong Ho Song, and Philippe Cudré-Mauroux. 2022. X-SSD: A
Storage System with Native Support for Database Logging and Replication.
In Proceedings of the 2022 International Conference on Management of Data
(SIGMOD ’22), June 12–17, 2022, Philadelphia, PA, USA. ACM, New York, NY,
USA, 15 pages. https://doi.org/10.1145/3514221.3526188
The author performed most of the work while visiting the University of Fribourg.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
SIGMOD ’22, June 12–17, 2022, Philadelphia, PA, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9249-5/22/06... $15.00
https://doi.org/10.1145/3514221.3526188
1 INTRODUCTION
Database replication is often performed by copying transactions’
changes into a secondary site before committing these changes
into local storage [41, 62]. Such a mechanism is called (transaction)
log shipping and is present almost universally in databases that
offer replication (e.g., [3, 56, 63]). If the primary database site goes
down, the secondary one can serve as a hot backup, as it has caught
up with all the primary database changes. Achieving this level of
robustness, however, comes at a cost. Transaction logging and log
shipping require writing to storage and exchanging data over the
network, both relatively expensive operations.
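The ordering that log shipping imposes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Site` class and `commit` function are hypothetical names, and in-memory lists stand in for the log file and the network.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A database site holding an append-only transaction log (hypothetical model)."""
    name: str
    log: list = field(default_factory=list)

    def append(self, record: bytes) -> int:
        # Appending stands in for a durable write to the site's log.
        self.log.append(record)
        return len(self.log) - 1  # log sequence number of the new record


def commit(primary: Site, secondary: Site, record: bytes) -> int:
    """Ship the record to the secondary first, then commit it locally."""
    remote_lsn = secondary.append(record)  # a network exchange in a real system
    local_lsn = primary.append(record)     # a storage write in a real system
    assert remote_lsn == local_lsn, "sites diverged"
    return local_lsn


primary, secondary = Site("primary"), Site("secondary")
for txn in (b"T1: x=1", b"T2: y=2"):
    commit(primary, secondary, txn)

# If the primary fails now, the secondary already holds every committed change.
print(secondary.log == primary.log)
```

Each commit pays for one remote append and one local append, which is the cost the paragraph above refers to.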
Two technologies reached maturity recently that can be relevant
in this scenario. The first one is Persistent Memory (PM), and
more specifically, PM in a DIMM form factor that replaces server
memory and can be accessed by an application via load and store
instructions. PM comes in many flavors such as Intel Optane [31]
or battery-backed DRAM [16]¹. Optane class PM has for instance
proved to be useful in mixed memory indices [4, 50, 57], and can
offer alternative ways to build a database system [5]. Battery-backed
class PM behaves as regular DRAM but is not volatile. The second
technology is RDMA-enabled networks [30]. These networks
transport data with negligible overhead and have been useful, for
instance, in query execution [22, 49, 64]. Just as with PM,
RDMA-enabled networks have also fostered new database designs [11].
PM and RDMA-enabled networks can also help to record and
propagate transaction log updates [75, 78]. In particular, we
consider the case of Main-Memory Databases [19]. They can reach
unprecedented performance levels because they maintain all their
data in DRAM and persist only the transaction log, which therefore
becomes their main bottleneck [51]. Figure 1 (left) depicts
how a typical system can perform log writing and shipping with
PM and RDMA. We can observe in the figure that the database
system is responsible for coordinating several different steps,
sometimes targeting local PM, sometimes remote PM or memory via an
RDMA-enabled NIC, and lastly, fast SSD devices.
Each of these technologies offers a specific API and presents
some restrictions. The combination of these restrictions creates a
number of issues, including:
• The interaction of RDMA and PM is complex and poorly understood.
  For example, using RDMA to update a PM-backed address
  on a remote machine may make the update visible, but it does not
  guarantee that the update is persistent [37]. If a machine crashes,
  a replication operation’s correctness can be compromised.
• While PM can be accessed with simple load/store memory
  instructions, programming correct, persistent data structures is a
  daunting endeavor. A software crash can leave a structure in an
  arbitrary state, from which the database then needs to recover.
• Every DIMM slot used for PM is a slot not available for DRAM. This
  forces system designers to trade DRAM capacity against PM capacity.
• Optane and battery-backed DRAM require specific server support
  and cannot be ported across servers without certain characteristics.
  Optane, in particular, is not supported on AMD platforms.

¹ The JEDEC standard, which supports DRAM interoperability, calls these NVDIMM-P
and NVDIMM-F types of persistent memory, respectively [23].

Session 14: Modern Hardware and In-memory DBMS

[Figure 1 diagram. Component labels: CPU, Cores/Caches, mem ctrl, PCIe system, DB,
MEM, PM, NIC, NVM ctrl, SSD, X-SSD. Step labels: (1), (2), (3), (4a), (4b).]
Figure 1: (Left) Logging and replication path using PM and RDMA. (1) The database writes log data into the PM. (2) It then ships
the data to remote PM via RDMA. (3) It uses a second RDMA operation to apply the change the log describes to the remote
host’s memory (e.g., using Active-Memory techniques [78]). Eventually, both hosts will need to make space on PM. (4a/b) They
do so by copying some of its contents into an SSD. (Right) Logging and replication path using a X-SSD device. The sequence of
steps is the same, but the X-SSD device takes responsibility for propagating data in steps (2) and (4a/b), while the update of the
remote memory is done by the remote Database (3).
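The crash-consistency issue in the list above (a crash leaving a persistent structure in an arbitrary state) can be illustrated with a small recovery routine. This sketch uses a common general technique, a length-plus-checksum header per log record, which is an assumption made for illustration and not something the paper prescribes. The names `encode` and `recover` and the record layout are hypothetical.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of the payload


def encode(payload: bytes) -> bytes:
    """Prefix a log record with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(region: bytes) -> list:
    """Return the records that were written completely; stop at the first torn one."""
    records, offset = [], 0
    while offset + HEADER.size <= len(region):
        length, crc = HEADER.unpack_from(region, offset)
        start = offset + HEADER.size
        payload = region[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn or corrupt record: it and everything after it is ignored
        records.append(payload)
        offset = start + length
    return records


region = encode(b"T1: x=1") + encode(b"T2: y=2")
torn = region[:-3]  # simulate a crash before the last record was fully stored

print(recover(region))
print(recover(torn))
```

Even this simple scheme has to be designed, tested, and kept correct by the database itself, which is the burden the second issue describes.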
To address these issues, this paper presents a new SSD design
that allows database logging and replication to benefit from PM and
fast networking, but without the above drawbacks. In summary,
our design is based on a deceptively simple decision: it moves PM
out of the CPU path and into the SSD, and it lets the latter manage
the access to PM, locally or remotely, on behalf of the database.
Specifically, we devise a new storage architecture that contains PM-
and NAND Flash-based storage that is natively networked. The
architecture provides a separate, fast data path and interface fully
dedicated to transaction log writes and offers Data Propagation
Services, including across servers, upon which database replication
can be built. We call our storage architecture the X-SSD². Figure 1
(right) shows how the logging and replication data paths can be
simplified with a X-SSD device.
Moving PM into a X-SSD device frees DIMM slots for DRAM and
restores the ability to deploy PM on vendor-independent server-class
machines without special-purpose DIMM slots or battery-backing
features. One can then use PM on an AMD server simply
by plugging in this new NVMe device. The device also avoids
interoperability problems between RDMA and remote PM. These
problems occur because the RDMA writes may be routed to the
CPU caches in a process called Data-Direct IO (DDIO) [21], before
they reach the PM. Our system can resort to low-level mechanisms
to request the NIC to deliver messages directly to storage, which
would be impractical at the application level.
² The ’X’ stands for “cross” for reasons that will become apparent shortly.
We design and implement a X-SSD device using an actual SSD
prototyping platform [44]. We call this device Villars. The Villars
device is fully compatible with the NVMe standard [55], the de facto
standard for fast SSDs, even with our extensions. The Villars is
the first in what we expect to be a series of X-SSD devices with
increasing application functionality. We also provide a set of drop-in
system call replacements, e.g., pwrite(), that make it easy to
convert existing code to use our device. These syscall replacements
can detect, with low overhead, whether a previous write is persistent
or still in processing within a local device or a remote device, allowing
the database to implement different replication flavours.
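How a write call that reports durability status could support different replication flavours can be sketched as follows. This is not the paper's API: `TrackedLog`, `State`, and their methods are hypothetical, the three-state model is a simplification, and `os.fsync` together with an explicit `ack_remote` call stand in for notifications that a real device would provide. `os.pwrite` is the standard POSIX-only Python wrapper.

```python
import enum
import os
import tempfile


class State(enum.IntEnum):
    IN_PROCESS = 0         # accepted, but not yet known to be durable
    LOCAL_PERSISTENT = 1   # durable on the local device
    REMOTE_PERSISTENT = 2  # durable on the remote device (assumed to imply local)


class TrackedLog:
    """Hypothetical pwrite() wrapper that tracks the durability of each write."""

    def __init__(self, fd: int):
        self.fd = fd
        self.states = []  # states[ticket] is the state of that write

    def pwrite(self, data: bytes, offset: int) -> int:
        os.pwrite(self.fd, data, offset)
        self.states.append(State.IN_PROCESS)
        return len(self.states) - 1  # ticket for later status queries

    def flush_local(self) -> None:
        os.fsync(self.fd)  # stand-in for the local device reporting persistence
        self.states = [max(s, State.LOCAL_PERSISTENT) for s in self.states]

    def ack_remote(self, ticket: int) -> None:
        # Stand-in for the remote device confirming that it persisted the write.
        self.states[ticket] = State.REMOTE_PERSISTENT

    def reached(self, ticket: int, required: State) -> bool:
        return self.states[ticket] >= required


with tempfile.TemporaryFile() as f:
    log = TrackedLog(f.fileno())
    ticket = log.pwrite(b"T1: x=1", 0)
    before_flush = log.reached(ticket, State.LOCAL_PERSISTENT)
    log.flush_local()
    # An asynchronous replication flavour could commit at this point.
    async_ok = log.reached(ticket, State.LOCAL_PERSISTENT)
    # A synchronous flavour has to wait for the remote acknowledgement.
    sync_ok_early = log.reached(ticket, State.REMOTE_PERSISTENT)
    log.ack_remote(ticket)
    sync_ok = log.reached(ticket, State.REMOTE_PERSISTENT)

print(before_flush, async_ok, sync_ok_early, sync_ok)
```

The commit rule is just the `required` state passed to `reached`, so the same write path can serve both flavours.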
In summary, we make the following contributions:
• We propose a new SSD architecture that mixes PM and NAND
  Flash storage in a way that naturally matches transaction logging
  and replication behavior (§ 3).
• We describe a reference design of our architecture that presents
  precise durability semantics to the application (§ 4).
• We describe how to integrate data propagation services in an
  existing database as well as suggest how a new database can
  explore alternative designs (§ 5).
• We quantify the benefits of shifting data movement (across memory
  types and local and remote servers) to the storage, freeing
  application cycles in the process (§ 6).
• We present several use cases that can benefit from X-SSD devices’
  existing and potential features (§ 7).
• We position our solution relative to the state of the art (§ 8).
Before describing our architecture in detail, we start below by
presenting the background necessary to follow our discussions.
2 BACKGROUND
The X-SSD device extends traditional SSDs with data propagation
services exposed through a new interface. To understand the
implications of such an extension, we revisit how data flows between a
database and a storage device (§ 2.1) and, once it has reached the latter,
the path the data follows within a conventional SSD (§ 2.2). Lastly,
we introduce two standard but little-known technologies called
CMB and NTB upon which our architecture is based (§ 2.3).