Zen: a High-Throughput Log-Free OLTP Engine for Non-Volatile Main Memory
Gang Liu, Leying Chen, Shimin Chen
SKL of Computer Architecture, ICT, CAS
University of Chinese Academy of Sciences
{liugang,chenleying,chensm}@ict.ac.cn
ABSTRACT
Emerging Non-Volatile Memory (NVM) technologies like 3DXPoint
promise significant performance potential for OLTP databases.
However, transactional databases need to be redesigned because
the key assumptions that non-volatile storage is orders of magnitude
slower than DRAM and only supports block-oriented access have
changed. NVMs are byte-addressable and almost as fast as DRAM.
The capacity of NVM is much (4-16x) larger than that of DRAM.
Such NVM characteristics make it possible to build an OLTP database
entirely in NVM main memory.
This paper studies the structure of OLTP engines with hybrid
NVM and DRAM memory. We observe three challenges in designing
an OLTP engine for NVM: tuple metadata modifications, NVM
write redundancy, and NVM space management. We propose Zen,
a high-throughput log-free OLTP engine for NVM. Zen addresses
the three design challenges with three novel techniques: metadata
enhanced tuple cache, log-free persistent transactions, and
lightweight NVM space management. Experimental results on a real
machine equipped with Intel Optane DC Persistent Memory show
that Zen achieves up to 10.1x improvement over existing solutions
when running an OLTP database as large as the NVM capacity,
while achieving fast failure recovery.
PVLDB Reference Format:
Gang Liu, Leying Chen, Shimin Chen. Zen: a High-Throughput Log-Free
OLTP Engine for Non-Volatile Main Memory. PVLDB, 14(5): 835 - 848, 2021.
doi:10.14778/3446095.3446105
1 INTRODUCTION
Byte-addressable, non-volatile memory (NVM) is a new type of
memory technology designed to address the DRAM scaling problem
[1, 3, 18, 29, 39]. NVM delivers a unique combination of
near-DRAM speed, lower-than-DRAM power consumption, affordable
large (up to 6TB in a dual-socket server) memory capacity, and
non-volatility in light of power failure. By eliminating disk I/Os,
NVM can substantially improve the performance of systems with
persistence requirements. Therefore, OLTP databases using NVM as
primary storage are emerging as a promising design choice [5, 6, 20].
Recent studies in concurrency control methods have advanced
the single-machine main memory OLTP transaction throughput
(without persistence) to over one million transactions per second
[15, 24, 27, 32, 36, 41]. However, replacing DRAM with NVM
in a system tends to slow down the system because NVM is
modestly (e.g., 2–3x) slower than DRAM, NVM writes have lower
bandwidth than reads, and persisting writes from CPU cache to
NVM incurs extra overhead. In this paper, we would like to rethink
the design of the OLTP engine for NVM by fully considering NVM’s
characteristics. Our goal is to achieve transaction performance
similar to that of pure DRAM based OLTP engines.
Shimin Chen is the corresponding author.
This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 14, No. 5 ISSN 2150-8097.
We observe three main challenges in achieving our goal:
Tuple Metadata Modifications: Concurrency control methods
typically keep a small amount of metadata per tuple in a main
memory OLTP engine [24, 27, 32, 36, 41]. The per-tuple metadata
is often modified not only by tuple writes but also by tuple reads.
As a result, tuple reads in an NVM based OLTP engine can incur
expensive NVM writes.
NVM Write Redundancy: OLTP databases typically rely on
logs and checkpoints/snapshots to achieve durability. If an NVM
based engine takes this approach, there will be substantial NVM
write redundancy because the same content is written to the logs
and the checkpoints/snapshots in addition to the base tables. This
redundancy not only takes more NVM space, but also negatively
impacts the runtime performance.
NVM Space Management: First, NVM space allocation needs
to be persistent across power failure. Hence, every NVM memory
allocation and free may have to be protected by expensive NVM
persistence operations. Unfortunately, OLTP transactions often
perform non-trivial numbers of inserts, updates, and/or deletes,
potentially incurring significant overhead. Second, NVM may have
limited write endurance [29]. It is important yet challenging to
avoid hot spots, i.e., NVM locations that are frequently allocated
and freed.
In this paper, we propose Zen, a high-throughput log-free OLTP
engine for NVM. Zen addresses the above three challenges with
the following three new techniques. It provides general-purpose
support for a wide range of concurrency control methods.
Metadata Enhanced Tuple Cache: We store base tables in
NVM without per-tuple metadata. Then we propose to build a
Met-Cache (Metadata enhanced tuple Cache) in DRAM to (i) cache tuples
that are used in currently running transactions or have recently
been used, and (ii) augment each tuple with per-tuple metadata
required by concurrency control methods. In this way, Zen performs
concurrency control mostly in DRAM, avoids writing per-tuple
metadata to NVM for tuple reads, and reduces NVM reads for
frequently accessed tuples.
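The division of labor described above can be illustrated with a minimal Python sketch. It is not Zen's implementation: the `MetCache` class, the `read_ts` metadata field, the LRU eviction policy, and the dict standing in for the NVM base table are all assumptions made for illustration. A real engine would also have to keep tuples of running transactions from being evicted, which this sketch ignores.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class CachedTuple:
    value: bytes        # copy of the tuple payload held in DRAM
    read_ts: int = 0    # hypothetical per-tuple metadata for concurrency control


class MetCache:
    """Toy DRAM tuple cache; `nvm_table` is a dict standing in for an NVM base table."""

    def __init__(self, nvm_table, capacity):
        self.nvm_table = nvm_table
        self.capacity = capacity
        self.entries = OrderedDict()   # tuple_id -> CachedTuple, kept in LRU order
        self.nvm_reads = 0             # counts simulated NVM reads

    def read(self, tuple_id, ts):
        entry = self.entries.get(tuple_id)
        if entry is None:
            # Miss: fetch the metadata-free tuple from NVM, attach metadata in DRAM.
            self.nvm_reads += 1
            entry = CachedTuple(self.nvm_table[tuple_id])
            self.entries[tuple_id] = entry
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # evict the least recently used
        else:
            self.entries.move_to_end(tuple_id)     # mark as most recently used
        # The read updates metadata in DRAM only; the NVM table is not written.
        entry.read_ts = max(entry.read_ts, ts)
        return entry.value
```

In this sketch a repeated read of the same tuple touches only DRAM: the metadata update lands in the cached entry, and the simulated NVM table is read once and never written.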
Log-Free Persistent Transactions: We eliminate NVM write
redundancy by completely removing logs and checkpoints for
transactions in our durability scheme. Each tuple in the base tables in
NVM has a tuple ID field and a Tx-CTS (Transaction Commit
Timestamp) field. Tx-CTS identifies the transaction that produces the
version of the tuple. At commit time, Zen persists modified tuples
in a transaction from the Met-Cache to the relevant base tables
in NVM. It writes to newly allocated or garbage collected space
without overwriting the previous versions of the tuples. The most
significant bit in Tx-CTS is used as an LP (Last Persisted) bit. After
persisting the set of modified tuples in a transaction, Zen sets the LP
bit and persists the Tx-CTS for the last tuple in the set. Upon failure
recovery, Zen can identify if the modification of a transaction is
fully persisted by checking if the LP bit is set for one of the tuples.
If yes, then the new tuple versions will be the current versions. If
no, then the transaction is considered as aborted, and the previous
tuple versions are used.
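The commit and recovery rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Zen's code: a Python list stands in for NVM tuple slots, each append stands in for a tuple write followed by a persistence operation, Tx-CTS is assumed to be a 64-bit value that is unique per transaction, and the function names `commit_log_free` and `recover_log_free` are hypothetical.

```python
LP_BIT = 1 << 63          # most significant bit of an assumed 64-bit Tx-CTS
CTS_MASK = LP_BIT - 1     # remaining bits hold the commit timestamp


def commit_log_free(nvm_slots, cts, writes):
    """Write new tuple versions out of place; only the last one carries the LP bit."""
    items = list(writes.items())
    for i, (tuple_id, value) in enumerate(items):
        is_last = i == len(items) - 1
        tx_cts = (cts | LP_BIT) if is_last else cts
        # Appending never overwrites the previous version of the tuple.
        nvm_slots.append((tuple_id, tx_cts, value))


def recover_log_free(nvm_slots):
    """Return the current value of every tuple after a simulated crash."""
    # A transaction counts as committed iff one of its tuples has the LP bit set.
    committed = {tx_cts & CTS_MASK for _, tx_cts, _ in nvm_slots if tx_cts & LP_BIT}
    current = {}   # tuple_id -> (cts, value) of the newest committed version
    for tuple_id, tx_cts, value in nvm_slots:
        cts = tx_cts & CTS_MASK
        if cts in committed and (tuple_id not in current or cts > current[tuple_id][0]):
            current[tuple_id] = (cts, value)
    return {tuple_id: value for tuple_id, (_, value) in current.items()}
```

A transaction that crashes after writing some tuples but before the LP-marked one leaves versions whose timestamp never appears in `committed`, so recovery falls back to the previous versions without consulting any log.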
Lightweight NVM Space Management: We aim to reduce the
persistence operations for NVM space management as much as
possible. First, we allocate large (2MB sized) chunks of NVM memory
from the underlying system, and initialize the NVM memory so that
Tx-CTS=0. Second, we manage tuple allocation and free without
performing any persistence operations. This is because using the
log-free persistence mechanism, Zen can identify the tuple versions
that are most recently committed upon recovery. The old tuple
versions are then put into the free lists. Third, the allocation structures
are maintained in DRAM during normal processing. Zen garbage
collects old tuple versions and puts them into free lists for tuple
allocations. Each thread has its own allocation structures to avoid
thread synchronization overhead.
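As a rough illustration of why allocation needs no persistence operations, the following Python sketch rebuilds DRAM free lists purely from the tuple slots found in NVM. The names `ThreadAllocator` and `rebuild_free_lists`, the round-robin distribution of free slots across threads, and the list of `(tuple_id, tx_cts)` pairs standing in for NVM chunks are assumptions. The sketch also assumes every non-zero Tx-CTS belongs to a committed transaction, so the LP-bit check is omitted.

```python
class ThreadAllocator:
    """Per-thread free list kept in DRAM; alloc and free persist nothing."""

    def __init__(self):
        self.free_slots = []

    def alloc(self):
        # Returns a free slot index, or None if this thread has none left.
        return self.free_slots.pop() if self.free_slots else None

    def free(self, slot):
        self.free_slots.append(slot)


def rebuild_free_lists(nvm_slots, num_threads):
    """Scan NVM slots after a restart and rebuild the DRAM free lists.

    A slot with tx_cts == 0 was never used; among used slots, only the one
    with the largest tx_cts for each tuple_id holds the current version.
    """
    newest = {}   # tuple_id -> (tx_cts, slot index) of the newest version
    for slot, (tuple_id, tx_cts) in enumerate(nvm_slots):
        if tx_cts != 0 and tx_cts > newest.get(tuple_id, (0, -1))[0]:
            newest[tuple_id] = (tx_cts, slot)
    live = {slot for _, slot in newest.values()}
    allocators = [ThreadAllocator() for _ in range(num_threads)]
    for slot in range(len(nvm_slots)):
        if slot not in live:   # unused slot or superseded tuple version
            allocators[slot % num_threads].free(slot)
    return allocators
```

Because the free lists are derived from tuple contents, losing them in a crash costs a scan at recovery time rather than a persistence operation on every allocation and free.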
The contributions of this paper are fourfold. First, we identify
the main design principles for NVM based OLTP engines by
examining the strengths and weaknesses of three state-of-the-art NVM
based OLTP designs (§2). Second, we propose Zen, which reduces
NVM overhead by three novel techniques, namely the Met-Cache,
log-free persistent transactions, and lightweight NVM space
management (§3 and §5). The three techniques push to the extreme
of minimizing NVM writes: for every tuple write, the only NVM
write is for the modified tuple itself. Third, we evaluate the runtime
and recovery performance of Zen using YCSB and TPCC
benchmarks on a real machine equipped with Intel Optane DC Persistent
Memory. Experimental results show that Zen achieves up to 10.1x
improvements over MMDB with NVM capacity, WBL, and FOEDUS,
while obtaining almost instant recovery (§4). Finally, we demonstrate
the wide applicability of Zen by supporting 10 different concurrency
control methods (§3 and §4).
2 BACKGROUND AND MOTIVATION
We provide background on NVM and OLTP, examine existing OLTP
engine designs for NVM, then discuss the design challenges.
2.1 NVM Characteristics
There are several competing NVM technologies, including PCM [29],
STT-RAM [39], Memristor [3], and 3DXPoint [1, 18]. They share
similar characteristics: (i) NVM is byte-addressable like DRAM; (ii)
NVM is modestly (e.g., 2–3x) slower than DRAM, but orders of
magnitude faster than HDDs and SSDs; (iii) NVM provides non-volatile
main memory that can be much larger (e.g., up to 6TB in a
dual-socket server) than DRAM; (iv) NVM writes have lower bandwidth
than NVM reads; (v) To ensure that data is consistent in NVM upon
power failure, special persistence operations using cache line flush
and memory fence instructions (e.g., clwb and sfence) are required
to persist data from the volatile CPU cache to NVM, incurring
significantly higher overhead than normal writes; and (vi) NVM cells
may wear out after a limited number (e.g., 10^8) of writes.
From previous work on NVM based data structures and systems
[4–6, 9–12, 17, 19, 20, 25, 26, 28, 33–35, 37, 38], we obtain
three common design principles: (i) Put frequently accessed data
structures in DRAM if they are either transient or can be
reconstructed upon recovery; (ii) Reduce NVM writes as much as possible;
(iii) Reduce persistence operations as much as possible. We would
like to apply these design principles to the OLTP engine design.
2.2 OLTP in Main Memory Databases
Main memory OLTP systems are the starting point to design an
OLTP engine for NVM. We consider concurrency control and crash
recovery mechanisms for achieving ACID transaction support.
Recent work has investigated concurrency control methods for
high-throughput main memory transactions [15, 24, 27, 32, 36, 41].
Instead of using two phase locking (2PL) [7, 16], which is the
standard method in traditional disk-oriented databases, main memory
OLTP designs exploit optimistic concurrency control (OCC) [21]
and multi-version concurrency control (MVCC) [7] for higher
performance. Silo [32] enhances OCC with epoch-based batch
timestamp generation and group commit. MOCC [36] is an OCC based
method that exploits locking mechanisms to deal with high conflicts
for hot tuples. Tictoc [41] removes the bottleneck of centralized
timestamp allocation in OCC and computes transaction timestamps
lazily at commit time. Hekaton [15] employs latch-free data
structures and MVCC for transactions in memory. Hyper [27] improves
MVCC for read-heavy transactions in column stores by performing
in-place updates and storing before-image deltas in undo buffers.
Cicada [24] reduces overhead and contention of MVCC with multiple
loosely synchronized clocks for generating timestamps, best-effort
inlining to decrease cache misses, and optimized multi-version
validation. One common feature of the above methods is that they
extend every tuple or every version of a tuple with metadata, such
as read/write timestamps, pointers to different tuple versions, and
lock bits for validation and commit processing. These methods have
achieved transaction throughputs of over one million transactions
per second (TPS) without persistence.
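To make the role of per-tuple metadata concrete, the following single-threaded Python sketch shows a generic OCC commit that validates against a per-tuple version and sets a lock bit. It does not reproduce any of the systems cited above: the `Record` and `Transaction` classes and the version-counter scheme are assumptions chosen for brevity, and a real engine would set the lock bits with atomic instructions.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: int
    version: int = 0       # per-tuple metadata: incremented by every committed write
    locked: bool = False   # per-tuple metadata: lock bit held during commit


class Transaction:
    def __init__(self, table):
        self.table = table
        self.read_set = {}    # key -> version observed at read time
        self.write_set = {}   # key -> new value, buffered until commit

    def read(self, key):
        if key in self.write_set:
            return self.write_set[key]
        record = self.table[key]
        self.read_set[key] = record.version
        return record.value

    def write(self, key, value):
        self.write_set[key] = value

    def commit(self):
        keys = sorted(self.write_set)          # fixed lock order
        for key in keys:
            self.table[key].locked = True
        try:
            # Validation: every tuple read must be unchanged and not locked by others.
            for key, seen in self.read_set.items():
                record = self.table[key]
                if record.version != seen or (record.locked and key not in self.write_set):
                    return False
            # Install the buffered writes and bump the per-tuple versions.
            for key in keys:
                record = self.table[key]
                record.value = self.write_set[key]
                record.version += 1
            return True
        finally:
            for key in keys:
                self.table[key].locked = False
```

Even in this minimal form, both validation and commit touch the `version` and `locked` fields of each tuple, which is the kind of metadata traffic the paragraph above refers to.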
Similar to traditional databases, main memory databases (MMDB)
store logs and checkpoints on durable storage (e.g., HDDs, SSDs)
in order to achieve durability [8, 14, 22, 23, 30, 43]. The main
difference resides in the fact that all the data fits into main memory in
MMDBs. Hence, only committed states and redo logs need to be
written to disks. After a crash, an MMDB recovers by loading the
most recent checkpoint from durable storage into main memory,
then reading and applying the redo log up to the crash point.
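The recovery procedure just described can be sketched as follows. This is a simplified illustration, not the recovery code of any particular MMDB: the `(checkpoint_lsn, snapshot)` checkpoint layout, the `(lsn, key, value)` redo record format, and the function name `recover_mmdb` are assumptions, and the log is assumed to contain only writes of committed transactions.

```python
def recover_mmdb(checkpoint, redo_log):
    """Rebuild the in-memory state from a checkpoint and a redo log.

    `checkpoint` is (checkpoint_lsn, {key: value}); `redo_log` is a list of
    (lsn, key, value) records in log order.
    """
    checkpoint_lsn, snapshot = checkpoint
    state = dict(snapshot)               # load the most recent checkpoint
    for lsn, key, value in redo_log:
        if lsn > checkpoint_lsn:         # earlier records are already in the checkpoint
            state[key] = value           # reapply the committed write
    return state


# Usage: the checkpoint covers log records up to LSN 2; records 3 and 4 are replayed.
state = recover_mmdb((2, {"a": 1, "b": 2}),
                     [(1, "a", 1), (2, "b", 2), (3, "a", 5), (4, "c", 7)])
```

Note that the same values appear in the base data, the checkpoint, and the log, which is the redundancy an NVM based engine would pay for if it adopted this scheme unchanged.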
2.3 Existing OLTP Engine Designs for NVM
In this paper, we focus on the case where all data and structures of
the OLTP engine can fit into NVM memory. We assume that the
computer system contains both NVM and DRAM memory, which
are mapped to different address ranges in the virtual memory of