NVM has a tuple ID field and a Tx-CTS (Transaction Commit Timestamp) field. Tx-CTS identifies the transaction that produces the version of the tuple. At commit time, Zen persists the tuples modified by a transaction from the Met-Cache to the relevant base tables in NVM. It writes to newly allocated or garbage collected space without overwriting the previous versions of the tuples. The most significant bit in Tx-CTS serves as an LP (Last Persisted) bit. After persisting the set of modified tuples in a transaction, Zen sets the LP bit and persists the Tx-CTS for the last tuple in the set. Upon failure recovery, Zen can identify whether the modifications of a transaction are fully persisted by checking if the LP bit is set for one of its tuples. If yes, then the new tuple versions become the current versions; if no, then the transaction is considered aborted, and the previous tuple versions are used.
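To make the mechanism concrete, the following is a minimal sketch of the commit-time path under the LP-bit scheme described above; NvmTuple, persist(), commit_writes, and the 48-byte payload are hypothetical names and layouts, not Zen's actual interface.

```cpp
// Minimal sketch of log-free commit, assuming a 64-bit Tx-CTS whose
// most significant bit is the LP (Last Persisted) flag.
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t LP_BIT = 1ULL << 63;

struct NvmTuple {
    uint64_t tuple_id;
    uint64_t tx_cts;      // commit timestamp; MSB is the LP bit
    char     payload[48];
};

// Write back the covering cache lines, then fence (see the sketch in §2.1).
void persist(const void* addr, size_t len);

// Persist the write set of a committing transaction to NVM.
void commit_writes(const std::vector<NvmTuple*>& write_set, uint64_t cts) {
    for (size_t i = 0; i < write_set.size(); ++i) {
        NvmTuple* t = write_set[i];
        bool last = (i + 1 == write_set.size());
        // Only the final tuple carries the LP bit; because it is
        // persisted after all the others, finding it at recovery
        // proves the entire write set reached NVM.
        t->tx_cts = last ? (cts | LP_BIT) : cts;
        persist(t, sizeof(*t));
    }
}

// Recovery-time check: the transaction with timestamp cts committed
// iff some tuple carries (cts | LP_BIT); otherwise treat it as
// aborted and keep the previous tuple versions.
bool fully_persisted(const std::vector<const NvmTuple*>& scanned, uint64_t cts) {
    for (const NvmTuple* t : scanned)
        if (t->tx_cts == (cts | LP_BIT)) return true;
    return false;
}
```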
Lightweight NVM Space Management: We aim to reduce the persistence operations for NVM space management as much as possible. First, we allocate large (2MB sized) chunks of NVM memory from the underlying system, and initialize the NVM memory so that Tx-CTS=0. Second, we manage tuple allocation and deallocation without performing any persistence operations. This is possible because, with the log-free persistence mechanism, Zen can identify the most recently committed tuple versions upon recovery; the old tuple versions are then put into the free lists. Third, the allocation structures are maintained in DRAM during normal processing. Zen garbage collects old tuple versions and puts them into free lists for tuple allocations. Each thread has its own allocation structures to avoid thread synchronization overhead.
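A minimal sketch of such a per-thread, DRAM-resident allocator is shown below; ThreadAllocator and its helpers are illustrative names, with calloc standing in for the real NVM chunk allocation.

```cpp
// Sketch of per-thread NVM space management kept entirely in DRAM;
// no persistence operation is issued on alloc/free, since recovery
// re-derives the live versions from Tx-CTS. Names are illustrative.
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct NvmTuple {           // fixed-size tuple slot, as sketched earlier
    uint64_t tuple_id;
    uint64_t tx_cts;        // 0 in freshly initialized chunks
    char     payload[48];
};

constexpr size_t CHUNK_BYTES = 2u << 20;  // 2MB chunks from the system

struct ThreadAllocator {
    std::vector<NvmTuple*> free_list;  // volatile; rebuilt on recovery

    // A real system would map a 2MB NVM chunk (e.g., from a DAX file)
    // and zero it so that every Tx-CTS = 0; calloc stands in here.
    void refill_from_new_chunk() {
        size_t n = CHUNK_BYTES / sizeof(NvmTuple);
        auto* chunk = static_cast<NvmTuple*>(calloc(n, sizeof(NvmTuple)));
        for (size_t i = 0; i < n; ++i) free_list.push_back(&chunk[i]);
    }

    NvmTuple* alloc() {
        if (free_list.empty()) refill_from_new_chunk();
        NvmTuple* t = free_list.back();
        free_list.pop_back();
        return t;                      // no clwb/sfence on this path
    }

    // Garbage collection returns superseded versions for reuse.
    void release(NvmTuple* t) { free_list.push_back(t); }
};
```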
The contributions of this paper are fourfold. First, we identify the main design principles for NVM based OLTP engines by examining the strengths and weaknesses of three state-of-the-art NVM based OLTP designs (§2). Second, we propose Zen, which reduces NVM overhead with three novel techniques, namely the Met-Cache, log-free persistent transactions, and lightweight NVM space management (§3 and §5). The three techniques push NVM write minimization to the extreme: for every tuple write, the only NVM write is for the modified tuple itself. Third, we evaluate the runtime and recovery performance of Zen using the YCSB and TPC-C benchmarks on a real machine equipped with Intel Optane DC Persistent Memory. Experimental results show that Zen achieves up to 10.1x improvement over MMDB with NVM capacity, WBL, and FOEDUS, while obtaining almost instant recovery (§4). Finally, we demonstrate the wide applicability of Zen by supporting 10 different concurrency control methods (§3 and §4).
2 BACKGROUND AND MOTIVATION
We provide background on NVM and OLTP, examine existing OLTP
engine designs for NVM, then discuss the design challenges.
2.1 NVM Characteristics
There are several competing NVM technologies, including PCM [29], STT-RAM [39], Memristor [3], and 3DXPoint [1, 18]. They share similar characteristics: (i) NVM is byte-addressable like DRAM; (ii) NVM is modestly (e.g., 2–3x) slower than DRAM, but orders of magnitude faster than HDDs and SSDs; (iii) NVM provides non-volatile main memory that can be much larger (e.g., up to 6TB in a dual-socket server) than DRAM; (iv) NVM writes have lower bandwidth than NVM reads; (v) to ensure that data is consistent in NVM upon power failure, special persistence operations using cache line flush and memory fence instructions (e.g., clwb and sfence) are required to persist data from the volatile CPU cache to NVM, incurring significantly higher overhead than normal writes; and (vi) NVM cells may wear out after a limited number (e.g., 10^8) of writes.
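As an illustration of characteristic (v), a generic persist primitive on x86 can be built from clwb and sfence via the standard compiler intrinsics; this sketch is not specific to any system.

```cpp
// Generic x86 persist primitive: write back every cache line covering
// [addr, addr+len), then fence so the write-backs are ordered before
// subsequent stores. Requires a CPU with clwb (compile with -mclwb).
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

void persist(const void* addr, size_t len) {
    constexpr uintptr_t kLine = 64;  // cache line size on current x86
    uintptr_t p = reinterpret_cast<uintptr_t>(addr) & ~(kLine - 1);
    const uintptr_t end = reinterpret_cast<uintptr_t>(addr) + len;
    for (; p < end; p += kLine)
        _mm_clwb(reinterpret_cast<void*>(p));  // flush without evicting
    _mm_sfence();
}
```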
From previous work on NVM based data structures and systems [4–6, 9–12, 17, 19, 20, 25, 26, 28, 33–35, 37, 38], we obtain
three common design principles: (i) Put frequently accessed data
structures in DRAM if they are either transient or can be recon-
structed upon recovery; (ii) Reduce NVM writes as much as possible;
(iii) Reduce persistence operations as much as possible. We would
like to apply these design principles to the OLTP engine design.
2.2 OLTP in Main Memory Databases
Main memory OLTP systems are the starting point for designing an
OLTP engine for NVM. We consider concurrency control and crash
recovery mechanisms for achieving ACID transaction support.
Recent work has investigated concurrency control methods for high-throughput main memory transactions [15, 24, 27, 32, 36, 41]. Instead of using two-phase locking (2PL) [7, 16], which is the standard method in traditional disk-oriented databases, main memory OLTP designs exploit optimistic concurrency control (OCC) [21] and multi-version concurrency control (MVCC) [7] for higher performance. Silo [32] enhances OCC with epoch-based batch timestamp generation and group commit. MOCC [36] is an OCC based method that exploits locking mechanisms to deal with high conflict on hot tuples. TicToc [41] removes the bottleneck of centralized timestamp allocation in OCC and computes transaction timestamps lazily at commit time. Hekaton [15] employs latch-free data structures and MVCC for transactions in memory. HyPer [27] improves MVCC for read-heavy transactions in column stores by performing in-place updates and storing before-image deltas in undo buffers. Cicada [24] reduces the overhead and contention of MVCC with multiple loosely synchronized clocks for generating timestamps, best-effort inlining to decrease cache misses, and optimized multi-version validation. One common feature of the above methods is that they extend every tuple or every version of a tuple with metadata, such as read/write timestamps, pointers to different tuple versions, and lock bits for validation and commit processing. These methods have achieved transaction throughputs of over one million transactions per second (TPS) without persistence.
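As an illustration of this common feature, the per-version metadata typically resembles the following; the exact field set varies across systems, so this struct is a generic composite rather than any one design's layout.

```cpp
// Generic per-version metadata combining the fields mentioned above:
// read/write timestamps, a version-chain pointer, and a lock bit.
#include <atomic>
#include <cstdint>

struct VersionMeta {
    uint64_t begin_ts;              // write timestamp: version created
    uint64_t end_ts;                // timestamp at which it is superseded
    std::atomic<uint64_t> read_ts;  // latest reader, used in validation
    std::atomic<bool> locked;       // lock bit taken during commit
    VersionMeta* older;             // pointer to the previous version
};
```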
Similar to traditional databases, main memory databases (MMDBs) store logs and checkpoints on durable storage (e.g., HDDs, SSDs) in order to achieve durability [8, 14, 22, 23, 30, 43]. The main difference is that all the data fits into main memory in MMDBs. Hence, only committed states and redo logs need to be written to disks. After a crash, an MMDB recovers by loading the most recent checkpoint from durable storage into main memory, then reading and applying the redo log up to the crash point.
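A minimal sketch of this recovery path is shown below; load_checkpoint and next_redo_record are hypothetical helpers standing in for a concrete MMDB's checkpoint loader and log scanner.

```cpp
// Sketch of MMDB crash recovery: reload the latest checkpoint, then
// replay committed redo records up to the crash point.
#include <cstdint>
#include <optional>

struct RedoRecord { uint64_t tuple_id; uint64_t cts; /* new value */ };

void load_checkpoint();                        // bulk-load into DRAM
std::optional<RedoRecord> next_redo_record();  // scan log past checkpoint
void apply(const RedoRecord& r);               // reinstall committed write

void recover() {
    load_checkpoint();
    while (auto rec = next_redo_record())  // stops at end of valid log
        apply(*rec);
}
```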
2.3 Existing OLTP Engine Designs for NVM
In this paper, we focus on the case where all data and structures of
the OLTP engine can t into NVM memory. We assume that the
computer system contains both NVM and DRAM memory, which
are mapped to dierent address ranges in the virtual memory of