
adjacent stages. Recently, PM-aware LSM-trees [12, 30, 62] have been designed to optimize for PM idiosyncrasies and reduce write stalls.
However, both the read and write performance of such multi-stage
indexing KVSs remains far lower than that of DRAM-only single-stage
indexing KVSs due to limited write concurrency (Observation 4).
In addition, both single-stage and multi-stage indexing KVSs still
insufficiently utilize the bandwidth of PM (Observation 5).
In other words, it is very challenging for existing KVS indexing
structures, single-stage or multi-stage, to achieve high write/read
throughput and controllable DRAM footprint simultaneously. For-
tunately, Observation 5 offers a hint on how to effectively and efficiently
leverage the power of modern hardware. To this end, we propose a
fast-flowing three-stage KVS architecture, FluidKV, to seamlessly
consolidate these two indexing mechanisms. The key idea behind
FluidKV is to combine the high concurrency of single-stage indexes
and the controllable DRAM consumption of multi-stage indexes by
quickly and efficiently merging KVs from the former to the latter.
FluidKV comprises three consecutive processing stages, i.e., FastStore, BufferStore, and StableStore. FastStore employs a concurrent
and key-grained index, along with thread-exclusive logging, to
quickly absorb incoming KVs in a sequential but unsorted manner,
thus achieving high write performance. Adapting to the available hardware bandwidth and fluctuating workload, BufferStore dynamically
flushes FastStore data into a series of persisted and sorted KV-sets
and merge-sorts them into the backend StableStore, to control
the memory footprint in a timely manner. StableStore maintains a global key-range-grained index on large-scale persistent data, thus minimizing
the DRAM footprint while maintaining read performance. FluidKV
presents a PM-friendly index and data block structure in BufferStore
and StableStore to store the persisted index and KVs, efficiently
exploiting the ultra-fast storage.
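To make the data flow concrete, the sketch below (our illustration, not the actual FluidKV code; all type names and in-memory containers are stand-ins for the persistent logs, KV-sets, and PM-resident runs described above) traces how writes move from thread-exclusive logs in FastStore, through sorted KV-sets in BufferStore, into the globally indexed StableStore:

```cpp
// Illustrative sketch (not the actual FluidKV implementation).
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct KV { std::string key, value; };

// Stage 1: thread-exclusive append-only logs plus a key-grained index,
// so writes are absorbed sequentially but unsorted.
struct FastStore {
    std::vector<std::vector<KV>> thread_logs;                              // one log per thread
    std::map<std::string, std::pair<std::size_t, std::size_t>> index;      // key -> (thread, slot)

    explicit FastStore(std::size_t nthreads) : thread_logs(nthreads) {}

    void put(std::size_t tid, KV kv) {
        thread_logs[tid].push_back(kv);                                    // sequential append
        index[kv.key] = {tid, thread_logs[tid].size() - 1};
    }
};

// Stage 2: flushes FastStore into sorted, persisted KV-sets, freeing DRAM.
struct BufferStore {
    std::vector<std::map<std::string, std::string>> sorted_sets;

    void flush(FastStore& fs) {
        std::map<std::string, std::string> set;
        for (auto& log : fs.thread_logs)
            for (auto& kv : log) set[kv.key] = kv.value;                   // sort while flushing
        sorted_sets.push_back(std::move(set));
        for (auto& log : fs.thread_logs) log.clear();                      // reclaim DRAM
        fs.index.clear();
    }
};

// Stage 3: a single key-range-grained index over large-scale sorted data.
struct StableStore {
    std::map<std::string, std::string> runs;                               // stand-in for PM runs

    void merge(BufferStore& bs) {
        for (auto& set : bs.sorted_sets)
            for (auto& [k, v] : set) runs[k] = v;                          // newer values win
        bs.sorted_sets.clear();
    }
};
```

In the real system the logs, KV-sets, and runs live on PM and the indexes are concurrent structures; the sketch only captures the direction of data movement and the point at which DRAM is reclaimed.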
We implement a FluidKV prototype and evaluate it under a
variety of workloads. The results show that FluidKV outperforms
ListDB, a state-of-the-art PM-aware multi-stage indexing KVS, by
up to 9x and 3.8x in write and read throughput respectively while
cutting 90% of ListDB’s DRAM footprint. Compared to state-of-the-
art single-stage indexing KVSs, FluidKV also ensures a controllable
DRAM footprint with similar or higher read/write performance.
The contributions of this paper include:
• An in-depth experimental analysis of the performance/memory-footprint gap among representative KVS indexing mechanisms;
• A three-stage fast-flowing KVS architecture that effectively utilizes the I/O capacity and parallel-processing capability of ultra-fast storage;
• A write-optimized first stage (FastStore), an adaptive second stage for fast data migration (BufferStore), and a DRAM-footprint-cutting PM-aware backend third stage (StableStore);
• Evaluation of a FluidKV prototype against representative PM-aware KVSs demonstrating high write performance, low DRAM footprint, and acceptable read performance.
2 BACKGROUND AND ANALYSIS
In this section, we provide the necessary background for and an in-
depth analysis of the characteristics of emerging high-performance
storage devices and indexing mechanisms of existing key-value
stores (KVSs), which help reveal their performance pitfalls.
[Figure 1: Classification of KVS indexing mechanisms. (a) Single-stage indexing: a volatile index in DRAM over key-value data on PM. (b) Multi-stage indexing: an index and key-value data (Stage 0) in DRAM, flushed to key-value data on PM (Stage 1) and compacted through subsequent stages (Stage 2 ... Stage n).]
2.1 Ultra-Fast Storage
The rapid development of solid-state drive (SSD) and persistent
memory (PM) technologies has unceasingly advanced both stor-
age capacity and performance, especially accelerating bandwidth
with increasing parallelism. Different from traditional block devices
such as SSDs and hard-disk drives (HDDs), PM (e.g., PCM [52], STT-MRAM [2], Memristor [59], 3DXPoint [23], and Memory-Semantic SSDs [24, 25, 42, 60]) supports memory semantics (i.e., load/store) and accesses
byte-level small-sized data at GB/s-level I/O bandwidth
and 100ns-level latency. Because of this superior byte-addressability
and fast persistence, KVSs, as a fundamental building
block of modern data-processing infrastructure, leverage PM to accelerate intensive small-sized KV workloads [16, 27, 29, 61] that are
prevalent in industrial and commercial applications [5]. Nowadays,
PM not only works as main memory, which can be accessed via
the memory bus for low latency, but can also be attached to the PCIe
bus with the emerging CXL technology [1] for high scalability.
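As a concrete illustration of the load/store persistence path mentioned above, the following minimal sketch (our example, not taken from the paper) persists a small buffer to a PM-mapped region using regular stores followed by cache-line write-backs and a fence; it assumes an x86 CPU with CLWB support and a region already mapped, e.g., via DAX.

```cpp
// Minimal store-then-persist idiom on x86 PM (illustrative sketch).
// Assumes CLWB hardware support and -mclwb (or equivalent) at compile time.
#include <immintrin.h>
#include <cstdint>
#include <cstring>

constexpr std::uintptr_t kCacheLine = 64;

// Persist `len` bytes at `dst`: regular stores, then write back every
// touched cache line, then a fence to order/complete the write-backs.
inline void pm_persist_copy(void* dst, const void* src, std::size_t len) {
    std::memcpy(dst, src, len);                        // plain load/store path
    auto p = reinterpret_cast<std::uintptr_t>(dst) & ~(kCacheLine - 1);
    for (; p < reinterpret_cast<std::uintptr_t>(dst) + len; p += kCacheLine)
        _mm_clwb(reinterpret_cast<void*>(p));          // flush cache line toward PM
    _mm_sfence();                                      // enforce durability ordering
}
```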
2.2 Key-value Store Indexing
A KVS generally consists of indexes and persistent data. In this
paper, we focus on the index structure, which is the core of KVS
for accurate and quick access to the persistent data on storage. To
understand the impact of different indexing mechanisms on the
overall performance of a KVS, we first classify the existing KVS
indexes into two groups, i.e., single-stage indexing (e.g., B+-tree)
and multi-stage indexing (e.g., LSM-tree), as shown in Figure 1, and
identify their respective performance characteristics and pitfalls
(Observations 1~5) through experiments.
2.2.1 Single-stage indexing. A single-stage indexing KVS main-
tains a monolithic KV-grained index to precisely record the location
of each KV in the persistent data, as shown in Figure 1a. Common
single-stage indexes include range indexes (e.g., B-tree variants,
trie, and skiplist) and hash indexes. Considering that KVSs require
support for range queries, this paper focuses only on range indexes.
Most KVSs optimized for ultra-fast storage keep the whole index
in DRAM to prevent the indexing from becoming a bottleneck. For
example, KVell [32] adopts a large B+-tree index and page cache
in memory to ensure read performance on fast SSDs. Flatstore [9]
builds an efficient multi-log structure on PM for persistent KVs and
employs an existing volatile index for fast searching.
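The following sketch (illustrative only, not KVell's or Flatstore's actual code) captures the single-stage pattern these systems share: every record is appended to persistent storage, while a volatile DRAM index maps each key directly to its record's location.

```cpp
// Illustrative single-stage indexing pattern: a volatile, key-grained
// DRAM index over an append-only persistent log (stand-in containers).
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

class SingleStageKVS {
    std::vector<std::string> pm_log_;                     // stand-in for an append-only PM log
    std::map<std::string, std::size_t> dram_index_;       // key-grained volatile range index

public:
    void put(const std::string& key, const std::string& value) {
        pm_log_.push_back(value);                         // persist the record (log append)
        dram_index_[key] = pm_log_.size() - 1;            // record its exact location in DRAM
    }

    std::optional<std::string> get(const std::string& key) const {
        auto it = dram_index_.find(key);                  // one index lookup, one data read
        if (it == dram_index_.end()) return std::nullopt;
        return pm_log_[it->second];
    }
};
```

The pitfall is visible in the sketch itself: the volatile index holds one entry per key, so its DRAM footprint grows linearly with the dataset, which is the memory cost quantified in the observations below.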
To demonstrate the impact of single-stage indexing on KVS performance, we test the read and write performance of Flatstore with
different numbers of user threads under workloads of 200M writes
and 200M reads, respectively. The key and value sizes are equal, at
8 bytes each (see §6.1 for the detailed experimental setup). As shown
in Figure 2, Masstree, which denotes Flatstore with a DRAM-only
B+-tree (i.e., Masstree [38]), achieves read and write throughputs