UniKV: Toward High-Performance and Scalable KV Storage
in Mixed Workloads via Unified Indexing
Qiang Zhang¹, Yongkun Li¹, Patrick P. C. Lee², Yinlong Xu¹, Qiu Cui³, Liu Tang³
¹University of Science and Technology of China   ²The Chinese University of Hong Kong   ³PingCAP
zhgqiang@mail.ustc.edu.cn, {ykli, ylxu}@ustc.edu.cn, pclee@cse.cuhk.edu.hk, {cuiqiu, tl}@pingcap.com
Abstract—Persistent key-value (KV) stores are mainly designed
based on the Log-Structured Merge-tree (LSM-tree), and hence suffer
from large read and write amplifications, especially when KV
stores grow in size. Existing design optimizations for LSM-
tree-based KV stores often make certain trade-offs and fail to
simultaneously improve both the read and write performance on
large KV stores without sacrificing scan performance. We design
UniKV, which unifies the key design ideas of hash indexing and
the LSM-tree in a single system. Specifically, UniKV leverages da-
ta locality to differentiate the indexing management of KV pairs.
It also develops multiple techniques to tackle the issues caused by
unifying the indexing techniques, so as to simultaneously improve
the performance in reads, writes, and scans. Experiments show
that UniKV significantly outperforms several state-of-the-art KV
stores (e.g., LevelDB, RocksDB, HyperLevelDB, and PebblesDB)
in overall throughput under read-write mixed workloads.
I. INTRODUCTION
Persistent key-value (KV) storage organizes data in the form
of key-value pairs and forms a critical storage paradigm in
modern data-intensive applications, such as web search [1], [2],
e-commerce [3], social networking [4], [5], data deduplication
[6], [7], and photo stores [8]. Real-world KV workloads have become
more diversified and increasingly read-write mixed. For
example, four out of five Facebook’s Memcached workloads
are read-write mixed [9]; the read-write ratio of low-latency
workloads at Yahoo! has shifted from 8:2 to 1:1 in recent years
[10]; Baidu’s cloud workload also shows a read-write ratio
of 2.78:1 [11]. Such mixed workloads challenge the indexing
structure design of KV stores to achieve high read and write
performance, while supporting scalable storage.
The Log-Structured Merge-tree (LSM-tree) [12] is a popular
indexing design in modern persistent KV stores, including local
KV stores (e.g., LevelDB [2] and RocksDB [5]) and large-
scale distributed KV stores (e.g., BigTable [1], HBase [13], and
Cassandra [14]). The LSM-tree supports three main features
that are attractive for KV-store designs: (i) efficient writes, as
newly written KV pairs are batched and written sequentially
to persistent storage; (ii) efficient range queries (scans), as KV
pairs are organized in multiple levels in a tree-like structure
and sorted by keys in each level; and (iii) scalability, as KV
pairs are mainly stored in persistent storage and can also be
easily distributed across multiple nodes.
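To make the first and third properties concrete, the following simplified C++ sketch (our own illustration, not code from UniKV, LevelDB, or any cited system; names such as TinyLSM and kFlushThreshold are made up) shows how writes are batched in a sorted in-memory buffer (memtable) and flushed sequentially as immutable sorted runs, and how a point lookup probes runs from newest to oldest.

```cpp
// Minimal sketch of the LSM-tree write/lookup path (illustrative only).
#include <algorithm>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct SortedRun {  // stands in for an on-disk SSTable at some level
  std::vector<std::pair<std::string, std::string>> entries;  // sorted by key
};

class TinyLSM {
 public:
  void Put(const std::string& key, const std::string& value) {
    memtable_[key] = value;                       // absorbed in memory
    if (memtable_.size() >= kFlushThreshold) Flush();
  }

  std::optional<std::string> Get(const std::string& key) const {
    auto m = memtable_.find(key);
    if (m != memtable_.end()) return m->second;
    // Newest run first: a point lookup may probe several runs, which is
    // the multi-level access cost discussed below.
    for (auto run = runs_.rbegin(); run != runs_.rend(); ++run) {
      auto it = std::lower_bound(
          run->entries.begin(), run->entries.end(), key,
          [](const auto& e, const std::string& k) { return e.first < k; });
      if (it != run->entries.end() && it->first == key) return it->second;
    }
    return std::nullopt;
  }

 private:
  void Flush() {  // one sequential write of an already-sorted run
    SortedRun run;
    run.entries.assign(memtable_.begin(), memtable_.end());
    runs_.push_back(std::move(run));
    memtable_.clear();
  }

  static constexpr std::size_t kFlushThreshold = 4096;  // illustrative only
  std::map<std::string, std::string> memtable_;         // kept sorted by key
  std::vector<SortedRun> runs_;                          // older to newer
};
```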
The LSM-tree, unfortunately, incurs high compaction and
multi-level access costs. As KV pairs are continuously written,
the LSM-tree flushes them to persistent storage from lower
levels to higher levels. To keep the KV pairs at each level sorted,
the LSM-tree performs regular compactions for different levels
by reading, sorting, and writing back the sorted KV pairs.
Such regular compactions incur I/O amplification (which can
reach a factor of 50× [15]), and hence severely hurt the I/O
performance and the endurance of storage systems backed by
solid-state drives (SSDs) [16]. Furthermore, as the LSM-tree
grows in size and expands to more levels, each lookup may
traverse multiple levels, thereby incurring high latency.
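As a rough, illustrative estimate (following the analysis in [15], not a figure claimed by this paper): with a size ratio of $T$ between adjacent levels, compacting data from level $L_i$ into $L_{i+1}$ reads and rewrites up to $T$ overlapping files, so a KV pair that migrates across $n$ levels may be rewritten on the order of
$$ \mathrm{WA} \approx T \cdot n, \qquad \text{e.g., } T = 10,\ n = 5 \;\Rightarrow\; \mathrm{WA} \approx 50, $$
which is consistent with the 50× factor cited above.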
Many academic and industrial projects have improved the
LSM-tree usage for KV stores to address different performance
aspects and storage architectures (see §V). Examples include
new LSM-tree organizations (e.g., a trie-like structure in LSM-
trie [17] or a fragmented LSM-tree in PebblesDB [18]) and KV
separation [15], [19]. However, the indexing structure is still
largely based on the LSM-tree, and such solutions often make
delicate design trade-offs. For example, LSM-trie [17] trades
the scan support for improved read and write performance;
PebblesDB [18] relaxes the fully sorted requirement in each
level and sacrifices read performance; KV separation [15], [19]
keeps only keys and metadata in the LSM-tree and stores
values separately, but incurs extra lookup overhead as keys
and values reside in separate storage [19]. Thus, the full performance
potential of KV stores remains constrained by the inherent
multi-level-based LSM-tree design.
Our insight is that hash indexing is a well-studied indexing
technique that supports a fast lookup of a specific KV pair.
However, combining hash indexing and the LSM-tree is
challenging, as each of them makes a different design trade-
off. For example, hash indexing supports high read and write
performance, but does not support efficient scans. Also, a
hash index is kept in memory for high performance, but the
extra memory usage poses scalability concerns when more KV
pairs are stored. On the other hand, the LSM-tree supports
both efficient writes and scans as well as high scalability, but
suffers from high compaction and multi-level access overheads.
Thus, we pose the following question: Can we unify both hash
indexing and the LSM-tree to simultaneously address reads,
writes, and scans in a high-performance and scalable fashion?
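To illustrate the first side of this trade-off, the sketch below (our own, with hypothetical names such as HashIndex and Location) keeps an in-memory hash table that maps each key to the on-disk location of its value: point lookups take one expected probe, but the table offers no key ordering for scans, and its memory footprint grows with the number of stored keys.

```cpp
// Minimal sketch of a pure in-memory hash index (illustrative only).
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

struct Location {
  uint32_t file_id;  // which data file holds the value
  uint32_t offset;   // byte offset of the KV pair within that file
};

class HashIndex {
 public:
  void Insert(const std::string& key, Location loc) { index_[key] = loc; }

  // O(1) expected: one hash probe, then at most one disk read for the value.
  std::optional<Location> Lookup(const std::string& key) const {
    auto it = index_.find(key);
    if (it == index_.end()) return std::nullopt;
    return it->second;
  }

  // No ordered view of keys: answering a range scan would require inspecting
  // every bucket, unlike a bounded search over the LSM-tree's sorted levels.

 private:
  std::unordered_map<std::string, Location> index_;  // one entry per key
};
```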
We observe that data locality, which commonly exists in
KV storage workloads [9], [20], [21], offers an opportunity to
address the above problem. To leverage data locality, we design
UniKV, which unifies hash indexing and the LSM-tree in a
single system. Specifically, UniKV adopts a layered architecture
to realize differentiated data indexing by building a light-weight
in-memory hash index for the recently written KV pairs that