
These issues limit the applicability of latency-optimized database
engines. Future engines should further address other sources of
latency, and more importantly, do so while preserving the benefits
of existing latency-hiding techniques, such as software prefetching.
1.2 MosaicDB: Latency Hiding at the Next Level
This paper presents MosaicDB, a multi-versioned, latency-optimized
OLTP engine that hides latency from multiple sources, including
memory, I/O, synchronization and scheduling as identified earlier.
To reach this goal, MosaicDB consists of a set of techniques that
could also be applied separately in existing systems.
MosaicDB builds on the coroutine-to-transaction paradigm [21]
to hide memory access latency. On top of that, we observe that
the key to efficiently hiding I/O latency without hurting the
performance of memory-resident transactions is carefully scheduling
transactions such that the CPU can be kept busy while I/O is in
progress. To this end, MosaicDB proposes a pipelined scheduling
policy for coroutine-oriented database engines. The basic ideas are
(1) continuously admitting new requests in a pipelined fashion such
that each worker thread always works with a full batch of requests,
and (2) admitting more I/O operations to the system only when there
is enough I/O capacity (measured by bandwidth consumption or
IOPS). This way, once the storage devices are saturated, MosaicDB
only accepts memory-resident requests, which can benefit from
software prefetching. By carefully examining alternatives in later
sections, we show how these seemingly simple ideas turned out to
be effective and became our eventual design decisions.
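To make the policy concrete, below is a minimal sketch of the pipelined admission loop, with coroutine transactions modeled as plain resumable objects for brevity; the names (Txn, kBatchSize, kIoBudget) and the use of the per-worker waiting count as the I/O-capacity signal are illustrative assumptions, not MosaicDB's actual interfaces.

```cpp
#include <cstddef>
#include <deque>
#include <queue>

// Sketch only: coroutine transactions are modeled as resumable objects.
enum class Status { kDone, kSuspendedOnIo, kReady };

struct Txn {
  bool needs_storage = false;                 // may touch the cold store?
  Status resume() { return Status::kDone; }   // placeholder coroutine step
  bool io_complete() { return true; }         // placeholder completion poll
};

constexpr std::size_t kBatchSize = 8;   // in-flight transactions per worker
constexpr std::size_t kIoBudget = 64;   // hypothetical per-worker I/O cap

void worker_loop(std::queue<Txn*>& incoming) {
  std::deque<Txn*> runnable, waiting;
  while (true) {  // server loop; shutdown handling elided
    // (1) Pipelined admission: refill the batch as soon as slots free up,
    // rather than waiting for an entire batch to drain.
    while (runnable.size() + waiting.size() < kBatchSize &&
           !incoming.empty()) {
      Txn* t = incoming.front();
      // (2) Once storage is saturated, admit only memory-resident requests
      // (a real scheduler would skip this request rather than stall on it).
      if (t->needs_storage && waiting.size() >= kIoBudget) break;
      incoming.pop();
      runnable.push_back(t);
    }
    // Transactions whose asynchronous I/O finished become runnable again.
    for (std::size_t i = 0; i < waiting.size();) {
      if (waiting[i]->io_complete()) {
        runnable.push_back(waiting[i]);
        waiting.erase(waiting.begin() + i);
      } else {
        ++i;
      }
    }
    // Interleave runnable transactions; each resume() executes until the
    // next suspension point (a prefetch or an I/O issue) or completion.
    for (std::size_t i = 0; i < runnable.size();) {
      switch (runnable[i]->resume()) {
        case Status::kDone:
          runnable.erase(runnable.begin() + i);
          break;
        case Status::kSuspendedOnIo:
          waiting.push_back(runnable[i]);
          runnable.erase(runnable.begin() + i);
          break;
        case Status::kReady:
          ++i;
          break;
      }
    }
  }
}
```

A real implementation would derive the I/O budget from measured bandwidth consumption or IOPS, as described above, rather than a fixed constant.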
To avoid latency caused by synchronization primitives and OS
scheduling, MosaicDB leverages the coroutine-to-transaction para-
digm to regulate contention and eliminate the need for background
threads. Specifically, each worker thread can work on multiple
transactions concurrently, but only one transaction per thread is
active at any given time. This avoids oversubscribing the system
by limiting the degree of multiprogramming to the amount of hard-
ware parallelism (e.g., the number of hardware threads). Consequently,
the OS scheduler is largely kept out of the critical path of the
OLTP engine because context switching only happens in user
space as transactions are suspended and resumed by the worker
threads. Using this architecture, MosaicDB further removes
the need for the dedicated background threads (e.g., log flushers)
required by pipelined commit [26], which is necessary to achieve
high transaction throughput without sacrificing durability: cleanup
work such as log flushes can be done using asynchronous I/O upon
transaction commit; the committing transaction then suspends and
is resumed and fully committed only once the I/O request has finished.
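As an illustration, a minimal sketch of such a two-phase commit step follows, driven by a worker scheduling loop like the one above; SubmitLogWrite and CommitContext are hypothetical stand-ins for the engine's actual log and asynchronous-I/O interfaces (e.g., io_uring-backed), not MosaicDB's real API.

```cpp
#include <cstddef>

enum class CommitStatus { kSuspendedOnIo, kDone };

// Hypothetical async-I/O wrapper; stubbed out here. A real engine would
// enqueue an asynchronous write of the log tail to the storage device.
void SubmitLogWrite(const char* buf, std::size_t len) {
  (void)buf;
  (void)len;
}

struct CommitContext {
  const char* log_tail = nullptr;  // this transaction's log records
  std::size_t tail_bytes = 0;
  bool flush_issued = false;
};

// Driven by the worker's scheduling loop: the first call issues the log
// flush and suspends; the call after I/O completion finalizes the commit.
CommitStatus CommitStep(CommitContext& c) {
  if (!c.flush_issued) {
    SubmitLogWrite(c.log_tail, c.tail_bytes);  // no background flusher thread
    c.flush_issued = true;
    return CommitStatus::kSuspendedOnIo;  // yield the CPU to other transactions
  }
  // Resumed only once the write has completed: the log is durable, so the
  // transaction can be fully committed and its results made visible.
  return CommitStatus::kDone;
}
```

While one transaction awaits its log write, the worker keeps executing others, so pipelined commit costs no dedicated flusher thread.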
We implemented MosaicDB on top of CoroBase [21], a latency-
optimized OLTP engine that hides memory latency using software
prefetching. Compared to baselines on a 48-core server, MosaicDB
maintains high throughput for memory-resident transactions, while
allowing additional storage-resident transactions to fully lever-
age the storage device. Overall, MosaicDB achieves up to 33×
higher throughput for larger-than-memory workloads; with a given
number of CPU cores, MosaicDB is free of oversubscription and
outperforms CoroBase by 1.7× under TPC-C; and MosaicDB has
better scalability under high-contention workloads, with up to 18%
less contention and 2.38× the throughput of the state of the art.
Although MosaicDB is implemented and evaluated on top of
CoroBase, the techniques can be separately applied in other systems.
For example, contention regulation can be adopted by systems that
use event-driven connection handling where the total number of
worker threads will never exceed the number of hardware threads.
We leave it as future work to explore how MosaicDB techniques
can be applied in other systems.
1.3 Contributions
This paper makes four contributions. (1) We quantify the impact of
various sources of latency identified in memory-optimized OLTP
engines, beyond memory access latency, which received the most
attention in the past. (2) We propose design principles that preserve
the benefits of software prefetching to hide memory latency and hide
storage access latency at the same time. (3) In addition to memory
and storage I/O, we show how the database engine's software archi-
tecture could be modified to avoid the latency impact of synchroniza-
tion and OS scheduling. (4) We build and evaluate MosaicDB on
top of an existing latency-optimized OLTP engine to showcase
the effectiveness of MosaicDB techniques in practice. MosaicDB is
open-source at https://github.com/sfu-dis/mosaicdb.
2 BACKGROUND
We begin by clarifying the type of systems our work is based
upon and our assumptions. We then elaborate on the main sources of
latency that MosaicDB attempts to hide, followed by the existing
latency-optimized designs that motivated our work.
2.1 System Architectures and Assumptions
We target memory-optimized OLTP engines that both (1) leverage
large DRAM when data fits in memory and (2) support larger-than-
memory databases when the working set goes beyond memory.
Larger-Than-Memory Database Engines. There are mainly
two approaches to realizing this. One is to craft better buffer pool
designs [37, 46] which use techniques like pointer swizzling [18, 28]
and low-overhead page eviction algorithms [37, 58] to approach in-
memory performance when data fits in DRAM, while otherwise pro-
viding graceful degradation and fully utilizing storage bandwidth.
The other approach employs a "hot-cold" architecture [14, 31] that
does not use a buffer pool, and instead separates hot and cold data
whose primary homes are main memory and secondary storage
(e.g., SSDs), respectively. In essence, a hot-cold database engine
consists of a "hot store" that is memory-resident (although persistence
is still guaranteed) and an add-on "cold store" in storage. A transac-
tion can then access data from both stores. Note, however, that
both "stores" use the same mechanisms for such functionality as
concurrency control, indexing and checkpointing, inside a single
database engine without requiring cross-engine capabilities [64].
In this paper, we focus on the hot-cold architecture and leave it as
future work to explore the buffer pool based approach.
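The read path under such a hot-cold split can be sketched as follows; HotStore, ColdStore, and Record are hypothetical stand-ins for illustration, not the actual types of any of the cited engines.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

using Key = std::uint64_t;
using Record = std::string;

struct HotStore {                        // memory-resident, yet durable
  std::unordered_map<Key, Record> map;   // (via logging and checkpoints)
  std::optional<Record> get(Key k) const {
    auto it = map.find(k);
    if (it == map.end()) return std::nullopt;
    return it->second;
  }
};

struct ColdStore {  // storage-resident (e.g., SSD-backed)
  std::optional<Record> get(Key) const {
    // A real engine would issue an (asynchronous) storage read here and
    // suspend the calling transaction until the read completes.
    return std::nullopt;
  }
};

// A read goes through a single engine: the same indexing and concurrency
// control serve both stores; only the record's "primary home" differs.
std::optional<Record> get(const HotStore& hot, const ColdStore& cold,
                          Key k) {
  if (auto r = hot.get(k)) return r;  // hot hit: pure in-memory access
  return cold.get(k);                 // miss: fall back to the cold store
}
```

When the cold-store lookup issues actual storage I/O, the calling transaction suspends at that point and is rescheduled by the worker loop, which is exactly where the pipelined scheduling described earlier applies.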
Hot-Cold Storage Engines. Figure 2 shows the design of ERMIA [30],
a typical hot-cold system that employs multi-versioned
concurrency control, in-memory indexing and indirection arrays [50] to
support in-memory data (hot store) and storage-resident data (cold
store). Many other systems [13, 14, 39, 40] follow similar designs.
An update to the database appends a new in-memory record version
to the record's version chain, reachable through the record's
indirection array entry.
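A minimal sketch of an indirection array with per-record version chains, loosely following the ERMIA-style design described above; Version, IndirectionArray, and the timestamp field are illustrative assumptions rather than the actual data structures.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Version {
  std::uint64_t begin_ts = 0;  // timestamp from which this version is visible
  Version* next = nullptr;     // older version (chain is newest-first)
  // ... record payload follows in a real engine
};

// Indexes map keys to record IDs (slots in this array), never to physical
// pointers, so versions can move between memory and storage freely.
struct IndirectionArray {
  std::vector<std::atomic<Version*>> slots;
  explicit IndirectionArray(std::size_t n) : slots(n) {}

  Version* head(std::uint64_t rid) const {
    return slots[rid].load(std::memory_order_acquire);
  }

  // Install a new version at the chain head with a CAS; on failure a
  // concurrent writer won, and this transaction would abort or retry.
  bool install(std::uint64_t rid, Version* v) {
    Version* old = slots[rid].load(std::memory_order_acquire);
    v->next = old;
    return slots[rid].compare_exchange_strong(old, v,
                                              std::memory_order_release);
  }
};
```

Because indexes store record IDs rather than pointers, relocating a version between the hot and cold stores only changes the chain behind the indirection entry, never the index itself.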