OLTP Engines on Modern Storage Architectures
Daokun Hu
Ant Group
Hangzhou, China
hudaokun.hdk@antgroup.com
Quanqing Xu
OceanBase, Ant Group
Hangzhou, China
xuquanqing.xqq@oceanbase.com
Chuanghui Yang
OceanBase, Ant Group
Beijing, China
rizhao.ych@oceanbase.com
Abstract
Online transaction processing (OLTP) engines are crucial components of database systems, facing significant challenges due to the rapid growth of data on the Internet. Memory-oriented OLTP engines struggle with the cost and capacity limitations of hosting large data volumes, while disk-oriented engines suffer performance degradation when their working set size exceeds available memory. Limitations in DRAM density and disk latency hinder scalability and service quality, complicating large-scale data management and transaction execution.
In recent years, advancements in storage architecture, such as persistent memory, NVMe SSDs, and CXL, help alleviate these memory and I/O pressures by bridging the performance gap between DRAM and traditional block storage devices or efficiently expanding memory pools. These technologies are used to enhance and accelerate OLTP engines, with emerging storage hardware and protocols offering improved scalability and remote access.
Despite these innovations, new challenges arise, requiring advanced data management strategies in dynamic environments. This tutorial provides an overview of modern OLTP engines leveraging cutting-edge storage solutions, exploring storage hierarchies, protocols, and programming models that offer insights for researchers and industry professionals. Additionally, it highlights the challenges and opportunities presented by emerging storage architectures for OLTP engines.
CCS Concepts
• Hardware → Emerging interfaces; • Information systems → Storage architectures; Database management system engines.
Keywords
OLTP, Modern Storage, CXL, Persistent Memory
ACM Reference Format:
Daokun Hu, Quanqing Xu, and Chuanghui Yang. 2025. OLTP Engines on Modern Storage Architectures. In Companion of the 2025 International Conference on Management of Data (SIGMOD-Companion '25), June 22–27, 2025, Berlin, Germany. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3722212.3725633
This work is licensed under a Creative Commons Attribution 4.0 International License.
SIGMOD-Companion '25, June 22–27, 2025, Berlin, Germany
© 2025 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-1564-8/2025/06
https://doi.org/10.1145/3722212.3725633
1 Introduction
Online transaction processing (OLTP) engines are essential and pivotal components within database systems. However, with the growth of data on the Internet, OLTP engines are encountering
unprecedented challenges on memory and I/O. Memory-oriented OLTP engines are constrained by the disproportionate costs and capacities associated with hosting extensive data volumes [60]. Disk-oriented OLTP engines, utilizing traditional block storage devices such as SSDs or HDDs, undergo considerable performance degradation when the working set size exceeds the available memory capacities, despite the assistance of buffer pools in DRAM [36, 60]. When dealing with large-scale datasets, the low density of DRAM, combined with the limited bandwidth and high latency of disk storage, hinders OLTP engines from delivering high-quality services and achieving high scalability.
In recent years, numerous technologies have been deployed to alleviate memory and I/O pressure. Modern storage solutions, such as persistent memory, DRAM-based non-volatile DIMMs (NVDIMMs), and non-volatile memory express solid-state drives (NVMe SSDs) with new PCIe, bridge the gap between DRAM and slower SATA SSD/HDD block devices regarding capacity and performance, and are extensively used to accelerate OLTP engines. High-speed interconnect protocols, such as Remote Direct Memory Access (RDMA) and Compute Express Link (CXL), can combine the memory of multiple machines. While maintaining strong performance, they can reduce overall memory costs by dynamically allocating resources. Emerging storage hardware introduces new storage tiers that enhance local performance, while new protocols enable faster remote access, improving scale-out capabilities. Despite these innovations offering enhanced capabilities and flexibility, they also present new challenges for modern OLTP engines, necessitating advanced strategies to effectively manage data in increasingly dynamic environments. This tutorial aims to provide a comprehensive overview of OLTP engines based on modern storage architectures. In the discussion of modern storage architectures, we address various storage hierarchies, along with RDMA and CXL protocols for distributed environments. We also explore the programming models relevant to modern storage, offering insights for both researchers and industry professionals. Finally, we present challenges and potential opportunities that OLTP engines encounter with emerging storage technologies.
Target Audience. The target audience for this tutorial includes students, researchers, and developers interested in achieving high-performance OLTP engines and databases through modern hardware and technologies. The tutorial provides basic and background introductions to current and future storage architectures. A basic understanding of OLTP engines, databases, and storage is helpful for participants.
Tutorial Overview (180 min in total).
(1) Introduction and motivation (10 min).
(2) Storage and interconnect technologies (25 min).
(3) Storage architectures (25 min).
(4) OLTP engines on modern storage hierarchy (55 min).
(5) OLTP engines on modern distributed storage (55 min).
(6) Discussion on future challenges and opportunities (10 min).
Related Tutorials. The tutorial “Data management in non-volatile memory” [47], presented at the SIGMOD 2015 conference, provided insights into how persistent memory can be seamlessly integrated into data management systems. In 2017, the tutorial “How to Build a Non-Volatile Memory Database Management System” [9] extended persistent memory to the entire internal database management system stack. More recently, from 2022 to 2023, tutorials [27, 37, 38, 40, 48] focused on recovery strategies, disaggregated databases, cloud databases, and databases on modern networks. Unlike previous work, this tutorial specifically focuses on OLTP engines on modern storage architectures, emphasizing architectures based on various storage hardware and protocols such as RDMA and CXL.
2 Background
With advances in storage technologies and interconnect protocols, numerous cutting-edge products, such as NVMe SSDs with PCIe 5.0, Intel Optane DCPMM, and CXL, are poised to supersede traditional storage systems. These innovations offer remarkable improvements in speed, efficiency, and data management capabilities, catering to the growing demands of modern computing environments. In this section, we explore modern storage and interconnect technologies to better understand their impact on performance and scalability, which will be further discussed in Sections 3–5.
2.1 Persistent Memory
Persistent memory (PMem) combines memory-like speed with non-volatility, ensuring data persistence even during power loss. It is distinguished by high performance, persistence, byte-addressability, and high density, bridging the gap between DRAM and block devices in terms of both capacity and performance.
In 2019, Intel introduced the Optane DCPMM series 100, based on 3D XPoint technology, as the first commercially available persistent memory, becoming a focal point for research. The DCPMM offers a per-DIMM capacity ranging from 128 to 512 GB, with write/read latencies in the tens or hundreds of nanoseconds. It shares the memory bus with DRAM and supports load/store instructions. After flushing data out of the CPU cache, once data reaches the asynchronous DRAM refresh (ADR) region, it is guaranteed to be durable. For persistence and consistency, programmers must explicitly flush the CPU cache to ensure data is persisted and use memory fence instructions to prevent the CPU from reordering store operations.
The second-generation DCPMM, which supports extended asynchronous DRAM refresh (eADR), expands the persistence domain to include the CPU cache. This extension makes the CPU cache a transient persistence domain by ensuring that data buffered in the CPU cache is flushed to persistent memory during a power outage. Despite this advancement, memory fence instructions are still necessary to maintain data consistency.
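To make the flush-and-fence requirement concrete, the following is a minimal C sketch (not code from the tutorial) of a persist_range helper that writes back the affected cache lines with CLWB and orders the stores with SFENCE; on eADR platforms the write-backs could be elided, but the fence is still required.

#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64

/* Write back every cache line covering [addr, addr + len) and order the
 * preceding stores. On ADR platforms both steps are required for
 * durability; on eADR platforms the CLWBs can be skipped, but the fence
 * is still needed to keep the CPU from reordering store operations. */
static void persist_range(const void *addr, size_t len)
{
    uintptr_t p = (uintptr_t)addr & ~((uintptr_t)CACHELINE - 1);
    for (; p < (uintptr_t)addr + len; p += CACHELINE)
        _mm_clwb((void *)p);   /* write back without evicting the line */
    _mm_sfence();              /* ensure stores reach the ADR domain in order */
}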
The emergence of persistent memory has introduced a new storage architecture, presenting both opportunities and challenges for OLTP systems with data persistence requirements. The study [31] employs various database engines and benchmarks to evaluate and compare the performance impacts of PMem. It highlights the need for fine-tuning and optimization redesign to fully leverage the capabilities of PMem.
For programmers, the programming model of PMem (PMDK [6]) provides a transactional object store, including memory allocation, transactions, and general facilities for persistent memory programming. It also provides low-level persistent memory support such as data copy and persistence.
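As a brief illustration of PMDK's transactional object store, the sketch below (a minimal example with an assumed pool path and layout name, not code from the tutorial) uses libpmemobj to update a persistent counter inside a failure-atomic transaction.

#include <libpmemobj.h>
#include <stdint.h>

/* Illustrative layout: a single persistent root object holding a counter. */
struct root { uint64_t value; };
POBJ_LAYOUT_BEGIN(counter);
POBJ_LAYOUT_ROOT(counter, struct root);
POBJ_LAYOUT_END(counter);

int main(void)
{
    /* The pool path /mnt/pmem/counter.pool is only an example. */
    PMEMobjpool *pop = pmemobj_create("/mnt/pmem/counter.pool",
                                      POBJ_LAYOUT_NAME(counter),
                                      PMEMOBJ_MIN_POOL, 0666);
    if (pop == NULL)
        return 1;

    TOID(struct root) root = POBJ_ROOT(pop, struct root);

    /* Failure-atomic update: TX_ADD undo-logs the old contents, so a crash
     * anywhere inside the transaction rolls the counter back. */
    TX_BEGIN(pop) {
        TX_ADD(root);
        D_RW(root)->value += 1;
    } TX_END

    pmemobj_close(pop);
    return 0;
}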
2.2 NVMe SSD
Non-volatile memory express solid-state drives (NVMe SSDs) feature block-addressability and deliver high performance. Recent advancements have made SSDs both faster and more cost-effective, with the NVMe/PCIe interface enhancing interconnect speeds from 4 GB/s (PCIe 3.0) to 16 GB/s (PCIe 5.0) [35]. An array of PCIe 5.0 NVMe SSDs can achieve more than 100 GB/s read throughput [32]. Modern commodity servers, equipped with up to 128 PCIe lanes per socket, can effortlessly host 8 or more SSDs at full bandwidth [20]. As a result, a server can achieve tens of millions of I/O operations per second [35]. However, the rise of high-throughput NVMe SSDs also challenges current database engines: pure in-memory engines are costly and cannot leverage cheaper SSDs, while out-of-memory systems, originally designed for SATA disks, cannot fully utilize SSD capabilities [20, 23, 32].
For programmers, there are three mainstream programming models for NVMe SSDs: libaio [3], io_uring [2], and SPDK [56]. SPDK is a user-space I/O library that bypasses the kernel, enabling direct access to NVMe SSDs with zero-copy and high-performance features. io_uring is a Linux API that utilizes shared-memory, lock-free queues between the kernel and application. It supports different polling mechanisms, allowing for reduced syscall and interrupt overhead and enhanced asynchronous I/O performance. libaio is an asynchronous I/O library that offers an interface for applications to issue asynchronous I/O requests.
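For concreteness, here is a minimal sketch of the io_uring model using the liburing helper library: a single 4 KiB read is prepared, submitted with one syscall, and reaped from the completion queue. The device path and block size are only illustrative assumptions.

#define _GNU_SOURCE
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    struct io_uring ring;
    if (io_uring_queue_init(8, &ring, 0) < 0)           /* 8-entry SQ/CQ pair */
        return 1;

    /* O_DIRECT bypasses the page cache; the path is only an example. */
    int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
    if (fd < 0)
        return 1;

    void *buf = NULL;
    if (posix_memalign(&buf, 4096, 4096) != 0)          /* alignment for O_DIRECT */
        return 1;

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring); /* grab a submission slot */
    io_uring_prep_read(sqe, fd, buf, 4096, 0);          /* asynchronous read of block 0 */
    io_uring_submit(&ring);                             /* one syscall submits the batch */

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);                     /* block until the completion arrives */
    printf("read returned %d bytes\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);                      /* mark the CQE as consumed */

    free(buf);
    close(fd);
    io_uring_queue_exit(&ring);
    return 0;
}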
2.3 RDMA and CXL
Remote Direct Memory Access (RDMA) is a technology that enables nodes within a cluster to directly access each other's memory regions, bypassing the operating system kernel. This eliminates traditional TCP/IP protocol stack overhead, such as unnecessary data copying and context switching between user and kernel spaces. RDMA relies on RDMA-capable network interface controllers (RNICs) for direct memory access within network adapters, facilitating data transfers between nodes' memory. In fast datacenter networks, a basic RDMA operation takes approximately 2 microseconds [11], and the bandwidth can reach tens of gigabytes per second. RDMA has been widely used in datacenters.
For RDMA programming models, ibverbs [5] is a key component in InfiniBand technology, providing a high-speed communication interface. It enables efficient data transfer and low-latency communication between nodes. Libfabric [4] is a more generalized fabric API that provides a unified abstraction for high-performance network devices.
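As a small illustration of the ibverbs interface, the following sketch (assuming a queue pair and registered memory region that have already been created and connected, with the remote address and rkey exchanged out of band) posts a one-sided RDMA READ work request.

#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Issue a one-sided RDMA READ: pull `len` bytes from the remote buffer
 * described by (remote_addr, rkey) into the locally registered region `mr`.
 * The remote CPU is not involved, and the post itself bypasses the kernel. */
static int rdma_read(struct ibv_qp *qp, struct ibv_mr *mr,
                     uint64_t remote_addr, uint32_t rkey, uint32_t len)
{
    struct ibv_sge sge = {
        .addr   = (uint64_t)(uintptr_t)mr->addr,  /* local destination buffer */
        .length = len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_READ;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;   /* request a completion entry */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);       /* completion is reaped via ibv_poll_cq */
}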
Compute Express Link (CXL) is a promising interconnect standard [1], which enables cacheable load/store accesses to pooled memory. CXL consists of three sub-protocols: CXL.IO, CXL.Cache, and CXL.Mem. CXL.IO is an enhanced version of PCIe and forms the foundation of CXL.