(5) OLTP engines on modern distributed storage (55 min).
(6) Discussion on future challenges and opportunities (10 min).
Related Tutorials. The tutorial "Data management in non-volatile memory" [47], presented at the SIGMOD 2015 conference, provided insights into how persistent memory can be seamlessly integrated into data management systems. In 2017, the tutorial "How to Build a Non-Volatile Memory Database Management System" [9] extended persistent memory to the entire internal database management system stack. More recently, from 2022 to 2023, tutorials [27, 37, 38, 40, 48] focused on recovery strategies, disaggregated databases, cloud databases, and databases on modern networks. Unlike previous work, this tutorial specifically focuses on OLTP engines on modern storage architectures, emphasizing designs based on various storage hardware and on protocols such as RDMA and CXL.
2 Background
With advances in storage technologies and interconnect protocols, numerous cutting-edge products, such as NVMe SSDs with PCIe 5.0, Intel Optane DCPMM, and CXL, are poised to supersede traditional storage systems. These innovations offer remarkable improvements in speed, efficiency, and data management capabilities, catering to the growing demands of modern computing environments. In this section, we explore modern storage and interconnect technologies to better understand their impact on performance and scalability, which will be further discussed in Sections 3 through 5.
2.1 Persistent Memory
Persistent memory (PMem) combines memory-like speed with non-
volatility, ensuring data persistence even during power loss. It is
distinguished by high performance, persistence, byte-addressability
and high density, bridging the gap between DRAM and block de-
vices in terms of both capacity and performance.
In 2019, Intel introduced the Optane DCPMM series 100, based on 3D XPoint technology, as the first commercially available persistent memory, and it became a focal point for research. The DCPMM offers a per-DIMM capacity ranging from 128 to 512 GB, with write/read latencies in the tens or hundreds of nanoseconds. It shares the memory bus with DRAM and supports load/store instructions. After data is flushed out of the CPU cache and reaches the asynchronous DRAM refresh (ADR) region, it is guaranteed to be durable. For persistence and consistency, programmers must explicitly flush the CPU cache to ensure data is persisted and use memory fence instructions to prevent the CPU from reordering store operations.
The second-generation DCPMM, which supports extended asynchronous DRAM refresh (eADR), expands the persistence domain to include the CPU cache. This extension makes the CPU cache a transient persistence domain by ensuring that data buffered in the CPU cache is flushed to persistent memory during a power outage. Despite this advancement, memory fence instructions are still necessary to maintain data consistency.
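To make this flush-plus-fence discipline concrete, the following sketch persists a buffer that lives in a PMem-mapped region using x86 intrinsics; the helper name and the 64-byte cache-line assumption are ours for illustration (PMDK's pmem_persist wraps the same pattern):

/* Minimal sketch of cache-line flushing for ADR platforms; on eADR
 * platforms the flush loop can be skipped, but the fence is still
 * needed to order stores. Compile with -mclflushopt. */
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

static void persist_range(const void *addr, size_t len)
{
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)63;   /* align down to cache line */
    for (; p < (uintptr_t)addr + len; p += 64)
        _mm_clflushopt((void *)p);                    /* push the line toward PMem */
    _mm_sfence();                                     /* order flushes before later stores */
}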
The emergence of persistent memory has introduced a new storage architecture, presenting both opportunities and challenges for OLTP systems with data persistence requirements. The study [31] employs various database engines and benchmarks to evaluate and compare the performance impact of PMem. It highlights the need for fine-tuning and redesign of optimizations to fully leverage the capabilities of PMem.
For programmers, the programming model of PMem (PMDK [6]) provides a transactional object store, including memory allocation, transactions, and general facilities for persistent memory programming. It also provides low-level persistent memory support, such as primitives for data copy and persistence.
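A minimal sketch of PMDK's transactional object store (libpmemobj) is shown below; the layout name, root struct, and deposit function are hypothetical examples rather than code from any system covered in the tutorial:

/* Hypothetical example of a durable, atomic update with libpmemobj. */
#include <libpmemobj.h>
#include <stdint.h>

struct account { uint64_t balance; };
POBJ_LAYOUT_BEGIN(bank);
POBJ_LAYOUT_ROOT(bank, struct account);
POBJ_LAYOUT_END(bank);

void deposit(PMEMobjpool *pop, uint64_t amount)
{
    TOID(struct account) root = POBJ_ROOT(pop, struct account);
    TX_BEGIN(pop) {
        TX_ADD(root);                   /* undo-log the object before modifying it */
        D_RW(root)->balance += amount;  /* becomes durable atomically at commit */
    } TX_END
}

A pool would be created beforehand with pmemobj_create(path, POBJ_LAYOUT_NAME(bank), pool_size, 0666); on commit, the library issues the cache-line flushes and fences described above on the application's behalf.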
2.2 NVMe SSD
Non-volatile memory express solid-state drives (NVMe SSDs) feature block-addressability and deliver high performance. Recent advancements have made SSDs both faster and more cost-effective, with the NVMe/PCIe interface enhancing interconnect speeds from 4 GB/s (PCIe 3.0) to 16 GB/s (PCIe 5.0) [35]. An array of PCIe 5.0 NVMe SSDs can achieve more than 100 GB/s read throughput [32]. Modern commodity servers, equipped with up to 128 PCIe lanes per socket, can effortlessly host 8 or more SSDs at full bandwidth [20]. As a result, a server can achieve tens of millions of I/O operations per second [35]. However, the rise of high-throughput NVMe SSDs also challenges current database engines: pure in-memory engines are costly and cannot leverage cheaper SSDs, while out-of-memory systems, originally designed for SATA disks, cannot fully utilize the capabilities of NVMe SSDs [20, 23, 32].
For programmers, there are three mainstream programming models for NVMe SSDs: libaio [3], io_uring [2], and SPDK [56]. SPDK is a user-space I/O library that bypasses the kernel, enabling direct access to NVMe SSDs with zero-copy and high-performance features. io_uring is a Linux API that utilizes shared-memory, lock-free queues between the kernel and the application. It supports different polling mechanisms, allowing for reduced syscall and interrupt overhead and enhanced asynchronous I/O performance. libaio is an asynchronous I/O library that offers an interface for applications to issue asynchronous I/O requests.
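To make the io_uring model concrete, the sketch below issues a single asynchronous read through liburing; the device path, queue depth, and buffer size are illustrative assumptions, and error handling is omitted for brevity:

/* One asynchronous read via liburing. */
#include <liburing.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int read_block(const char *dev, off_t offset, size_t len)
{
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);               /* submission/completion queues with 8 entries */

    int fd = open(dev, O_RDONLY | O_DIRECT);
    void *buf = aligned_alloc(4096, len);           /* O_DIRECT requires aligned buffers */

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, fd, buf, len, offset);  /* describe the read */
    io_uring_submit(&ring);                         /* one syscall submits it */

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);                 /* block until completion */
    int res = cqe->res;                             /* bytes read, or -errno */
    io_uring_cqe_seen(&ring, cqe);

    free(buf);
    close(fd);
    io_uring_queue_exit(&ring);
    return res;
}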
2.3 RDMA and CXL
Remote Direct Memory Access (RDMA) is a technology that en-
ables nodes within a cluster to directly access each other’s memory
regions, bypassing the operating system kernel. This eliminates
traditional TCP/IP protocol stack overhead, such as unnecessary
data copying and context switching between user and kernel spaces.
RDMA relies on Remote Network Interface Controllers (RNICs) for
direct memory access within network adapters, facilitating data
transfers between nodes’ memory. In fast datacenter networks, a
basic RDMA operation takes approximately 2 microseconds [
11
],
and the bandwidth can reach tens of gigabytes per second. RDMA
has been widely used in datacenter.
For RDMA programming models, ibverbs [5] is a key component of InfiniBand technology, providing a high-speed communication interface. It enables efficient data transfer and low-latency communication between nodes. Libfabric [4] is a more generalized fabric API that provides a unified abstraction over high-performance network devices.
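As an illustration of the ibverbs interface, the sketch below posts a one-sided RDMA WRITE; it assumes a queue pair already connected to the remote node and a registered local memory region, with remote_addr and rkey exchanged out of band, and all names are illustrative rather than taken from the tutorial:

/* One-sided RDMA WRITE with ibverbs. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int rdma_write(struct ibv_qp *qp, struct ibv_mr *mr, void *local_buf,
               uint32_t len, uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* one-sided: remote CPU is not involved */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;   /* request a completion entry */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;
    return ibv_post_send(qp, &wr, &bad_wr);       /* 0 on success; poll the CQ afterwards */
}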
Compute Express Link (CXL) is a promising interconnect standard [1], which enables cacheable load/store accesses to pooled memory. CXL consists of three sub-protocols: CXL.IO, CXL.Cache, and CXL.Mem. CXL.IO is an enhanced version of PCIe and forms the basis for device discovery, configuration, and standard I/O.