MaLT: A Framework for Managing Large Transactions in OceanBase
Chenguang Fang
OceanBase, Ant Group
Hangzhou, China
Chen Qian
OceanBase, Ant Group
Beijing, China
Qi Yang
OceanBase, Ant Group
Hangzhou, China
Zeyu Wang
OceanBase, Ant Group
Beijing, China
Zhenkun Yang
OceanBase, Ant Group
Beijing, China
Fanyu Kong
OceanBase, Ant Group
Beijing, China
Quanqing Xu
OceanBase, Ant Group
Hangzhou, China
Hui Cao
OceanBase, Ant Group
Hangzhou, China
Fusheng Han
OceanBase, Ant Group
Beijing, China
Chuanhui Yang
OceanBase, Ant Group
Beijing, China
OceanBaseLabs@service.oceanbase.com
Abstract
Large transactions challenge the designs of modern relational
database systems, as they necessitate the management of uncommitted
changes. OceanBase, a distributed relational database system, has
implemented an LSM-tree based storage engine to achieve high read
and write performance. However, the support for large transactions
in existing LSM-tree based systems remains a problem under the
append-only principle of the LSM-tree, making them suffer from
memory constraints and low efficiency.
In this paper, we present MaLT, a framework designed to efficiently
Manage Large Transactions within the OceanBase system. We introduce
the Transaction Context Table (TCT) and the Transaction Data Table
(TDT) to manage transaction states in the LSM-tree based storage
engine. Based on them, we further devise an efficient recovery
mechanism to provide high availability of the databases after
unexpected system failures. Unlike existing LSM-tree based RDBMSs
that abstract LSM-trees as key-value stores, MaLT directly implements
transactions into the LSM-tree and leverages its unique features. The
backfill (i.e., in-row version number update upon commit) and undo
operations for transactions are seamlessly integrated into the
compaction stage of the LSM-tree. This enables efficient commit and
abort, and in the meantime helps avoid the instant latency of
recovering the system from uncommitted large transactions. Moreover,
MaLT also embeds transaction information directly within the
LSM-tree, facilitating various optimizations to improve both read and
write performance. Finally, the experimental results demonstrate the
effectiveness and the scalability of our approach implemented in
OceanBase.
Chenguang Fang and Chen Qian contributed equally to this work.
Work done during internship at OceanBase.
Corresponding authors.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
SIGMOD-Companion ’25, June 22–27, 2025, Berlin, Germany.
© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-1564-8/25/6
https://doi.org/10.1145/3722212.3724442
CCS Concepts
Information systems → Database transaction processing; Database
recovery; Distributed database transactions.
Keywords
RDBMS; Large Transactions; LSM-tree
ACM Reference Format:
Chenguang Fang, Chen Qian, Qi Yang, Zeyu Wang, Zhenkun Yang, Fanyu
Kong, Quanqing Xu, Hui Cao, Fusheng Han, and Chuanhui Yang. 2025.
MaLT: A Framework for Managing Large Transactions in OceanBase. In
Companion of the 2025 International Conference on Management of Data
(SIGMOD-Companion ’25), June 22–27, 2025, Berlin, Germany. ACM, New
York, NY, USA, 13 pages. https://doi.org/10.1145/3722212.3724442
1 Introduction
In recent years, log-structured merge trees (LSM-trees) [23] have been
increasingly adopted as storage engines in modern relational database
management systems (RDBMSs). LSM-trees follow an append-only
principle to serve out-of-place updates. The ingested data are first
buffered in memory as MemTables and then flushed to disk as
immutable SSTables. Such a structure supports fast writes and also
facilitates efficient space usage. OceanBase, a distributed relational
database system, employs an LSM-tree as its storage engine. By
implementing the LSM-tree with several optimization strategies such as
a daily incremental major compaction [32], OceanBase achieves high
read and write performance.
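The write path described above can be illustrated with a minimal
sketch. It is a toy model, not OceanBase's storage engine: the class
name TinyLSM, the memtable_limit parameter, and the in-memory list
standing in for on-disk SSTables are all assumptions made for
illustration.

```python
class TinyLSM:
    """Toy LSM-tree: one mutable MemTable plus a list of immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}    # in-memory buffer for recent writes
        self.sstables = []    # oldest first; each is a sorted tuple of (key, value)
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def put(self, key, value):
        # Out-of-place update: an existing SSTable is never modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the MemTable into an immutable, sorted SSTable.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: MemTable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # reaches the limit and triggers a flush
db.put("a", 3)   # the newer version of "a" lives in the MemTable
```

After these calls, the older version of "a" still sits untouched in
the flushed SSTable, while reads return the newer MemTable version.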
1.1 Challenges of Large Transactions
Many modern applications require complex transactions that involve
significant data manipulation over multiple tables and records.
While RDBMSs based on LSM-trees offer various benefits, they face
distinct challenges when managing large transactions. These
transactions are typically larger than RAM, which leads to
out-of-memory conditions when handling uncommitted changes.
Additionally, long-running transactions occupy fragmented resources
for extended periods. This also leads to potential issues in releasing
those resources and hinders the efficiency of database recovery
processes. Several modern RDBMSs (e.g., TiDB [18] and CockroachDB
[28]) also employ LSM-trees as their storage engines. However, this
design faces several specific challenges:
Transaction Size Limitation. While existing LSM-tree based
databases support moderately large transactions, they still face
limitations due to either memory or log size constraints. Some
LSM-tree based databases support transactions following the Percolator
model [24], which relies on memory to manage all uncommitted
data. Consequently, the transaction size in these systems cannot
exceed the available memory. On the other hand, databases like
RocksDB [2] support transactions that are larger than RAM by
temporarily writing uncommitted changes to disk. Unfortunately,
their implementation requires reserving all the write-ahead log
(WAL) for a transaction to track the transaction states before it is
committed. This results in large logs, thus restricting the transaction
size based on the log capacity. Therefore, it is challenging to support
arbitrarily large transactions in LSM-tree based databases.
Inecient Recovery. Large transactions also incur high recov-
ery costs [
13
]. When the system encounters an exceptional scenario,
the recovery process comprises redo and undo phases [
22
]. How-
ever, to recover these transactions, it takes a long time to undo all
the operations as well as keeping locks during the process. While
existing methods (e.g., [
13
,
19
]) propose to eliminate the issue of
large transaction recovery in B
+
-tree based databases, they are not
applicable in LSM-tree based systems, since LSM-tree prevents in-
place updates for SSTables. Moreover, both redo and undo phases
rely on the states of the active and terminated transactions, i.e., it
is essential to manage the persistence of the transaction states.
Limited Utilization of LSM-tree Features. Existing systems
often abstract the LSM-tree storage engine as a key-value store,
treating it as a black box. Such a design limits the potential for
leveraging LSM-tree features to better support large transactions. To
support large transactions, the most common implementation is
to write the commit version back (i.e., backfill) or the rollback state
into the LSM-tree as a key-value pair upon commit or abort [24] to
invalidate the old states. This is easy in B+-tree based databases by
rewriting the corresponding data pages. Unfortunately, owing to
the append-only nature of the LSM-tree, the SSTables on disk are
immutable, and thus the rewrites incur additional I/O as large as
the transaction itself. In addition to this overhead during commit
and rollback, such a black-box design also fails to optimize reads
and writes.
1.2 Solutions
To address the aforementioned challenges, we devise MaLT for
Managing Large Transactions in OceanBase. It has the following
features.
Larger-than-RAM Transactions Support. To facilitate the
execution of larger-than-RAM transactions in an LSM-tree based
database, it is crucial to write uncommitted changes to disk, i.e., to
apply the steal policy [16, 22]. To eliminate the dependency on
reserving the entire WAL for a transaction, MaLT introduces an
external structure to store the transaction states, including active,
committed and aborted. To this end, we devise the Transaction Context
Table (TCT) and the Transaction Data Table (TDT), tailored for the
LSM-tree architecture. Specifically, TCT is responsible for recording
the in-memory context of active transactions. TDT records the states
of committed/aborted transactions. By combining TCT and TDT,
MaLT efficiently updates the transaction states for uncommitted
changes upon commit. Hence, MaLT effectively supports
larger-than-RAM transactions without the memory and log size
limitations described in §1.1.
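The division of labor between TCT and TDT can be sketched as follows.
This is an illustrative model under stated assumptions, not MaLT's
actual data layout: the names TxStateTables, record_write, state_of,
and visible_value, the dictionary-based tables, and the row format are
all hypothetical. The point it shows is that a commit or abort touches
only one small state entry, while rows already written (possibly to
disk) keep their transaction id and are resolved on read.

```python
from dataclasses import dataclass, field
from enum import Enum


class TxState(Enum):
    ACTIVE = "active"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass
class TxStateTables:
    """Hypothetical stand-in for TCT and TDT; not OceanBase's real layout."""
    tct: dict = field(default_factory=dict)  # tx_id -> context of an active tx
    tdt: dict = field(default_factory=dict)  # tx_id -> (state, commit_version)

    def begin(self, tx_id):
        self.tct[tx_id] = {"rows_written": 0}

    def record_write(self, tx_id):
        # Row data goes to the LSM-tree; only a counter is kept here.
        self.tct[tx_id]["rows_written"] += 1

    def commit(self, tx_id, commit_version):
        # One small state update, independent of the number of rows written.
        del self.tct[tx_id]
        self.tdt[tx_id] = (TxState.COMMITTED, commit_version)

    def abort(self, tx_id):
        del self.tct[tx_id]
        self.tdt[tx_id] = (TxState.ABORTED, None)

    def state_of(self, tx_id):
        if tx_id in self.tct:
            return (TxState.ACTIVE, None)
        return self.tdt[tx_id]


def visible_value(row, tables, read_version):
    """Resolve a row that still carries a tx_id instead of a commit version."""
    state, commit_version = tables.state_of(row["tx_id"])
    if state is TxState.COMMITTED and commit_version <= read_version:
        return row["value"]
    return None  # active, aborted, or committed after the reader's snapshot


tables = TxStateTables()
tables.begin(7)
row = {"key": "k", "value": "v", "tx_id": 7}
tables.record_write(7)
before = visible_value(row, tables, read_version=100)  # tx 7 still active
tables.commit(7, commit_version=50)
after = visible_value(row, tables, read_version=100)   # tx 7 now committed
```

In this sketch the row itself is never rewritten at commit time; only
the lookup result changes, which is what makes the commit cost
independent of transaction size.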
Eective Persistence of Transaction States and Ecient
Recovery. We further leverage TCT and TDT for recovery and
devise dedicated persistence strategies for them. In particular, MaLT
ensures full preservation of TCT on disk, which enables the recovery
of the active transactions during redo phase. TDT, instead, persists
the terminated transaction states following the similar structure of
LSM-tree for more ecient storage. MaLT then skips the undo phase
by utilizing TDT. This eliminates the immediate latency typically
associated with recovery processes and also enables constant-time
recovery regardless of transaction size.
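A redo-only recovery of this kind might look like the sketch below.
The log record format, the recover and read functions, and the
in-memory sets and dictionaries standing in for TCT and TDT are
assumptions for illustration. For simplicity the sketch replays the
whole log, whereas a real system would presumably start from
persisted state. What it demonstrates is the absence of an undo pass:
rows written by aborted transactions are left in place and filtered
out by a TDT lookup.

```python
def recover(log):
    """Redo-only recovery over a hypothetical log format (list of tuples)."""
    rows = []    # every replayed row version, tagged with its writer's tx_id
    tct = set()  # active transactions rebuilt during redo
    tdt = {}     # tx_id -> ("committed", version) or ("aborted", None)
    for record in log:
        kind, tx_id = record[0], record[1]
        if kind == "begin":
            tct.add(tx_id)
        elif kind == "write":
            _, _, key, value = record
            rows.append({"key": key, "value": value, "tx_id": tx_id})
        elif kind == "commit":
            tct.discard(tx_id)
            tdt[tx_id] = ("committed", record[2])
        elif kind == "abort":
            tct.discard(tx_id)
            tdt[tx_id] = ("aborted", None)
    # No undo pass: rows of aborted transactions stay where they are and
    # are filtered out at read time by consulting tdt.
    return rows, tct, tdt


def read(rows, tdt, key):
    """Return the value of the newest committed version of key, if any."""
    best = None
    for row in rows:
        state, version = tdt.get(row["tx_id"], ("active", None))
        if row["key"] == key and state == "committed":
            if best is None or version > best[0]:
                best = (version, row["value"])
    return None if best is None else best[1]


log = [
    ("begin", 1), ("write", 1, "x", "old"), ("commit", 1, 10),
    ("begin", 2), ("write", 2, "x", "doomed"), ("abort", 2),
    ("begin", 3), ("write", 3, "y", "pending"),
]
rows, tct, tdt = recover(log)
```

Here transaction 3 is restored as active, and the row written by the
aborted transaction 2 remains stored but is never returned by read.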
Ecient Transaction Executions with Optimizations in
LSM-tree. Unlike traditional RDBMSs that often abstract storage
engines as key-value stores, MaLT embeds transaction information
directly within the LSM-tree storage engine. This enables a range
of optimizations. First, as the TCT and TDT frameworks manage
transaction states, there is no immediate need for backll or rollback
operations upon commit or abort. This allows MaLT to integrate
the backll and rollback of transaction states into the LSM-tree com-
paction stage seamlessly. Hence, MaLT can perform highly ecient
transaction commit and rollback both in constant time. Addition-
ally, the MemTable and SSTable in MaLT store transaction-specic
metadata allowing for ecient data ltering and retrieval. This
optimization signicantly enhances read and write performance.
Overall, MaLT achieves very ecient commit, abort, and execu-
tion operations through these optimization strategies based on the
LSM-tree architecture.
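Deferring backfill and rollback to compaction can be sketched as
below. The compact function, the row dictionaries, and the tdt mapping
are hypothetical; a real compaction would also discard obsolete
versions, which this sketch omits. It only shows how rows that still
carry a transaction id are resolved against TDT while tables are
being merged anyway, so no separate rewrite is needed at commit or
abort time.

```python
def compact(sstables, tdt):
    """Merge SSTables, resolving tx_id-tagged rows against TDT on the way.

    Each row is a dict with "key", "value", and either "version" (already
    backfilled) or "tx_id" (not yet resolved). The layout is hypothetical.
    """
    merged = []
    for table in sstables:
        for row in table:
            if "version" in row:
                merged.append(dict(row))  # already resolved
                continue
            state, commit_version = tdt.get(row["tx_id"], ("active", None))
            if state == "committed":
                # Backfill: replace the tx_id with the commit version.
                merged.append({"key": row["key"], "value": row["value"],
                               "version": commit_version})
            elif state == "aborted":
                continue  # rollback: drop the row
            else:
                merged.append(dict(row))  # still active: keep unchanged
    # Sort by key; unresolved rows sort after resolved ones for the same key.
    merged.sort(key=lambda r: (r["key"], r.get("version", float("inf"))))
    return merged


tdt = {1: ("committed", 10), 2: ("aborted", None)}
old = [{"key": "a", "value": "a0", "version": 5}]
new = [
    {"key": "a", "value": "a1", "tx_id": 1},  # committed: gets version 10
    {"key": "b", "value": "b1", "tx_id": 2},  # aborted: dropped
    {"key": "c", "value": "c1", "tx_id": 3},  # unknown to tdt: treated as active
]
result = compact([old, new], tdt)
```

The extra work per row is a single TDT lookup folded into a merge
that the LSM-tree performs regardless.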
1.3 Use Case
The aforementioned challenges in §1.1 highlight a critical gap
between LSM-tree storage engines and the demands of modern
enterprise workloads regarding large transactions. This subsection
reports a use case from one of our customers who implemented a
financial platform that required robust support for large transactions
in their database.
The nancial platform exhibited specic requirements for the
database system: It experiences a business peak in the morning, with
backups scheduled in the afternoon. Subsequently, while there is no
business data activity, batch processing involving large transactions
takes place. In the evening, the platform handles large transactions
involving 2.5 million entries per commit for data import, during
which DDL synchronization is also required. The key requirements
from the platform include ensuring peak performance during busi-
ness hours and stability during large transaction batch processing.
Previous versions of OceanBase often suffered memory overloads or
log disk failures that necessitated manual recovery interventions.
Even after earlier versions gained support for large transactions,
the commit and rollback speeds remained a concern for transaction
execution. Therefore, we implemented MaLT in