Rethink ery Optimization in HTAP Databases
HAOZE SONG, The University of Hong Kong, Hong Kong SAR
WENCHAO ZHOU, Alibaba Group, China
FEIFEI LI, Alibaba Group, China
XIANG PENG, Alibaba Group, China
HEMING CUI, The University of Hong Kong, Hong Kong SAR
The advent of data-intensive applications has fueled the evolution of hybrid transactional and analytical
processing (HTAP). To support mixed workloads, distributed HTAP databases typically maintain two data
copies that are specially tailored for data freshness and performance isolation. In particular, a copy in a
row-oriented format is well-suited for OLTP workloads, and a second copy in a column-oriented format is
optimized for OLAP workloads. Such a hybrid design opens up a new design space for query optimization:
plans can be optimized over different data formats and can be executed over isolated resources, which we
term hybrid plans. In this paper, we demonstrate that hybrid plans can largely benefit query execution (e.g.,
up to 11× speedups in our evaluation). However, we also found these benefits will be potentially at the cost
of sacrificing data freshness or performance isolation since traditional optimizers may not precisely model
and schedule the execution of hybrid plans on real-time updated HTAP databases. Therefore, we propose
Metis, an HTAP-aware optimizer. We show, both theoretically and experimentally, that using the proposed
optimizations, a system can largely benefit from hybrid plans while preserving isolated performance for OLTP
and OLAP, and these optimizations are robust to the changes in workloads.
CCS Concepts: • Information systems → Data access methods; Query optimization; Data layout;
• Computer systems organization → Real-time system architecture.
Additional Key Words and Phrases: Hybrid Transactional and Analytical Processing (HTAP) Databases,
Adaptive Query Plan, Mixed Workloads
ACM Reference Format:
Haoze Song, Wenchao Zhou, Feifei Li, Xiang Peng, and Heming Cui. 2023. Rethink Query Optimization
in HTAP Databases. Proc. ACM Manag. Data 1, 4 (SIGMOD), Article 256 (December 2023), 27 pages.
https://doi.org/10.1145/3626750
Work performed during an internship at Alibaba Group.
Authors’ addresses: Haoze Song, hzsong@cs.hku.hk, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR; Wenchao
Zhou, zwc231487@alibaba-inc.com, Alibaba Group, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang, China;
Feifei Li, lifeifei@alibaba-inc.com, Alibaba Group, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang, China;
Xiang Peng, pengxiang.px@alibaba-inc.com, Alibaba Group, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang,
China; Heming Cui, heming@cs.hku.hk, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR.
This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.
© 2023 Copyright held by the owner/author(s).
2836-6573/2023/12-ART256
https://doi.org/10.1145/3626750
Proc. ACM Manag. Data, Vol. 1, No. 4 (SIGMOD), Article 256. Publication date: December 2023.
1 INTRODUCTION
Today, data-intensive applications often utilize vast amounts of data for diverse real-time business
tasks (e.g., data-driven decisions [4, 13, 17, 26]), which requires weaving analytical and transactional
processing techniques together [45]. In response, many recent academic and industrial efforts have
been devoted to developing hybrid transactional and analytical processing (HTAP) systems
[2, 16, 31, 33, 35, 41–43, 49, 51, 55–57, 61, 62, 68], which are expected to ① provide prompt analysis of
fresh data and ② isolate the performance of interleaved workloads.
[Figure 1. Panel (a), Hybrid Physical Layout. Labels: Row-oriented Store (N-ary Storage Model), Primary
Index (PK, Tuple ID), Secondary Indices (Key(s), PK), Column-oriented Store (Decomposition Storage
Model), Async. Data Synchronization, Delta Merge. Panel (b), Performance Comparison. Series: Metis,
Row-oriented Plan, Column-oriented Plan, HTAP-agnostic Plan. Axes: Data Freshness, Perf. Isolation,
Range Scan Efficiency, Probe Efficiency, Resource Utilization, Optimization Efficiency.]
Fig. 1. (a) shows an example of the hybrid physical layout in modern HTAP systems (e.g., SQL Server [28],
TiDB [33]): the row-oriented tables are well-suited for updates and probes; a second copy in a column-oriented
layout is optimized for range scans. Leveraging a hybrid physical layout, Metis strikes a practical balance
between performance, isolation, and freshness for HTAP (see (b)).
A practical HTAP database generally consists of an online transactional processing (OLTP)
engine that supports high-throughput transaction processing, and an online analytical processing
(OLAP) engine that supports analytics with low latency. To handle mixed workloads efficiently, a popular
category of distributed HTAP databases (e.g., SQL Server [42], TiDB [33], ByteHTAP [16],
PolarDB-IMCI [67], Oracle Dual [41], and AlloyDB [31]) typically employs the two engines with specialized
data stores and asynchronously replicates data from one copy to the other, achieving both ① and ②.
An example is shown in Figure 1a: a row-oriented store (for short, row store) that stores data
in rows is optimized for operating on a single data tuple at a time and accessing many attributes,
which favors OLTP; a column-oriented store (for short, column store) that stores the same attributes of
different rows contiguously in columns is optimized for accessing a massive number of rows at a
time with a subset of tuple attributes, which favors OLAP. To provide swift OLAP capabilities on new
data, updates are asynchronously replicated from the row store into the column store while posing
minimal impact on OLTP. We further formulate the system model in §2.1.
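The two layouts and the asynchronous replication step can be pictured with a small Python sketch. It is an illustration only, not the storage design of any system cited here: the class names `RowStore` and `ColumnStore`, the in-memory `log` of pending updates, and the `replicate` function are all hypothetical, and every row is assumed to carry the same attributes.

```python
from dataclasses import dataclass, field


@dataclass
class RowStore:
    """N-ary layout (hypothetical): each tuple is kept whole, keyed by primary key."""
    rows: dict = field(default_factory=dict)   # pk -> {attribute: value}
    log: list = field(default_factory=list)    # updates not yet replicated

    def upsert(self, pk, row):
        self.rows[pk] = dict(row)
        self.log.append((pk, dict(row)))        # remembered for later replication

    def probe(self, pk):
        """Point lookup: one access returns every attribute of the tuple."""
        return self.rows.get(pk)


@dataclass
class ColumnStore:
    """Decomposed layout (hypothetical): one contiguous list per attribute."""
    columns: dict = field(default_factory=dict)    # attribute -> list of values
    position: dict = field(default_factory=dict)   # pk -> offset in every list

    def apply(self, pk, row):
        if pk in self.position:                     # update in place
            offset = self.position[pk]
            for name, value in row.items():
                self.columns[name][offset] = value
        else:                                       # append a new tuple
            self.position[pk] = len(self.position)
            for name, value in row.items():
                self.columns.setdefault(name, []).append(value)

    def scan_sum(self, name):
        """Analytical scan: touches only the one attribute it needs."""
        return sum(self.columns.get(name, []))


def replicate(row_store, column_store):
    """Drain pending updates into the column store; return how many were applied."""
    applied = len(row_store.log)
    for pk, row in row_store.log:
        column_store.apply(pk, row)
    row_store.log.clear()
    return applied
```

Until `replicate` runs, a scan over the column store does not see the latest writes, which is the freshness gap that asynchronous replication introduces.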
Given such a design, OLTP and OLAP workloads can be independently processed on their
desirable stores, thus naively providing isolation between OLTP and OLAP in the storage layer.
Unfortunately, restricting each workload to its specialized store leaves much of the performance
potential unrealized. This is because, for read-only queries, either the row store or the column store can
significantly outperform the other, depending on the characteristics of system implementations and
workloads [1, 28, 39] (see our experimental results in §2.2). Thus, there may be queries for which
neither the column store nor the row store alone is optimal.
To reach the full potential of the hybrid physical layouts, several HTAP systems [28, 33] have
integrated the two stores as alternative data access methods in their query optimizers to generate
hybrid plans for queries. Specifically, a hybrid plan allows a single query to retrieve data from both
the row and column stores simultaneously and calculate the query results based on a consistent
data view.
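As a rough illustration of what a hybrid plan does, the Python sketch below answers one query by scanning a columnar copy of an orders table and probing a row-format customers table by primary key. Everything in it is an assumption made for the example: the function name `hybrid_join`, the table shapes, and the integer `version` field, which stands in for whatever mechanism a real system uses to give both stores a consistent data view.

```python
def hybrid_join(order_columns, customer_rows, min_amount, snapshot):
    """Total order amount per customer name, for orders of at least min_amount.

    order_columns: {"version": int, "data": {"customer_id": [...], "amount": [...]}}
    customer_rows: {"version": int, "data": {customer_id: {"name": str}}}
    """
    # Consistent data view: refuse to combine copies taken at different versions.
    if order_columns["version"] != snapshot or customer_rows["version"] != snapshot:
        raise ValueError("stores are not at the requested snapshot")

    # Column-store side: scan only the two attributes the query needs.
    customer_ids = order_columns["data"]["customer_id"]
    amounts = order_columns["data"]["amount"]
    totals = {}
    for customer_id, amount in zip(customer_ids, amounts):
        if amount >= min_amount:
            totals[customer_id] = totals.get(customer_id, 0) + amount

    # Row-store side: one primary-key probe per qualifying customer.
    return {
        customer_rows["data"][customer_id]["name"]: total
        for customer_id, total in totals.items()
    }
```

The version check is the simplest possible guard: if the column copy lags behind the row copy, the sketch rejects the query rather than silently mixing two states of the data.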
Motivation. Nevertheless, existing approaches [28, 33] select access paths and perform query
optimizations simply based on queries’ selectivity [39], neglecting the data dynamicity of HTAP databases.
In §3, we show that blindly pursuing hybrid plans can easily make the generated plans sub-optimal
and damage the two important properties: data freshness (①) and performance isolation (②).
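To make the notion of selectivity-only access path selection concrete, here is a toy Python cost comparison. It is not the cost model of SQL Server, TiDB, or Metis: the function name `choose_access_path` and the constants `probe_cost` and `scan_cost` are hypothetical values chosen for illustration.

```python
def choose_access_path(selectivity, table_rows, probe_cost=4.0, scan_cost=0.05):
    """Pick the cheaper access path using estimates that depend only on selectivity.

    selectivity: estimated fraction of rows that match the predicate, in [0, 1].
    probe_cost / scan_cost: hypothetical per-row costs in arbitrary units.
    """
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be in [0, 1]")
    row_cost = selectivity * table_rows * probe_cost   # one index probe per match
    column_cost = table_rows * scan_cost               # scan the whole column
    return "row_index_probe" if row_cost < column_cost else "column_scan"
```

With these constants the choice flips at a selectivity of 0.0125. None of the inputs describes the update rate or how far the column store lags behind the row store, so the same path is chosen regardless of how quickly the data is changing.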