ICDE 2025_MemQ：A Graph-Based Query Memory Prediction Framework for Effective Workload Scheduling.pdf

迹部景吾

110

14页

2次

2025-06-17

免费下载

MemQ: A Graph-Based Query Memory Prediction

Framework for Effective Workload Scheduling

Yang Wu

, Xuanhe Zhou

, Xiaoguang Li

, Jinhuai Kang

, Chunxiao Xing

Tongliang Li

, Xinjun Yang

, Wenchao Zhou

, Feifei Li

, Yong Zhang

4,∗

Department of Computer Science, Tsinghua University, Beijing, China

Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China

Business-intelligence of Oriental Nations Corporation Ltd., Beijing, China

Beijing National Research Center for Information Science and Technology, Beijing, China

Alibaba Group, Hangzhou, China

Emails: {wu-y22}@mails.tsinghua.edu.cn, {zhouxh}@cs.sjtu.edu.cn, {lixiaoguang1, kangjinhuai}@bonc.com.cn,

{litongliang.ltl, xinjun.y, zwc231487, lifeifei}@alibaba-inc.com, {xingcx, zhangyong05}@tsinghua.edu.cn,

Abstract—Query memory prediction is an essential yet un-

derexplored problem in self-driving databases, particularly for

high-concurrency workload scheduling where efﬁcient resource

utilization is critical. Existing works mainly focus on cost and

latency estimation (e.g., using plan representation learning), while

memory prediction poses new challenges such as requiring (1)

numerous memory-speciﬁc training data, (2) memory-relevant

query plan featurization strategies, and (3) a prediction model

suitable for capturing the complexities of memory usage in query

operations. Moreover, most learning-based approaches do not

consider transferability across different datasets and database

systems.

This paper introduces MemQ, a graph-based memory pre-

diction framework designed for effective workload scheduling.

First, we build a comprehensive training dataset for memory

prediction by executing diverse query workloads across multiple

datasets and recording their diverse peak memory consumptions.

Second, our MemQ model leverages operator-level features of

query plans, achieving high prediction accuracy, compact model

size, and fast training and inference times. Third, we integrate the

MemQ model into memory-aware First Fit Decreasing (FFD) and

Bidrectional Fit (BF) scheduling strategy to optimize resource

utilization. Extensive experiments demonstrate the effectiveness

of our homogeneous query plan graph model. Moreover, our

FFD scheduling strategy reduces makespan (total query execution

time) by up to 55% and decreases retry counts by over 99%

compared to default strategies when batch executing analytical

queries on PostgreSQL. Furthermore, our novel BF strategy

reduces makespan by 15.17% and reduces sum of total time

by 41.41% compared with FFD strategy when batch executing

mixed workloads.

Index Terms—query memory prediction, homogeneous opera-

tor graph, workload scheduling, ﬁrst ﬁt decreasing

I. INTRODUCTION

In modern database systems, efﬁcient query processing

is essential to meet the demands of growing data volumes

and increasing query complexity. These systems must handle

This Work was supported by National Key R&D Program of China

(2023YFB4503600), NSF of China (61925205, 62232009, 62102215),

Business-intelligence of Oriental Nations Corporation Ltd., Alibaba Research

Intern Program.

diverse query types while optimizing resource utilization under

high concurrency requests. One key issue arises when queries

consume excessive memory, leading to allocation failures [29],

prolonged execution times, degraded system performance, and

even potential database shutdowns [1]. On the other hand,

insufﬁcient memory allocation results in suboptimal resource

utilization [2]. Addressing these issues requires memory pre-

diction strategy that can balance resource allocation and sys-

tem performance [15], [27].

Recent advancements in machine learning for databases

(ML4DB) [16] have introduced query plan representation

learning [45] as a cornerstone for tasks like cost estimation

and query optimization [8], [19]. While signiﬁcant progress

has been made in latency and cost prediction [12], [21],

[35], [39], [44], the domain of memory prediction remains

relatively underexplored. The key challenges of query memory

prediction are summarized as follows:

C1: Lack of Ground-Truth Data. Learned query memory

prediction methods demand a vast number of precise mem-

ory usage data for model training. Proﬁling memory usage

across diverse workloads and database systems is both time-

consuming and resource-intensive.

C2: Complexity of Query Plans. Modern query plans involve

various operator patterns and complex predicates. Memory

usage is inﬂuenced by factors such as intermediate data

sizes, join algorithms, and aggregation strategies, leading to

substantial variability.

C3: Transferability across Datasets. Memory usage can

vary signiﬁcantly across different datasets due to variations in

data distributions, such as skewness or irregularities. Queries

involving skewed joins or large group-by aggregations can

trigger unexpected memory spikes.

C4: Transferability across Systems. In different database

systems, the memory management and query execution strate-

gies can be signiﬁcantly different. Models trained for one

system often fail to generalize to others without substantial

adaptation. It is tricky to develop transferable prediction

Fig. 1. Data Collection and Query Memory Prediction in MemQ Framework

models that work across heterogeneous systems with minimal

modiﬁcation [26], [38].

C5: Real-Time Efﬁciency. Practical usage of query memory

prediction model such as workload scheduling requires balanc-

ing prediction accuracy with computational efﬁciency. Models

must be lightweight, fast to train, and efﬁcient during inference

to avoid bottlenecks in high-concurrency environments.

This paper addresses these challenges by introducing

MemQ, a graph-based memory prediction framework. The

framework encompasses dataset collection speciﬁcally for

memory prediction, query memory prediction over homoge-

neous operator graph, and workload scheduling using First

Fit Decreasing (FFD) or Bidirectional Fit (BF) strategy. The

framework is designed to handle complex query plans and

can be transferable across different datasets and even database

systems, and meanwhile maintain a relatively small size while

ensuring efﬁciency.

A. Contributions

The main contributions of this work are as follows:

• We are the ﬁrst to formalize the task of memory prediction

at the query level and introduce a graph-based query memory

prediction framework MemQ

• We design a tool to execute hundreds of thousands of queries

and collect comprehensive memory consumption data (C1).

We make this dataset publicly available

to facilitate future

studies.

• Our query memory prediction model leverages the structural

features of query plans (C2) and is designed to be transferable

across datasets (C3) and database systems (C4) while main-

taining a lightweight architecture for fast training and efﬁcient

inference (C5).

• We propose memory-aware FFD and BF scheduling al-

gorithms that leverage query-level memory predictions to

optimize query execution order and concurrency (C5).

• Extensive experiments demonstrate MemQ model’s superi-

ority on query memory prediction in accuracy, transferability,

and efﬁciency. Our memory-aware FFD scheduling strategy

achieves up to 55% reduction in execution time and a

99% reduction in query retries for batch execution of 900

https://github.com/earthwuyang/pg mem pred

https://cloud.tsinghua.edu.cn/d/9ad34a4caafe405ebcc7/

queries on PostgreSQL. Furthermore, when executing a mixed

workload of 100 queries, BF strategy reduces makespan by

15.17% and reduces sum of total time by 41.41% compared

with FFD strategy.

B. Paper Organization

The remainder of this paper is structured as follows: Sec-

tion II formalizes the Query Memory Prediction problem

and introduces MemQ. Section III presents its application on

workload scheduling. Section IV provides a comprehensive

evaluation of MemQ. Section V discusses the key ﬁndings

and points out potential future directions.

II. MEMQ: A GRAPH-BASED QUERY MEMORY

PREDICTION FRAMEWORK

In this section, we formalize the Query Memory Prediction

(QMP) problem and introduce the main techniques in MemQ,

including dataset construction, query featurization (e.g., the

homogeneous graph), and prediction model construction.

A. Problem Deﬁnition

The Query Memory Prediction problem involves estimating

the peak memory consumption M

of a query q during its

execution, which can be written as:

θ = arg min

q∈Q

[L(M

, f(q; θ))] (1)

where L is the loss function, Q denotes the set of all possible

queries, f (q; θ) is a predictive model parameterized by θ.

The objective of query memory prediction is to learn θ that

minimizes the prediction error between M

and f(q; θ) across

diverse workloads and environments.

B. Framework of MemQ

In MemQ, dataset collection, query memory prediction are

illustrated in Figure 1 and workload scheduling is presented

in Figure 3. These components are detailed below.

C. Dataset Collection

First, we construct a dataset involving diverse query mem-

ory prediction scenarios by combining publicly available real-

world datasets [24], i.e., airline, carcinogenesis, credit, em-

ployee, ﬁnancial, geneea, hepatitis, and walmart, with the

standard benchmarks like TPC-H and TPC-DS [25], [36].

of 14

免费下载

文档被以下合辑收录

数据库顶会 ICDE 2025 论文下载（共16篇）

本合辑收集了数据库顶会 ICDE 2025 的论文，可以免费下载。

关注

文档被以下合辑收录

评论