暂无图片
暂无图片
暂无图片
暂无图片
暂无图片
ICDE 2025_Query Weak Equivalence and Its Verification in Analytical Databases_OceanBase.pdf
74
13页
2次
2025-06-10
免费下载
Query Weak Equivalence and Its Verification in
Analytical Databases
1
st
Jinguo You*
Kunming University of Science
and Technology
Kunming, China
jgyou@kust.edu.cn
2
nd
Wanting Fu
Kunming University of Science
and Technology
Kunming, China
wtfu@stu.kust.edu.cn
3
rd
Yuxuan Wang
Kunming University of Science
and Technology
Kunming, China
yxwang@stu.kust.edu.cn
4
th
Peilei He
Kunming University of Science
and Technology
Kunming, China
peilei@stu.kust.edu.cn
5
th
Kaiqi Liu
Kunming University of Science
and Technology
Kunming, China
liukaiqi w@163.com
6
th
Quanqing Xu*
Ant Group OceanBase
Beijing, China
xuquanqing.xqq@oceanbase.com
Abstract—Modern database applications operate on massive
data and support a range of complex queries, especially OLAP
queries which are time-consuming. To accelerate query pro-
cessing, a variety of methods for automatically verifying query
equivalence have been proposed to avoid redundant executions
of equivalent queries, mainly in a semantic sense. However, we
have observed some queries that are not semantically equivalent
also return the same tuples under the specific data distribution,
which cannot be detected by most current automated verification
of query equivalence.
To deal with this issue, this paper proposes weak equivalence
for identifying queries that are not semantically equivalent but
produce the same results under the read-mostly scenarios such
as OLAP. Specifically, for posed queries, we extract their filter
condition expressions, which are then transformed into symbolic
representations, namely first-order logic formulae. In terms of
their partial order, i.e. containment relationship, we introduce
Query Lattice, a novel structure that is constructed as a lattice
which is partitioned into equivalence classes that are convex
to answer queries if we determine they belong to the classes.
The equivalence class enables stored queries to respond to
future unseen queries so that redundant generation of query
plan and execution can be bypassed. Experimental evaluation of
Query Lattice built on top of a prevailing open-source DBMS,
PostgreSQL shows that the maximum improvement that Query
Lattice can achieve is 44.95% over the original PostgreSQL,
when running on the datasets of both TPC-H and TPC-H Skew
benchmarks.
Index Terms—OLAP, Multiple-query optimization, Data cube,
Query lattice, Weak equivalence.
I. INTRODUCTION
With the proliferation of database applications, improving
the performance of database management systems has become
a major objective of database research and there is a long
line of work focusing on the optimization of queries [1]–
[5]. The growing number of data analytic systems, like Myria
[6], Dremel [7], Spark SQL [8], and so on, enable people
*Corresponding authors.
to deploy complex data processing which has exhibited sig-
nificant redundant operations including complex computations
and reading tuples from disk due to unnecessary execution of
queries that return the same tuples. These resources wasted are
caused by redundant execution of repeated queries, equivalent
queries with different forms, and not equivalent queries but
with the same outputs on a given data distribution. In practice,
it is difficult to manually clarify these redundant, especially
the not equivalent queries returning the same answer. Thus,
automated methods for minimizing unwanted execution are
required.
In this paper, we say two queries are similar if they select the
same columns and have the same aggregate functions with the
same inputs. The difference between two similar queries is that
they have different filter predicates. Except for semantically
equivalent queries, these similar but not equivalent queries
can still produce identical results depending on the data
distribution of the specific database instance. For the case
that queries have the same results only under some specific
data distributions, we say these queries have weak equivalence
and the queries that have weak equivalence are similar first.
In read-mostly environments, like data warehouses, most are
long-running analytical queries [9]. The methods mentioned
above are dedicated to verifying equivalent SQL queries. As
a result, the opportunity to optimize these similar queries that
have weak equivalence is missed.
Researchers have extensively studied the problem of de-
tecting equivalent queries [10], [11]. Previous research has
concentrated on identifying a subset of relational algebra
where it is feasible to determine query equivalence under
set and bag semantics [12], [13]. [14] focused solely on the
theoretical aspects of this problem. Recently, two pragmatic
approaches have been proposed for determining the semantic
equivalence of queries. The algebraic approaches translate
queries into algebraic expressions and then apply a set of
rewrite rules to standardize the expressions [15]–[17]. Finally,
1882
2025 IEEE 41st International Conference on Data Engineering (ICDE)
2375-026X/25/$31.00 ©2025 IEEE
DOI 10.1109/ICDE65448.2025.00144
BBAAD9C20180234D78A0072836F0B970A2B9B2091AE1FBA0AED98436B1142B0C7B48B93861513B0122392408984659EB40E9217A61D04B311BBFC236724E32D124124DAD8521D9A7941D2A276F2749B7CA37E83875DB437E48D6019C18B00CD8DAC625989E3
they perform a sequence of tests to check for isomorphisms
and homomorphisms between the rewritten expressions to
determine the equivalence of the original queries. COSETTE
utilizes K-relations [18] to represent queries while UDP em-
ploys U-semirings [16], [17], [19]. These algebraic approaches
are used to verify the correctness of query rewrite rules
in query optimizers. They can therefore validate complex
structural transformations of queries. However, these algebraic
approaches still have limitations. First, they are unable to
model the semantics of widely used SQL features, such as
complex query predicates, arithmetic operations, and three-
valued logic for supporting NULL [20]. This limits their ability
to support a wide range of real-world queries. Secondly, its
verification procedure is computationally intensive due to a lot
of rewrite rules that need to be applied to the given algebraic
expressions to determine their equivalence. These limitations
restrict their efficiency in DBaaS offerings.
To address these issues, the symbolic approach EQUITAS
[21] determines query equivalence using two steps. It initially
converts a query to a symbolic representation (SR) (i.e.,
a set of first-order logic formulae) and thereafter uses a
general-purpose Satisfiability Modulo Theory (SMT) solver
to ascertain the containment relationship between two SRs. If
the containment relationship holds in both directions, then the
two queries are considered equivalent. The symbolic method
overcomes the limitations of the algebraic method by reducing
the problem of proving query equivalence to a constraint
satisfaction problem [21]. Since the symbolic approach can
model more SQL features, more SQL equivalence solvers
are proposed to discover more equivalent queries [22], [23].
While this symbolic approach addresses the drawbacks of the
algebraic approach, both of them have one limitation. They
only optimize the repetitive execution of equivalent queries
(i.e., these queries must return the same tuples for any database
instances).
Therefore, we propose to take advantage of this optimization
opportunity by leveraging a novel approach for determining
weak equivalence for similar queries. We present a lattice
structure called Query Lattice to automatically detect similar
queries having weak equivalence and answer them. The sym-
bolic representations of queries’ filter conditions are derived
and stored in the Query Lattice, which is then partitioned
into equivalence classes based on whether queries have partial
order and their results are the same. This approach can
model various complex query predicates including arithmetic
operations and three-valued logic. We then utilize the com-
putationally efficient SMT solver to establish containment
among SRs of queries, since the problem of determining the
weak equivalence of queries is reduced to that of deciding
the satisfiability of formulae in first-order logic. Deciding the
satisfiability of FOL formulae is an NP-complete problem.
However, in practice, modern solvers employ heuristics from
satisfiability theory [24] to efficiently solve FOL formulae
[25], [26].
If new queries have weak equivalence with cached queries
in our lattice, we employ cached results from that equivalence
class to answer the queries. Our method eliminates the need
for any succeeding query plan generation and execution when
a query can be answered by an equivalence class. Furthermore,
the lattice structure minimizes the search space for a query to
find a matching class by leveraging the partial order among
equivalence classes in the lattice. Though both problems of
deciding query equivalence and containment are undecidable
[27], [28], these two problems can now be reduced to that of
deciding the satisfiability of formulae in first-order logic. This
allows for the use of computationally efficient SMT solvers
[24].
In summary, we make the following contributions:
We illustrate the limitations of verification of query
equivalence and motivate the need for a new method to
determine the equivalence of query results.
We introduce the concepts of input-containment and
weak equivalence to describe those queries that are
not semantically equivalent but have the same results.
The weak equivalence can be proven through input-
containment and equivalence class.
We present Query Lattice, a novel structure composed
of symbolic representations of filter conditions of
queries. We demonstrate that this structure improves the
performance of PostgreSQL by handling not semantically
equivalent queries which have the same outputs.
We implement our Query Lattice in PostgreSQL, a widely
used database, and Query Lattice is evaluated on both
TPC-H and TPC-H Skew benchmarks. We demonstrate
that this structure can find similar queries with weak
equivalence in both benchmarks which improves the
response time of query answering compared to the
original PostgreSQL.
The rest of this paper is organized as follows: Section 2
describes the motivation of Query Lattice. Section 3 presents
the process by which Query Lattice verifies query results
equivalence between queries. Section 4 describes the hierar-
chical structure of Query Lattice. Section 5 demonstrates two
main algorithms used for comparing and updating. Section 6
contains experiments to validate the efficacy of Query Lattice
in improving database performance and lays out the limitations
of the current implementation of Query Lattice. Finally, the
paper is concluded in Section 7.
II. BACKGROUND & MOTIVATION
In this section, we first use two examples to show similar
queries with weak equivalence that former validation methods
for equivalent queries are not applicable. We then present
an overview of how Query Lattice solves this problem and
illustrate this approach with some definitions.
A. Containment and Equivalence Problem
A pair of queries Q
1
and Q
2
is deemed to have a contain-
ment relationship provided the following constraint is satisfied:
1883
BBAAD9C20180234D78A0072836F0B970A2B9B2091AE1FBA0AED98436B1142B0C7B48B93861513B0122392408984659EB40E9217A61D04B311BBFC236724E32D124124DAD8521D9A7941D2A276F2749B7CA37E83875DB437E48D6019C18B00CD8DAC625989E3
of 13
免费下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。
关注
最新上传
暂无内容,敬请期待...
下载排行榜
Top250 周榜 月榜