
they perform a sequence of tests to check for isomorphisms
and homomorphisms between the rewritten expressions to
determine the equivalence of the original queries. COSETTE
utilizes K-relations [18] to represent queries while UDP em-
ploys U-semirings [16], [17], [19]. These algebraic approaches
are used to verify the correctness of query rewrite rules
in query optimizers. They can therefore validate complex
structural transformations of queries. However, these algebraic
approaches still have limitations. First, they are unable to
model the semantics of widely used SQL features, such as
complex query predicates, arithmetic operations, and three-
valued logic for supporting NULL [20]. This limits their ability
to support a wide range of real-world queries. Secondly, its
verification procedure is computationally intensive due to a lot
of rewrite rules that need to be applied to the given algebraic
expressions to determine their equivalence. These limitations
restrict their efficiency in DBaaS offerings.
To address these issues, the symbolic approach EQUITAS
[21] determines query equivalence using two steps. It initially
converts a query to a symbolic representation (SR) (i.e.,
a set of first-order logic formulae) and thereafter uses a
general-purpose Satisfiability Modulo Theory (SMT) solver
to ascertain the containment relationship between two SRs. If
the containment relationship holds in both directions, then the
two queries are considered equivalent. The symbolic method
overcomes the limitations of the algebraic method by reducing
the problem of proving query equivalence to a constraint
satisfaction problem [21]. Since the symbolic approach can
model more SQL features, more SQL equivalence solvers
are proposed to discover more equivalent queries [22], [23].
While this symbolic approach addresses the drawbacks of the
algebraic approach, both of them have one limitation. They
only optimize the repetitive execution of equivalent queries
(i.e., these queries must return the same tuples for any database
instances).
Therefore, we propose to take advantage of this optimization
opportunity by leveraging a novel approach for determining
weak equivalence for similar queries. We present a lattice
structure called Query Lattice to automatically detect similar
queries having weak equivalence and answer them. The sym-
bolic representations of queries’ filter conditions are derived
and stored in the Query Lattice, which is then partitioned
into equivalence classes based on whether queries have partial
order and their results are the same. This approach can
model various complex query predicates including arithmetic
operations and three-valued logic. We then utilize the com-
putationally efficient SMT solver to establish containment
among SRs of queries, since the problem of determining the
weak equivalence of queries is reduced to that of deciding
the satisfiability of formulae in first-order logic. Deciding the
satisfiability of FOL formulae is an NP-complete problem.
However, in practice, modern solvers employ heuristics from
satisfiability theory [24] to efficiently solve FOL formulae
[25], [26].
If new queries have weak equivalence with cached queries
in our lattice, we employ cached results from that equivalence
class to answer the queries. Our method eliminates the need
for any succeeding query plan generation and execution when
a query can be answered by an equivalence class. Furthermore,
the lattice structure minimizes the search space for a query to
find a matching class by leveraging the partial order among
equivalence classes in the lattice. Though both problems of
deciding query equivalence and containment are undecidable
[27], [28], these two problems can now be reduced to that of
deciding the satisfiability of formulae in first-order logic. This
allows for the use of computationally efficient SMT solvers
[24].
In summary, we make the following contributions:
• We illustrate the limitations of verification of query
equivalence and motivate the need for a new method to
determine the equivalence of query results.
• We introduce the concepts of input-containment and
weak equivalence to describe those queries that are
not semantically equivalent but have the same results.
The weak equivalence can be proven through input-
containment and equivalence class.
• We present Query Lattice, a novel structure composed
of symbolic representations of filter conditions of
queries. We demonstrate that this structure improves the
performance of PostgreSQL by handling not semantically
equivalent queries which have the same outputs.
• We implement our Query Lattice in PostgreSQL, a widely
used database, and Query Lattice is evaluated on both
TPC-H and TPC-H Skew benchmarks. We demonstrate
that this structure can find similar queries with weak
equivalence in both benchmarks which improves the
response time of query answering compared to the
original PostgreSQL.
The rest of this paper is organized as follows: Section 2
describes the motivation of Query Lattice. Section 3 presents
the process by which Query Lattice verifies query results
equivalence between queries. Section 4 describes the hierar-
chical structure of Query Lattice. Section 5 demonstrates two
main algorithms used for comparing and updating. Section 6
contains experiments to validate the efficacy of Query Lattice
in improving database performance and lays out the limitations
of the current implementation of Query Lattice. Finally, the
paper is concluded in Section 7.
II. BACKGROUND & MOTIVATION
In this section, we first use two examples to show similar
queries with weak equivalence that former validation methods
for equivalent queries are not applicable. We then present
an overview of how Query Lattice solves this problem and
illustrate this approach with some definitions.
A. Containment and Equivalence Problem
A pair of queries Q
1
and Q
2
is deemed to have a contain-
ment relationship provided the following constraint is satisfied:
1883
BBAAD9C20180234D78A0072836F0B970A2B9B2091AE1FBA0AED98436B1142B0C7B48B93861513B0122392408984659EB40E9217A61D04B311BBFC236724E32D124124DAD8521D9A7941D2A276F2749B7CA37E83875DB437E48D6019C18B00CD8DAC625989E3
文档被以下合辑收录
评论