暂无图片
暂无图片
暂无图片
暂无图片
暂无图片
IEEE 47th - 香港大学 Detecting Hidden Failures of DBMS, A Comprehensive Metamorphic Relation Output Patterns Approach.pdf
183
6页
5次
2023-08-31
免费下载
1
Detecting Hidden Failures of DBMS:
A Comprehensive
Metamorphic Relation Output Patterns Approach
Matthew Siu-Hin Tang
Department of Computer Science
The Univeristy of Hong Kong
Pokulam, Hong Kong
tshcat@connect.hku.hk
T. H. Tse*
Department of Computer Science
The University of Hong Kong
Pokulam, Hong Kong
thtse@cs.hku.hk
Zhi Quan Zhou
School of Computing and IT, University of Wollongong
Wollongong NSW 2522, Australia, and
Alibaba and Ant Group, Hangzhou, China
zhiquan@uow.edu.au
AbstractThe testing of large databases faces the test oracle
problem, namely, that it is difficult to verify execution results
against expected outcomes. Rigger and Su applied metamorphic
testing through query partitioning and ternary logic partitioning
techniques to alleviate the challenge. In Part (A) of our project, we
conduct an in-depth investigation and have identified a gap
between the two techniques. We propose a disjoint partitioning
approach to address it. In Part (B), we conduct a comprehensive
investigation into the metamorphic testing of DBMS by comparing
disjoint partitioning with metamorphic relation output patterns
(MROPs) by Segura et al. We propose an exhaustive collection of
MROPs for DBMS. To the best of our knowledge, this is the first
project to integrate in-depth and comprehensive approaches to
tackle the diverse challenges in DBMS testing. In Part (C), we
conduct an empirical case study of their applications to
OceanBase, the DBMS associated with the world’s fastest online
transaction processing system. Although OceanBase has been
extensively tested and widely used in the industry, we have
detected 12 hidden failures and 8 new crashes.
Index Termstest oracle, metamorphic testing, metamorphic
relation output pattern, DBMS, SQL, OceanBase
I. INTRODUCTION
Because of the popularity of online transaction processing
(OLTP) systems [1] in the financial sector, the correctness of the
supporting database management system (DBMS) is crucial.
However, owing to the scale of large databases, the testing of
DBMS is challenging [2]. It faces the test oracle problem, which
refers to the difficulty in verifying system execution results [3]
[4]. The metamorphic testing (MT) methodology [5][6][7] was
invented in 1998 to alleviate the problem. In 2020, Rigger and
Su [8] applied MT to address the issue in DBMS testing through
the query partitioning (QP) and ternary logic partitioning (TLP)
techniques. Empirical studies showed that they revealed 175
failures in five DBMS.
Our current project is divided into three parts. In Part (A),
we conduct an in-depth investigation on [8]. We find a gap
between QP and TLP, and introduce the concept of disjoint par-
titioning (DP) to tackle the issue. In Part (B), we conduct a com-
prehensive investigation of the adequacy of the metamorphic
relations constructed via DP, comparing them with the metamor-
phic relation output patterns (MROPs) proposed by Segura et al.
[9]. We propose a more comprehensive approach for classifying
and constructing metamorphic relations in DBMS testing. In
Part (C) of the project, we perform an empirical case study on
OceanBase [10], which has been developed by the Alibaba and
Ant Group and associated with the world’s fastest OLTP [11].
We apply both the in-depth and comprehensive approaches to
test OceanBase, and have detected 12 hidden failures and 8 new
crashes.
II. PART (A): IN-DEPTH INVESTIGATION INTO
METAMORPHIC TESTING OF DBMS
A. Motivation Example
Consider the staff table on the right.
Let us write a SELECT statement in
SQL:
SELECT * FROM staff WHERE salary < 5000;
Suppose the DBMS returns the result
on the right. It reveals a failure because
Bob also has a salary lower than 5000
and is missing from the list.
Imagine that the staff table contains 10 000 records instead
of only three. We execute the same SQL SELECT statement
against the DBMS to list all the staff with a salary lower than
5000. Suppose the DBMS returns a list of 3000 staff. This time,
we cannot easily tell whether the result of 3000 staff misses any
legitimate record or contains any superfluous record. Owing to
the large volume of data, it is difficult to detect failures.
B. Metamorphic Testing
In software testing, a test oracle is the mechanism to verify
the execution result against the expected outcome [12]. The test
oracle problem refers to the situation where such a mechanism
is either missing or extremely difficult to apply [3][4]. As we
have observed in the motivating example, it would be challeng-
ing to verify the result of a given query in large databases.
In 1998, T. Y. Chen invented the metamorphic testing (MT)
methodology [5][6] [7] , which supports test case generation and
alleviates the test oracle problem. He defines metamorphic
* Corresponding author.
id
name
salary
1
Alice
5000
2
Charlie
2000
3
Bob
3000
id
name
salary
2
Charlie
2000
HKU CS Tech Report TR-2023-01
2
relations (MRs) as necessary properties of the target function or
program in relation to multiple inputs and their expected out-
puts. To conduct MT, some program inputs are first constructed
as original test cases (called source test cases). On the basis of a
target MR, new inputs are constructed as follow-up test cases.
Contrary to traditional testing, which verifies the correctness of
each individual test result, MT verifies the relation among the
source and follow-up inputs and outputs with respect to the MR.
About 500 papers on MT have been published. However,
research work on the application of MT to DBMS has been very
limited [8] [13][14].
When applying MT to DBMS testing, a source test case may
be an initial query whereas a follow-up test case may be a subse-
quent query constructed according to a target MR. Then, the
relationship between their outputs (resultant lists returned by the
DBMS) can be verified with reference to the MR. In the case of
the staff table example, we may specify an MR such that “if we
add one more condition to the query, the DBMS should return
fewer records.” For instance, we may refine our query to ask for
all the staff with salary < 5000 AND name = 'Alice'. On execu-
tion, if the DBMS returns 4000 staff, which is more than the
previous result, we say that there is a violation of the MR. It
indicates a failure of the DBMS. In this way, failures are
revealed using MT without the need for a test oracle.
C. Revisit of QP and TLP by Rigger and Su
Rigger and Su [8] proposed query partitioning (QP) and
ternary logic partitioning (TLP) for metamorphic testing of
DBMS. QP is a general strategic concept that describes an MR
among DBMS queries. Outputs of the follow-up queries are
non-overlapping sublists of the output of the source query. The
MR specifies that the concatenation of the outputs of the follow-
up queries must be the same as the output of the source query.
A violation of the MR indicates a failure.
TLP is a special case of QP. An original query is transformed
into three partitioning queries, which are three predicate variants
derived from a randomly generated predicate. Let us consider
the staff table again, with a source query to return all the staff.
We generate a random predicate name = 'Alice'. We derive three
partitioning queries with three predicate variants: name = 'Alice',
NOT (name = 'Alice'), and name = 'Alice' IS NULL. Since SQL is
based on ternary Boolean logic [14], we know that name = 'Alice'
can be TRUE, FALSE, or NULL, falling into the resultant lists of
the three partitioning queries. Hence, following the MR stated in
QP, by concatenating the three resultant lists, we should obtain
the result of the original query, namely, all the staff in the table.
TLP has revealed 175 failures in five DBMS, including SQLite,
MySQL, CockroachDB, TiDB, and DuckDB.
D. Extension of QP and TLP to Disjoint Partitioning
1
Our thorough investigation reveals that the high-level QP is
too general for practical MR construction, while the low-level
TLP is too specific. We propose an innovative technique called
disjoint partitioning (DP) to fill the gap. The core idea is that
resultant lists returned by the DBMS can be divided into
exhaustive and mutually exclusive partitions using, for
instance, the SQL keywords LIMIT and OFFSET. For example,
given a staff table with 100 rows, we can write two queries
(Q0.1) SELECT * FROM staff LIMIT 50 OFFSET 0;
(Q0.2) SELECT * FROM staff LIMIT 50 OFFSET 50;
to return two resultant sublists, each containing the first and last
50 staff in the table, respectively.
Consider the entry_exit table below, recording the id, name,
and entry_exit_count of 10 000 staff in a large corporation:
id
name
entry_exit_count
1
Alice
4
...
...
...
10000
Dave
3
By examining the entry_exit_count for each staff, we can
determine that those with odd values are inside the building,
whereas those with even values are outside.
We can use the SQL bitwise and (&) operator with a constant
operand 1 to determine whether an entry_exit_count is odd or
even. An odd number & 1 returns an integer 1, and an even
number & 1 returns an integer 0.
Hence, we can write the following statement in OceanBase
DBMS to find all the staff (whose entry_exit_count is an odd
number) inside the building:
(Q1.1) SELECT * FROM entry_exit
WHERE entry_exit_count & 1;
OceanBase returns a list with 3000 staff. However, we do not
know whether the list of 3000 records is correct, because we do
not have a test oracle.
In DP , we propose follow-up queries such that they partition
list of rows in the original table into disjoint portions and apply
the same WHERE condition. We write the two follow-up queries
as follows:
(Q1.2) SELECT * FROM (SELECT * FROM entry_exit
LIMIT 5000 OFFSET 0) AS offset1
WHERE entry_exit_count & 1;
(Q1.3) SELECT * FROM (SELECT * FROM entry_exit
LIMIT 5000 OFFSET 5000) AS offset2
WHERE entry_exit_count & 1;
Query (Q1.2) retrieves the staff with an odd entry_exit_count
in rows 1 to 5000, whereas query (Q1.3) retrieves those in rows
5001 to 10 000. OceanBase returns 1500 and 1501 staff, respec-
tively.
The MR states that simple concatenation of the outputs of
follow-up queries (Q1.2) and (Q1.3) should produce the same
result as the original query (Q1.1). The MR is therefore violated
because there are more rows in the concatenated output from
(Q1.2) and (Q1.3). Thus, we have revealed a hidden failure.
III. PART (B): COMPREHENSIVE INVESTIGATION INTO
METAMORPHIC TESTING OF DBMS
Although our previous extension [16] of Rigger and Su’s
work to DP reveals failures, it only tests for one kind of MR. In
order to identify more MRs in DBMS, we draw our inspiration
from existing MR frameworks from the literature.
1
This part of the project was published in [16].
of 6
免费下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

关注
最新上传
暂无内容,敬请期待...
下载排行榜
Top250 周榜 月榜