VLDB2024_UniView：A Unified Autonomous Materialized View Management System for Various Databases_华为.pdf

迹部景吾

745

4页

5次

2024-09-09

免费下载

UniView: A Unified Autonomous Materialized View Management

System for Various Databases

Zhenrong Xu, Pengfei Wang

Zhejiang University

Hangzhou, China

{xuzhenrong,wangpf}@zju.edu.cn

Guoze Xue, Qitong Yan

Zhejiang University

Hangzhou, China

{xuegz,qitong.yan}@zju.edu.cn

Shenghao Gong, Yelan Jiang

Zhejiang University

Hangzhou, China

{gongshenghao,jiangyelan}@zju.edu.cn

Yuren Mao, Yunjun Gao

Zhejiang University

Hangzhou, China

{yuren.mao,gaoyj}@zju.edu.cn

Shu Shen, Wei Zhang, Dan Luo

Huawei

Hangzhou, China

{shenshu,zhangwei09,luodan2}@huawei.com

Lu Chen

Zhejiang University

Hangzhou, China

luchen@zju.edu.cn

ABSTRACT

Materialized views (MVs) are critical for improving query perfor-

mance of database systems, especially in online analytical process-

ing (OLAP) databases. Typically, MVs are maintained by DBAs,

which relies on prior knowledge and manual operations. Recently,

autonomous solutions are designed for specic databases. How-

ever, a data warehouse for OLAP is typically hierarchical, which

uses dierent database engines at dierent stages. Hence, existing

methods have limitations in terms of autonomy and unication to

support practical applications.

Motivated by these, we develop UniView, a unied autonomous

materialized view management system that supports various popu-

lar databases, including Spark SQL, PostgreSQL, and ClickHouse.

Moreover, we provide a cross-platform web user interface, where

users can carry out the process of materialized views and evaluate

the optimization performance. In the demonstration, we show that

UniView is user-friendly and can achieve superior performance in

the practical industry scenarios.

PVLDB Reference Format:

Zhenrong Xu, Pengfei Wang, Guoze Xue, Qitong Yan, Shenghao Gong,

Yelan Jiang, Yuren Mao, Yunjun Gao, Shu Shen, Wei Zhang, Dan Luo,

and Lu Chen. UniView: A Unied Autonomous Materialized View

Management System for Various Databases. PVLDB, 17(12): 4353 - 4356,

2024.

doi:10.14778/3685800.3685873

PVLDB Artifact Availability:

The source code, data, and/or other artifacts have been made available at

https://github.com/ZJU-DAILY/UniView.

1 INTRODUCTION

Materialized views (MVs) are of critical importance to the query

performance of database systems. As shown in Figure 1, we can

materialize a view to speed up the query process. In online analyti-

cal processing (OLAP) databases, many SQL queries share common

This work is licensed under the Creative Commons BY-NC-ND 4.0 International

License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of

this license. For any use beyond those covered by this license, obtain permission by

emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights

licensed to the VLDB Endowment.

Proceedings of the VLDB Endowment, Vol. 17, No. 12 ISSN 2150-8097.

doi:10.14778/3685800.3685873

OrderKey CustKey TotalPrice OrderDate Name Address Zipcode

0001 0001 10.0 09152022 A E 310024

0002 0002 100.0 09152022 B F 310023

0003 0004 50.0 09152022 C G 310024

0004 0003 2000.0 09152022 D H 310020

Select Order.TotalPrice

From View

Where Zipcode=310024

Select Order.TotalPrice

From Orders, Custormers

Where Oders.CustKey = Custormers.CustKey

AND Zipcode=310024

View

Figure 1: An example of materialized views.

subqueries and there are lots of redundant computations among

these queries. Materializing views on these subqueries can avoid

redundant computation and improve query performance, which is

a space-for-time trade-o principle. Therefore, it is vital to select

the optimal MVs that can bring the most query performance im-

provement within a space budget. For example, in Amazon Redshift,

automated materialized views are a powerful tool for improving

query performance.

Most existing methods rely on DBAs to generate and maintain

MVs [

]. Nevertheless, these methods require prior knowledge and

manual operations, which are costly, and thus, cannot eciently

and eectively support large-scale databases. Motivated by this,

autonomous MV selection methods are proposed recently. The MV

selection process within a space budget can be regarded as a 0-

1 integer linear programming (0-1 ILP) problem. Solving 0-1 ILP

is too expensive for large workloads since the complexity of the

0-1 ILP approach is

𝑂 (

𝑛

)

. To eciently solve the MV selection,

two lines of methods exist. One line of MV selection methods is

heuristics, such as the greedy strategy [

]. Another line of studies

uses reinforcement learning (RL) [

] or graph algorithm [

]

to solve the 0-1 ILP problem faster.

There exist many works[

–

] that leverage ML meth-

ods to solve database problems, indicating a growing trend.

However, existing automatic MV selection methods are designed

for specic database systems, e.g., DQM [

] designed for Spark

SQL, AutoView [

] designed for PostgreSQL, DBMind [

] designed

for OpenGauss, and SOFOS [

] designed for knowledge graphs.

4353

Cost

Workload

I.MV Generation

Database Engine

II.Cost Estimation

III.MV Recommend

IV.MV Rewriting

DNN

Training

DNN

Testing

Materialize Rewriting

Greedy

Strategy

AST

Figure 2: UniView system architecture.

Figure 3: MV generation algorithm.

Their customized design makes them highly coupled and hard to

migrate to other databases. In real life applications, dierent types of

database systems are typically used at dierent stages. For example,

a data warehouse hierarchical uses dierent database engines (e.g.,

Spark SQL and Clickhouse) at dierent stages for decision-making.

Hence, a unied management system is preferred to support various

popular databases while materializing views autonomously.

In this demonstration, we develop UniView, a unied autonomous

materialized view management system for various databases. Specif-

ically, UniView consists of four phases: (i) MV Generation aims to

parse the queries and generate candidate views; (ii) Cost Estimation

utilizes the deep network to estimate the cost of queries and MVs;

(iii) MV Recommend aims to recommend the optimal MVs within

a space budget based on the cost; and (iv) MV Rewriting aims at

rewriting the query using the most appropriate views. UniView

supports three dierent types of database systems, i.e., Spark SQL,

PostgreSQL and ClickHouse. We also provide a cross-platform web

UI, where users can submit queries and get recommended views

to materialize. The web UI can demonstrate the dierence in ex-

ecution performance of queries with/without materialized views

so that users are able to understand better how UniView improves

the query performance. Moreover, UniView has been deployed in

the Huawei Consumer Business Group (CBG) to manage materi-

alized views for query performance improvement. We summarize

the contributions as follows:

•

We demonstrate UniView, a unied autonomous materi-

alized view management. To the best of our knowledge,

UniView is the rst autonomous materialized view manage-

ment supporting various popular databases simultaneously.

•

We implement a cross-platform web UI to interact with

users and demonstrate the improvements brought by Uni-

View. We have open-sourced UniView, and it is available

at GitHub https://github.com/ZJU-DAILY/UniView.

•

UniView has been deployed in Huawei CBG to improve

query eciency. Our preliminary results show that Uni-

View is able to reduce query execution time using recom-

mended materialized views, which veries the eectiveness

of UniView in the real-world industry scenario.

The rest of the paper is organized as below. Section 2 presents

system overview. Section 3 provides demonstration overview. Sec-

tion 4 makes conclusions with promising future directions.

2 SYSTEM OVERVIEW

We rst oer some preliminaries of UniView, and then, we present

the system architecture and a detailed workow of UniView.

2.1 Preliminaries

We rst introduce the query tree to represent a SQL query, based

on which, materialized view management is present.

Query Tree. Given a SQL query

𝑞

, we parse it as a query tree (ab-

stract syntax tree, AST). Each subtree rooted at a node corresponds

to a subquery, and each node indicates an operator. All subqueries

except the leaves in the query tree can be materialized as views.

Materialized View Management. Given a query workload

𝑄

there exists a set

of candidate MVs. It is vital to select a subset

𝑉

∗

⊆ V

to materialize within a given space budget

𝜏

, which can

minimize the total execution time of the query workload.

2.2 Workow

Figure 2 illustrates the overall system architecture of UniView. It

is composed of four phases, namely, (i) MV generation, (ii) cost

estimation, (iii) MV recommend, and (iv) MV rewriting.

MV Generation aims to nd common subqueries for generating

candidate MVs

. First of all, we parse all SQL queries in the query

workload

𝑄

as query trees. Common subqueries are the equivalent

subtrees among dierent query trees of queries. After nding all

common subqueries, we are able to generate

. Specically, we

compute the qualities of all common subqueries. The qualities are

formulated as the weighted sum of some important factors, e.g., the

number of MV that matched the original queries, the size of the table

that the MV contains, and the number of predicates. After that, the

common subqueries with high qualities will be selected as candidate

MVs

. Our UniView supports three popular databases, including

Spark SQL, PostgreSQL, and ClickHouse. Figure 3 illustrates the

abstract code of MV generation for Spark SQL, PostgreSQL, and

ClickHouse. Note that, since execution plan structures and operator

types generated by dierent database engines are dierent, it is

required to customize the analysis of the execution plans for the

three databases.

Cost Estimation aims at conducting cost estimation to estimate

the benet, including the execution time and the space cost. The

benet estimation is the dierence between the cost of a query

and the corresponding rewritten query. We also estimate the space

cost (i.e., storage cost) of each MV candidate. We adopt a deep

4354