暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL Incremental View Maintenance - ivm (增量物化视图MATERIALIZED VIEW)进化

digoal 2019-09-22
1954

作者

digoal

日期

2019-09-22

标签

PostgreSQL , 物化视图 , MATERIALIZED VIEW


背景

物化视图通常在olap系统中被用于预计算,提高需要大量计算量或IO量的分析型SQL的实时(准实时)结果。

例如实时查询聚合结果。

PostgreSQL 9.3 初次引入物化视图的功能,当时只支持全量刷新,并且刷新物化视图时会堵塞DML。

PostgreSQL 9.4 支持了增量刷新,刷新物化视图时不堵塞dml,增量刷新的方法是执行物化视图定义语句,并与物化视图当前内容进行JOIN比对,增量刷新当前物化视图。

一些常见的商业数据库如何刷新物化视图的呢?

两种比较常见的方法:

1、解析redo

Other RDBMS implement this by reading the logs (WAL logs) and implementing the refresh in real time without the penalty of trigger.

2、存储mvlog,增量合并mvlog到mv

Oracle uses something called Materialized View Logs that is a log created on base tables and used for fast refreshes on commit or on demand.

I think that a MV with many aggregations spanning many rows in base tables can cause a serious contention problem on transactions. Also, serializable properties should be maintained.

This feature will be welcome and thanks for your hard work.

PostgreSQL ivm

http://pgsqlpgpool.blogspot.com/2019/08/automatically-updating-materialized.html

PostgreSQL 社区下一步将支持增量实时/准实时刷新物化视图。

实现方式也可能会采用分步走的方法:

1、先支持触发器级别的实施物化视图更新,这个实现简单。适合写入少的物化视图。估计在PG 13就会支持。

2、第二步可能会使用redo解析或使用mvlog的方式,支持异步的增量物化视图刷新(与商用数据库的物化视图功能刷新方法基本一致)。

Yes, using some kind of logs is one of our future plans. The reason why we use trigger for now is, it's simpler and easier to implement. Also by using triggers, changes will be made to matviews by the time when current transaction gets committed, which cannot be achieve by using WAL logs, because WAL logs never be written until transaction gets committed. So our plan using logs will be for "deferred" update of matviews.

下面是第一步:

Materialized views are convenient and efficient way to retrieve information from database. Unlike ordinary views, materialized views save the query result and provide faster access to the data. For example, user can create a simple materialized view containing the number of rows in a table:

CREATE MATERIALIZED VIEW mv1 AS SELECT count(*) FROM pgbench_accounts;

Obtaining the number of rows using the materialized view mv1 is much faster than directly accessing number of rows in pgbench_accounts.

```
test=# SELECT * from v1;
count


100000
(1 row)

Time: 0.810 ms
test=# SELECT count(*) FROM pgbench_accounts;
count


100000
(1 row)

Time: 13.945 ms
```

However if you delete a row from pgbench_accounts, the number of rows in mv1 is still unchanged. To reflect the change of the base table (in this case pgbench_accounts) , you need to recreate or refresh (this actually recreate the contents of materialize views from scratch), which may take long time.

To overcome the problem, SRA OSS is proposing to add a new feature to existing materialized view "incremental materialized view maintenance". This works like this.

add new syntax to allow incrementally update the materialized view when it is created.
install triggers automatically to be attached to the base table(s).
when UPDATE/DELETE/INSERT applies on the base tables, the trigger calculate the difference to the rows in the materialized view and update the rows.
These should update the materialized view in a real time manner. The price is the over head of trigger invocation. So frequently updating tables are not best suite for the incremental materialized view maintenance. Here are demonstrations how to use the incrementally updating materialized view.

First create incrementally updating materialized view.

INCREMENTAL MATERIALIZED语法的引入:

CREATE INCREMENTAL MATERIALIZED view mv2 AS SELECT count(*) FROM pgbench_accounts;

The view initially says there are 100,000 rows.

```
SELECT * FROM mv2;
count


100000
(1 row)
```

Delete a row from pgbench_accounts. Now the number of rows is 99999.

```
DELETE FROM pgbench_accounts WHERE aid = 10;
DELETE 1
SELECT count(*) FROM pgbench_accounts;
count


99999
(1 row)
```

Make sure that mv2 reports the same number of rows.

```
SELECT * FROM mv2;
count


99999
(1 row)
```

It works!

This is a very simple materialized view. More complex queries, for example inner joins also work.

复杂query的物化视图依旧适用。

```
CREATE INCREMENTAL MATERIALIZED VIEW mv3 AS SELECT a.aid, b.bid, t.tid FROM pgbench_accounts a INNER JOIN pgbench_branches b ON (a.bid = b.bid ) INNER JOIN pgbench_tellers t ON (b.bid = t.bid) WHERE a.aid BETWEEN 1 AND 5;
SELECT 50

SELECT * FROM mv3 LIMIT 5;
aid | bid | tid
-----+-----+-----
3 | 1 | 10
2 | 1 | 1
2 | 1 | 5
2 | 1 | 4
5 | 1 | 3
(5 rows)

DELETE FROM pgbench_accounts WHERE aid = 2;
DELETE 1

SELECT * FROM mv3 LIMIT 5;
aid | bid | tid
-----+-----+-----
3 | 1 | 10
5 | 1 | 3
1 | 1 | 1
5 | 1 | 8
4 | 1 | 3
(5 rows)
```

Implementing other types of queries such as outer joins, self joins are in our plan, possibly toward PostgreSQL 13.

PostgreSQL物化视图进化小结

PostgreSQL 9.3 初次引入物化视图的功能,当时只支持全量刷新,并且刷新物化视图时会堵塞DML。

PostgreSQL 9.4 支持了增量刷新,刷新物化视图时不堵塞dml,增量刷新的方法是执行物化视图定义语句,并与物化视图当前内容进行JOIN比对,增量刷新当前物化视图。

PostgreSQL 社区下一步(预计PG 13)将支持增量实时/准实时刷新物化视图。

实现方式也可能会采用分步走的方法:

1、先支持触发器级别的实施物化视图更新,这个实现简单。适合写入少的物化视图。估计在PG 13就会支持。

2、第二步可能会使用redo解析或使用mvlog的方式,支持异步的增量物化视图刷新(与商用数据库的物化视图功能刷新方法基本一致)。

参考

http://pgsqlpgpool.blogspot.com/2019/08/automatically-updating-materialized.html

https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp

https://github.com/sraoss/pgsql-ivm

https://www.postgresql.org/about/featurematrix/#views-materialized-views

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论