Rediscovering the Beauty of PostgreSQL - 50: One Bad Apple

Original article by digoal, 2022-01-20

Author

digoal

Date

2021-08-27

Tags

PostgreSQL, ring buffer, BAS_BULKREAD


Video replay: https://www.bilibili.com/video/BV1aq4y1U7Rm/

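Scenario:
- During normal business hours, DBAs, developers, and analysts run large queries against the database, and some of those queries do full table scans on large tables.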

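Challenge:
- A full scan of a large table fills the buffer pool and pushes hot data out of shared buffers. SQL from other workloads slows down, and in severe cases the slowdown cascades into an avalanche.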

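PG solution:
- See "PostgreSQL large table scan strategy - BAS_BULKREAD, synchronize_seqscans, ring buffer instead of buffer pool".
- A full table scan of a table larger than 1/4 of shared buffers uses a ring buffer (256 KB) instead of the buffer pool.
- The pages are tagged BAS_BULKREAD and are the first to be evicted from the buffer.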
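
The threshold rule above can be sketched in a few lines of Python. This is only an illustration of the 1/4 rule as stated in this article, not PostgreSQL's actual code: the function name `uses_bulkread_ring` is hypothetical, and the 8 KB page size is the usual default, assumed here.

```python
# Assumption: the default 8 KB page size (a compile-time option in PostgreSQL).
BLCKSZ = 8192


def uses_bulkread_ring(relation_bytes: int, shared_buffers_bytes: int) -> bool:
    """Hypothetical helper: True when a full table scan of the relation
    would use the ring buffer under the rule 'larger than shared_buffers / 4'."""
    relation_pages = relation_bytes // BLCKSZ
    pool_pages = shared_buffers_bytes // BLCKSZ
    # Compare in pages: the relation must exceed one quarter of the pool.
    return relation_pages > pool_pages // 4
```

For example, with shared_buffers set to 128 MB the threshold is 32 MB, so a 33 MB table would be scanned through the ring buffer while a 32 MB table would not.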

Besides full table scans, PG's bulk writes and vacuum use a similar mechanism:
bulk-write: 16 MB ring buffer
- COPY FROM command.
- CREATE TABLE AS command.
- CREATE MATERIALIZED VIEW or REFRESH MATERIALIZED VIEW command.
- ALTER TABLE command.

vacuum: 256 KB ring buffer.

When reading or writing a huge table, PostgreSQL uses a ring buffer rather than the buffer pool. The ring buffer is a small and temporary buffer area. When any condition listed below is met, a ring buffer is allocated to shared memory:

Bulk-reading
When a relation whose size exceeds one-quarter of the buffer pool size (shared_buffers/4) is scanned. In this case, the ring buffer size is 256 KB.

Bulk-writing
When the SQL commands listed below are executed. In this case, the ring buffer size is 16 MB.

COPY FROM command.
CREATE TABLE AS command.
CREATE MATERIALIZED VIEW or REFRESH MATERIALIZED VIEW command.
ALTER TABLE command.

Vacuum-processing
When autovacuum performs vacuum processing. In this case, the ring buffer size is 256 KB.
The allocated ring buffer is released immediately after use.
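
To make the three sizes concrete, the sketch below converts them into page counts. It assumes the default 8 KB page size; the table `RING_BYTES` and the function `ring_pages` are illustrative names, not PostgreSQL identifiers.

```python
# Assumption: the default 8 KB page size.
BLCKSZ = 8192

# Ring sizes as listed above, in bytes.
RING_BYTES = {
    "bulkread": 256 * 1024,         # large sequential scans
    "bulkwrite": 16 * 1024 * 1024,  # COPY FROM, CREATE TABLE AS, ...
    "vacuum": 256 * 1024,           # vacuum processing
}


def ring_pages(strategy: str, block_size: int = BLCKSZ) -> int:
    """Number of pages that fit into the ring for the given strategy."""
    return RING_BYTES[strategy] // block_size
```

Under these assumptions a bulk-read or vacuum ring holds 32 pages and a bulk-write ring holds 2048 pages.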

The benefit of the ring buffer is obvious. If a backend process reads a huge table without using a ring buffer, all stored pages in the buffer pool are removed (kicked out); therefore, the cache hit ratio decreases. The ring buffer avoids this issue.
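
The effect can be illustrated with a toy simulation. This is a deliberately simplified model, not PostgreSQL's buffer manager: it uses plain LRU replacement as a stand-in for the real replacement policy, and the class `LruPool`, the function `hot_hit_ratio_after_scan`, and all sizes are hypothetical.

```python
from collections import OrderedDict


class LruPool:
    """Toy buffer pool with LRU eviction (a simplification)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page):
        """Touch a page; return True on a hit, False on a miss."""
        if page in self.pages:
            self.pages.move_to_end(page)
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page] = True
        return False

    def replace(self, old, new):
        """Reuse the slot holding `old` for `new` (ring-style reuse)."""
        self.pages.pop(old, None)
        self.access(new)


def hot_hit_ratio_after_scan(pool_size, hot_pages, scan_pages, ring_size=None):
    """Load a hot set, run one big scan, then measure hits on the hot set."""
    pool = LruPool(pool_size)
    hot = [("hot", i) for i in range(hot_pages)]
    for page in hot:
        pool.access(page)

    ring = []  # pool slots owned by the scan when a ring is used
    for i in range(scan_pages):
        page = ("scan", i)
        if ring_size is None:
            pool.access(page)            # scan competes with everyone else
        elif len(ring) < ring_size:
            pool.access(page)            # claim a slot once
            ring.append(page)
        else:
            slot = i % ring_size
            pool.replace(ring[slot], page)  # recycle the scan's own slot
            ring[slot] = page

    hits = sum(pool.access(page) for page in hot)
    return hits / hot_pages
```

With a pool of 1000 pages, a hot set of 800 pages, and a scan of 10000 pages, this model gives a hot-set hit ratio of 0.0 without a ring and 1.0 with a 32-page ring, because the scan only ever occupies its own 32 slots.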

Why is the default ring buffer size for bulk-reading and vacuum processing 256 KB?
The answer is explained in the README located under the buffer manager's source directory.

For sequential scans, a 256 KB ring is used. That's small enough to fit in L2 cache, which makes transferring pages from OS cache to shared buffer cache efficient. Even less would often be enough, but the ring must be big enough to accommodate all pages in the scan that are pinned concurrently. (snip)

