PostgreSQL 10 CLogControlLock 等待事件分析与优化 - hint bit, freeze, autovacuum, 风暴

digoal 2019-03-19

722

作者

digoal

日期

2019-03-19

背景

PostgreSQL 的tuple行头部有掩码来标识这条记录的事务结束状态（未知、已提交、已回滚），在事务提交时，并不需要修改这个掩码(hintbit)的内容。

《为什么PostgreSQL查询语句可能产生 xlog，并且可能对buffer有write操作? - hint bits》

因为如果要修改的话事务结束的动作就太重了。（例如一个事务写入100万条记录，事务结束时如果还需要去改这100万条记录的TUPLE头部掩码，显然是很慢并且不现实的因为你不能记录下当前会话所有动过的TUPLE吧，既然不能又怎么在事务结束时去修改这些记录的掩码呢）。

事务结束时，只需要修改这个事务对应的clog的状态位（每个事务的状态由2个比特位来表示，一个8K的clog，可以存储32768个事务状态。）。

当一条记录被TOUCH到（不管是读操作还是写操作还是autovacuum操作，只要访问到它）时，如果这条记录的头部的事务状态掩码还没有信息，需要从CLOG中获取这条记录对应的事务号的CLOG状态，然后再去修改这条记录在头部掩码中标记的的事务结束状态。一旦记录头部设置了事务状态掩码，就不再需要读CLOG来识别该记录的事务结束状态了。

另一方面，为了clog的访问更加的快速有效，CLOG也有对应的BUFFER。默认BUFFER最大为1MB。

/* * Number of shared CLOG buffers. * * On larger multi-processor systems, it is possible to have many CLOG page * requests in flight at one time which could lead to disk access for CLOG * page if the required page is not found in memory. Testing revealed that we * can get the best performance by having 128 CLOG buffers, more than that it * doesn't improve performance. * * Unconditionally keeping the number of CLOG buffers to 128 did not seem like * a good idea, because it would increase the minimum amount of shared memory * required to start, which could be a problem for people running very small * configurations. The following formula seems to represent a reasonable * compromise: people with very low values for shared_buffers will get fewer * CLOG buffers as well, and everyone else will get 128. */ Size CLOGShmemBuffers(void) { return Min(128, Max(4, NBuffers / 512)); }

操作CLOG，需要加排它锁。

例如

src/backend/access/transam/clog.c

/* * Record the final state of transaction entries in the commit log for all * entries on a single page. Atomic only on this page. */ static void TransactionIdSetPageStatus(TransactionId xid, int nsubxids, TransactionId *subxids, XidStatus status, XLogRecPtr lsn, int pageno, bool all_xact_same_page) { ... /* * If we can immediately acquire CLogControlLock, we update the status * of our own XID and release the lock. If not, try use group XID * update. If that doesn't work out, fall back to waiting for the * lock to perform an update for this transaction only. */ if (LWLockConditionalAcquire(CLogControlLock, LW_EXCLUSIVE)) { /* Got the lock without waiting! Do the update. */ TransactionIdSetPageStatusInternal(xid, nsubxids, subxids, status, lsn, pageno); LWLockRelease(CLogControlLock); return; }

当前clog相关的所有锁

```
CLogControlLock Waiting to read or update transaction status.

CLogTruncationLock Waiting to truncate the write-ahead log or waiting for write-ahead log truncation to finish.

clog Waiting for I/O on a clog (transaction status) buffer.

ClogGroupUpdate Waiting for group leader to update transaction status at transaction end.
```

什么情况下CLogControlLock锁等待会成为瓶颈，并严重影响性能呢？

例如，

1、高并发的小事务提交。

2、遇上clog buffer不足，需要大量的clog物理文件。

因为每一个事务提交都需要调用TransactionIdSetPageStatus，遇到buffer不够的情况，小事务吞吐会严重下降。

如何模拟CLogControlLock锁等待瓶颈

1、设置较大freeze max，table freeze，例如 18亿。

```
postgres=# show autovacuum_freeze_max_age ;
autovacuum_freeze_max_age

1800000000
(1 row)

postgres=# show vacuum_freeze_table_age ;
vacuum_freeze_table_age

1750000000
(1 row)
```

表示在表的年龄达到18亿前，不会进行FREEZE。

2、table1，（关闭这个表的autovacuum，保证这个表里面的记录没有被设置hintbit ）。对于纯写入的表，不会触发autovacuum，所以不关闭autovacuum，实际上只要不查询这个表，这些记录都是没有被设置hintbit的。

table1模拟单纯写如，15亿个事务，每个事务写入10条记录，总共写入150亿。

15亿个事务，clog大概占用375MB。而默认的clog buffer为1MB。

3、table2，开启高并发写事务。记录写吞吐能力。作为基准1。

4、table1，开启高并发读请求（对15亿以前的数据进行读取），读这些没有设置hint bit的记录时，需要从CLOG获得他们的状态，所以clog buffer很快会被用掉。

5、在模拟步骤4的同时，table2，开启高并发写事务。记录写吞吐能力，作为基准2。检查等待事件。 CLogControlLock

基准2会比基准1的吞吐小很多，原因是CLogControlLock等待。

原因：

clog buffer小，弊端：读clog文件，buffer进出，整体事务处理吞吐下降。

clog buffer大，弊端:？串行搜索链表，找空闲buffer块，整体效率下降？需check代码。

事件分析

1、autovacuum_freeze_max_age，vacuum_freeze_table_age很大。

2、有大量的日志型表（只有写入操作）

3、日志型的表几乎不被查询。或者查询很少。所以大量的记录hint bit没有被设置。

4、当这些日志表有大量没有设置的hintbit记录，当触发freeze时(扫描全表)，需要对这些记录进行扫描，并读取clog，设置hintbit。由于大多数记录没有hintbit，需要读取clog了解事物提交状态，会导致clog buffer很快被耗尽，如果同时有高并发小事务提交，需要写clog的PAGE状态，会出现锁等待。导致事务提交的吞吐下降。

如何优化

1、尽量有节奏的让后台去设置记录的hint bit，这样不管什么时候，都不会出现clog突然被占满，与高并发小事务提交的clog lock发生冲突。

2、加大clog buffer

3、内核优化（PG 11已优化）

PostgreSQL 11 内核代码层优化

PostgreSQL 11 进行了clog 锁机制的优化Use group updates when setting transaction status in clog.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=baaf272ac908ea27c09076e34f62c45fa7d1e448

```
Use group updates when setting transaction status in clog.

Commit 0e141c0fbb211bdd23783afa731e3eef95c9ad7a introduced a mechanism
to reduce contention on ProcArrayLock by having a single process clear
XIDs in the procArray on behalf of multiple processes, reducing the
need to hand the lock around. A previous attempt to introduce a similar
mechanism for CLogControlLock in ccce90b398673d55b0387b3de66639b1b30d451b
crashed and burned, but the design problem which resulted in those
failures is believed to have been corrected in this version.

Amit Kapila, with some cosmetic changes by me. See the previous commit
message for additional credits.

Discussion: http://postgr.es/m/CAA4eK1KudxzgWhuywY_X=yeSAhJMT4DwCjroV5Ay60xaeB2Eew@mail.gmail.com
```

/* * When we cannot immediately acquire CLogControlLock in exclusive mode at * commit time, add ourselves to a list of processes that need their XIDs * status update. The first process to add itself to the list will acquire * CLogControlLock in exclusive mode and set transaction status as required * on behalf of all group members. This avoids a great deal of contention * around CLogControlLock when many processes are trying to commit at once, * since the lock need not be repeatedly handed off from one committing * process to the next. * * Returns true when transaction status has been updated in clog; returns * false if we decided against applying the optimization because the page * number we need to update differs from those processes already waiting. */

参考

https://www.postgresql.org/docs/11/monitoring-stats.html#WAIT-EVENT-TABLE

《为什么PostgreSQL查询语句可能产生 xlog，并且可能对buffer有write操作? - hint bits》

src/backend/access/transam/clog.c

文章转载自digoal，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

PostgreSQL 10 CLogControlLock 等待事件分析与优化 - hint bit, freeze, autovacuum, 风暴

作者

日期

标签

背景

什么情况下CLogControlLock锁等待会成为瓶颈，并严重影响性能呢？

如何模拟CLogControlLock锁等待瓶颈

事件分析

如何优化

PostgreSQL 11 内核代码层优化

参考

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

评论