前言
今天中午在网上冲浪的时候,偶然发现了两个关于子事务的扩展,立马来了兴趣,关于子事务的篇章我写了很多,其危害数不胜数,但是对于子事务的可观测性,PG 又一直做得很烂,甚至都没有一个像样的视图、表或者函数用于观测子事务,一个典型场景就是 OGG 同步到 PG,那么会在 PG 这一侧开启大量子事务,对于性能冲击很大,每次碰到这类问题,除了观测可能的等待事件 SubtransSLRU 、SubtransControlLock 之外,别无他法。所以,当我看到这两个插件时,立马眼前一亮。让我们小试牛刀一下!
pg_subtrans_infos
第一个扩展是 pg_subtrans_infos,允许观测子事务的状态 (另一款好用的 AWR 工具 pgsentinel 也是出自同一位作者)。
Allow to get subtransaction information thanks to a pg_subtrans_infos function.
让我们看个例子:
postgres=# begin;
BEGIN
postgres=*# insert into test values(1,'hello');
INSERT 0 1
postgres=*# savepoint a;
SAVEPOINT
postgres=*# insert into test values(2,'world');
INSERT 0 1
postgres=*# savepoint b;
SAVEPOINT
postgres=*# insert into test values(3,'xiongcc');
INSERT 0 1
postgres=*# select xmin,* from test;
xmin | id | info
------+----+---------
866 | 1 | hello
867 | 2 | world
868 | 3 | xiongcc
(3 rows)
新开一个会话观察:
postgres=# select * from pg_subtrans_infos(866);
xid | status | parent_xid | top_parent_xid | sub_level | commit_timestamp
-----+-------------+------------+----------------+-----------+------------------
866 | in progress | | | |
(1 row)
postgres=# select * from pg_subtrans_infos(867);
xid | status | parent_xid | top_parent_xid | sub_level | commit_timestamp
-----+-------------+------------+----------------+-----------+------------------
867 | in progress | 866 | 866 | 1 |
(1 row)
postgres=# select * from pg_subtrans_infos(868);
xid | status | parent_xid | top_parent_xid | sub_level | commit_timestamp
-----+-------------+------------+----------------+-----------+------------------
868 | in progress | 867 | 866 | 2 |
(1 row)
866 是最顶层父事务,当出现了 overflow ,就需要遍历 pg_subtrans SLRU 中的子事务树,以获取顶层事务 ID,访问 SLRU 的过程中就会竞争 LWLock,如果更糟的话,SLRU 中找不到就会去磁盘上找,如果再出现长事务,那酸爽。
可以看到,pg_subtrans_infos 会将最底层父事务找出来,我们就可以找到该事务的直接元凶。其次,可以通过 SQL 更好地观察子事务了,以官方提供的 SQL 为例
postgres=# select
pid,
locktype,
mode,
(select xid from pg_subtrans_infos(pgl.transactionid::text::bigint)),
(select status from pg_subtrans_infos(pgl.transactionid::text::bigint)) as "xid status",
(select parent_xid from pg_subtrans_infos(pgl.transactionid::text::bigint)),
(select top_parent_xid from pg_subtrans_infos(pgl.transactionid::text::bigint)),
(select sub_level from pg_subtrans_infos(pgl.transactionid::text::bigint)) sub_level,
(select commit_timestamp from pg_subtrans_infos(pgl.transactionid::text::bigint))
from
(select * from pg_locks where transactionid is not null) pgl
order by 4;
pid | locktype | mode | xid | xid status | parent_xid | top_parent_xid | sub_level | commit_timestamp
-------+---------------+---------------+-----+-------------+------------+----------------+-----------+------------------
26868 | transactionid | ExclusiveLock | 866 | in progress | | | |
26868 | transactionid | ExclusiveLock | 867 | in progress | 866 | 866 | 1 |
26868 | transactionid | ExclusiveLock | 868 | in progress | 867 | 866 | 2 |
(3 rows)
一目了然,父事务、子事务嵌套深度等等,可以大大方便我们排查子事务的性能问题。另外,我看官方例子支持查看历史事务的状态,但是测试下来始终会报错:ERROR: transaction ID 3254 is in the future
另外好像缺了 overflow 的统计数据?莫急,pg_subxact_counters 扩展来帮忙。
pg_subxact_counters
此扩展用于获取全局子事务计数,记录实例中所有数据库的活动,比如子事务增长速度,是否 overflow (即单个会话超过 64 个) 等等。
Get global subtransactions counters.
The purpose of this extension is to provide counters to monitor the subtransactions (generation rate, overflow, state).
该扩展提供了四个函数
subxact_start: number of substransactions that started subxact_commit: number of substransactions that committed subxact_abort: number of substransactions that aborted (rolled back) subxact_overflow: number of times a top level XID have had substransactions overflowed
subxact_overflow 指标至关重要。还是以官方提供样例为例 (尴尬的是,我本地测了一下,发现会 segment fault,目前已经提了 issue) 虽然目前版本还有 BUG,毕竟刚刚 PR 几个月,但是还是值得让各位知晓此扩展。
postgres=# \dx+ pg_subxact_counters
Objects in extension "pg_subxact_counters"
Object description
--------------------------------
function pg_subxact_counters()
view pg_subxact_counters
(2 rows)
postgres=# select * from pg_subxact_counters;
subxact_start | subxact_commit | subxact_abort | subxact_overflow
---------------+----------------+---------------+------------------
0 | 0 | 0 | 0
(1 row)
postgres=# begin;
BEGIN
postgres=*# savepoint a;
SAVEPOINT
postgres=*# savepoint b;
SAVEPOINT
postgres=*# commit;
COMMIT
postgres=# select * from pg_subxact_counters;
subxact_start | subxact_commit | subxact_abort | subxact_overflow
---------------+----------------+---------------+------------------
2 | 2 | 0 | 0
(1 row)
小结
这两个扩展虽然目前版本还有些瑕疵,Anyway,能让我们在碰到子事务相关问题的时候,会更加方便,有据可依。相信随着迭代,这两个扩展会更加成熟稳定。
参考
https://github.com/bdrouvot/pg_subxact_counters
https://github.com/bdrouvot/pg_subtrans_infos




