前端时间某客户生产环境PG数据库出现性能波动问题,多次出现数据库响应超时的异常。异常时间段查询pg_stat_activity视图发现大量进程在等待SubtransControlLock事件。参考官网解释Waiting to read or update sub-transaction information.是指在等待读取或更新子事务的信息。

什么是子事务?
子事务也被称为嵌套事务(nested transaction),是在一个已经启动的事务中启动的事务。使用子事务可以做部分回滚,这在PG中尤其有用。我们知道在PG中任何一个错误会回滚整个事务,而不是像Oracle或PG那样只回滚报错的statement。在某些场景下为了达到和Oracle、MySQL同样的效果,就需要使用子事务。
SQL 标准定义savepoint创建子事务,ROLLBACK TO SAVEPOINT回滚子事务,RELEASE SAVEPOINT清除子事务。
<savepoint statement> ::=
SAVEPOINT <savepoint specifier>
<savepoint specifier> ::=
<savepoint name>
<rollback statement> ::=
ROLLBACK [ WORK ] [ AND [ NO ] CHAIN ] [ <savepoint clause> ]
<savepoint clause> ::=
TO SAVEPOINT <savepoint specifier>
<release savepoint statement> ::=
RELEASE SAVEPOINT <savepoint specifier>
在PostgreSQL JDBC中还可以设置autosave参数,当配置成always时JDBC driver会在每一个statement前设置一个savepoint,如果该语句报错则回滚到这个savepoint。该参数默认为never,即不会设置savepoint。如果设置成conservative 保守模式,则会在每一个statement前设置一个savepoint,但是只会在极少数情况下回滚,例如 ‘cached statement cannot change return type’ or ‘statement XXX is not valid’ 时JDBC driver会回滚并重试。
SubtransControlLock是在等什么?
SubtransControlLock是在等待读取或更新子事务的信息。在PG中元组(tuple)的可见性是由它的系统列xmin 和 xmax决定的。因为子事务也会分配XID,如果元组存储的XID属于一个子事务,那就需要找到其对应的顶级(父)事务的XID,来判断数据是否可见,因为只有顶级事务才能真正提交。子事务和它的父事务的对应关系是存放在磁盘上的pg_subtrans 目录中。
PG在每个语句开始的时候会对数据库做一个快照,用来确定哪些元组是可见的。快照包括:
- 当前最大事务ID,此XID以后的数据不可见
- 当前活跃的事务和子事务列表
- 当前事务/子事务的最早可见命令的编号
快照会记录所有的活跃顶级事务XID,以及尝试在有限的空间中记录子事务XID。如果有的session有超过64个活动的子事务,则这个快照的子事务数据会被标记为overflowed。如果suboverflowed = true就需要通过映射关系去找子事务和对应的顶级事务。
/*
* Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
* for non-aborted subtransactions of its current top transaction. These
* have to be treated as running XIDs by other backends.
*
* We also keep track of whether the cache overflowed (ie, the transaction has
* generated at least one subtransaction that didn't fit in the cache).
* If none of the caches have overflowed, we can assume that an XID that's not
* listed anywhere in the PGPROC array is not a running transaction. Else we
* have to look at pg_subtrans.
*/
#define PGPROC_MAX_CACHED_SUBXIDS 64 /* XXX guessed-at value */
if (nsubxacts > PGPROC_MAX_CACHED_SUBXIDS)
{
pgxact->overflowed = true;
nsubxacts = PGPROC_MAX_CACHED_SUBXIDS;
}
if (pgxact->overflowed)
suboverflowed = true;
在suboverflowed的情况下,要获取完整的可见性数据只能从pg_subtrans获取。这种情况下通过 perf top命令可以发现SimpleLruReadPage_ReadOnly命令的开销增加。SimpleLruReadPage_ReadOnly可能会触发读取pg_subtrans,而其他事务注册子事务时会去更新pg_subtrans,这可能会导致锁竞争。
/*
* Interrogate the parent of a transaction in the subtrans log.
*/
TransactionId
SubTransGetParent(TransactionId xid)
{
int pageno = TransactionIdToPage(xid);
int entryno = TransactionIdToEntry(xid);
int slotno;
TransactionId *ptr;
TransactionId parent;
/* Can't ask about stuff that might not be around anymore */
Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
/* Bootstrap and frozen XIDs have no parent */
if (!TransactionIdIsNormal(xid))
return InvalidTransactionId;
/* lock is acquired by SimpleLruReadPage_ReadOnly */
slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid);
ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
ptr += entryno;
parent = *ptr;
LWLockRelease(SubtransControlLock);
return parent;
}
#define NUM_SUBTRANS_BUFFERS 32
/*
* Initialization of shared memory for SUBTRANS
*/
Size
SUBTRANSShmemSize(void)
{
return SimpleLruShmemSize(NUM_SUBTRANS_BUFFERS, 0);
}
void
SUBTRANSShmemInit(void)
{
SubTransCtl->PagePrecedes = SubTransPagePrecedes;
SimpleLruInit(SubTransCtl, "subtrans", NUM_SUBTRANS_BUFFERS, 0,
SubtransControlLock, "pg_subtrans",
LWTRANCHE_SUBTRANS_BUFFERS);
/* Override default assumption that writes should be fsync'd */
SubTransCtl->do_fsync = false;
}
void
SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
LWLock *ctllock, const char *subdir, int tranche_id)
{
shared->num_slots = nslots;
/*
* Find a page in a shared buffer, reading it in if necessary.
* The page number must correspond to an already-initialized page.
* The caller must intend only read-only access to the page.
*
* The passed-in xid is used only for error reporting, and may be
* InvalidTransactionId if no specific xid is associated with the action.
*
* Return value is the shared-buffer slot number now holding the page.
* The buffer's LRU access info is updated.
*
* Control lock must NOT be held at entry, but will be held at exit.
* It is unspecified whether the lock will be shared or exclusive.
*/
int
SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
{
SlruShared shared = ctl->shared;
int slotno;
/* Try to find the page while holding only shared lock */
LWLockAcquire(shared->ControlLock, LW_SHARED);
/* See if page is already in a buffer */
for (slotno = 0; slotno < shared->num_slots; slotno++)
{
if (shared->page_number[slotno] == pageno &&
shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
{
/* See comments for SlruRecentlyUsed macro */
SlruRecentlyUsed(shared, slotno);
return slotno;
}
}
/* No luck, so switch to normal exclusive lock and do regular read */
LWLockRelease(shared->ControlLock);
LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
return SimpleLruReadPage(ctl, pageno, true, xid);
}
可以看到PG为了避免直接读取磁盘上pg_subtrans的数据,会先线性查找SLRU缓存。
PG查询子事务ID时,会计算该 ID 在哪个内存页中,然后进行线性搜索以在内存页中查找。如果页面不在缓存中,它会逐出一页并将所需的页面加载到内存中。
SLRU 默认32个page,每个page 默认8k,每个事务ID占4个字节,理论上SLRU最多能缓存65K 事务id。 但大多数时候每页不会存满,极端条件下分配给子事务的单个 XID 可能单独占用整个 SLRU 页面。
在TPS较高且大量使用子事务的系统中,SLRU很快会被写满,导致大量磁盘IO。
这就是SubtransControlLock大量出现的原因。
#define NUM_SUBTRANS_BUFFERS 32
#define TransactionIdToPage(xid) ((xid) / (TransactionId) SUBTRANS_XACTS_PER_PAGE)
void
ExtendSUBTRANS(TransactionId newestXact)
{
int pageno;
/*
* No work except at first XID of a page. But beware: just after
* wraparound, the first XID of page zero is FirstNormalTransactionId.
*/
if (TransactionIdToEntry(newestXact) != 0 &&
!TransactionIdEquals(newestXact, FirstNormalTransactionId))
return;
pageno = TransactionIdToPage(newestXact);
LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
/* Zero the page */
ZeroSUBTRANSPage(pageno);
LWLockRelease(SubtransControlLock);
}
增加PGPROC_MAX_CACHED_SUBXIDS的值理论上可以减少suboverflowed的出现,但会增加内存开销。
增加NUM_SUBTRANS_BUFFERS的值理论上可以缓解该问题的出现,考虑SLRU 缓存对所需页面进行线性搜索如果盲目增加缓存大小,额外增加的搜索成本可能会导致性能的下降。
如何判断是子事务溢出导致的性能问题
- perf top 发现SimpleLruReadPage_ReadOnly函数开销增加
- pg_stat_activity视图中大量出现等待SubtransControlLock事件
- pg_export_snapshot()导出的快照信息,pg_snapshots 子目录中的文件包含
sof:1 - 并发增加后 系统负载明显增加。
PG测试
pgbench构造测试数据
-bash-4.2$ /usr/pgsql-12/bin/pgbench -i test
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data...
100000 of 100000 tuples (100%) done (elapsed 0.02 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
两个SQL文件,查询和创建大量子事务
echo "\\set id random(1, 10000)
begin;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s1;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s2;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s3;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s4;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s5;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s6;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s7;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s8;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s9;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s10;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s11;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s12;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s13;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s14;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s15;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s16;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s17;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s18;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s19;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s20;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s21;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s22;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s23;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s24;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s25;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s26;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s27;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s28;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s29;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s30;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s31;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s32;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s33;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s34;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s35;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s36;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s37;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s38;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s39;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s40;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s41;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s42;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s43;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s44;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s45;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s46;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s47;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s48;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s49;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s50;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s51;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s52;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s53;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s54;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s55;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s56;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s57;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s58;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s59;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s60;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s61;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s62;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s63;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s64;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s65;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s66;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s67;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s68;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s69;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s70;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s71;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s72;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s73;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s74;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s75;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s76;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s77;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s78;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s79;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s80;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s81;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s82;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s83;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s84;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s85;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s86;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s87;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s88;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s89;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s90;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s91;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s92;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s93;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s94;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s95;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s96;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s97;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s98;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s99;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s100;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s101;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s102;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s103;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s104;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s105;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s106;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s107;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s108;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s109;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s110;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s111;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s112;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s113;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s114;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s115;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s116;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s117;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s118;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s119;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s120;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s121;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s122;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s123;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s124;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s125;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s126;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s127;
update pgbench_accounts set aid = aid where aid = :id;
savepoint s128;
update pgbench_accounts set aid = aid where aid = :id;
commit;" > updates_with_savepoints.sql
echo "\\set id random(1, 10000)
select * from pgbench_accounts where aid = :id;" > selects.sql
启动pgbench测试,同时开启一个长事务
-bash-4.2$ /usr/pgsql-12/bin/pgbench -rn -P2 -T180 -c25 -j25 -f updates_with_savepoints.sql -M prepared test
-bash-4.2$ /usr/pgsql-12/bin/pgbench -rn -P2 -T180 -c25 -j25 -f selects.sql -M prepared test
postgres=# select txid_current(), pg_sleep_for('5 minutes');
视图出现大量SubtransControlLock等待事件
postgres=# select wait_event_type,wait_event,state,query from pg_stat_activity where state='active';
wait_event_type | wait_event | state | query
-----------------+---------------------+--------+--------------------------------------------------------------------------------------------
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | subtrans | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | subtrans | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
| | active | select wait_event_type,wait_event,state,query from pg_stat_activity where state='active';
LWLock | SubtransControlLock | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | SubtransControlLock | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | SubtransControlLock | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | subtrans | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | SubtransControlLock | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | SubtransControlLock | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | SubtransControlLock | active | update pgbench_accounts set aid = aid where aid = $1;
| | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | SubtransControlLock | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | subtrans | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | SubtransControlLock | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | SubtransControlLock | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | SubtransControlLock | active | update pgbench_accounts set aid = aid where aid = $1;
LWLock | SubtransControlLock | active | update pgbench_accounts set aid = aid where aid = $1;
Timeout | PgSleep | active | select txid_current(), pg_sleep_for('5 minutes');
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | subtrans | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
LWLock | SubtransControlLock | active | select * from pgbench_accounts where aid = $1;
Perf top 发现SimpleLruReadPage_ReadOnly函数开销增加
[root@mogdb ~]# perf top --no-children --call-graph=fp --dsos=/usr/pgsql-12/bin/postgres
+ 8.57% [.] LWLockRelease
6.57% [.] 0x0000000000353fb6
5.91% [.] SimpleLruReadPage_ReadOnly
2.83% [.] LWLockAcquire
+ 1.72% [.] TransactionIdIsCurrentTransactionId
+ 1.39% [.] hash_search_with_hash_value
1.33% [.] 0x0000000000353f93
1.16% [.] SubTransGetParent
+ 0.66% [.] heap_hot_search_buffer
0.57% [.] SubTransGetTopmostTransaction
+ 0.52% [.] PostgresMain




