Whoami:5年+金融、政府、医疗领域工作经验的DBACertificate:OCP、PCPSkill:Oracle、Mysql、PostgreSQLPlatform:CSDN、墨天伦、公众号(呆呆的私房菜)
阅读本文可以了解PostgreSQL事务ID回卷、事务ID冻结等相关概念和实操内容,帮助读者更深入理解事务ID的工作机制。
PostgreSQL的MVCC机制通过保存数据行多个历史版本并通过记录上的事务 ID 元组(xmin,xmax) 来控制不同版本记录的可见性。
在PostgreSQL数据库里,事务 ID(XID)是 32 位无符号,顺序生成,单调递增。当到达最大值(2^32-1) 后又开始从 3 开始使用。这种现象一般称为事务 ID 回卷(Wraparound)。

单纯看这个机制可以看出历史事务修改的记录的可见性可能发生反转。一个记录从“历史记录”变成“未来记录”,数据就莫名其妙的“丢失”了,这个现象就是本文我们要来分析的事务ID回卷问题。
实际上,PostgreSQL针对数据库和表有冻结机制(vacuum freeze)来避免事务ID回卷带来的这种问题出现!
因此,我们有必须来了解PostgreSQL冻结的相关参数:
postgres=# select name, setting, short_desc from pg_settings where name in ('vacuum_freeze_min_age','autovacuum_freeze_max_age','vacuum_freeze_table_age');name | setting | short_desc-----------------------------+-----------+--------------------------------------------------------------------------------vacuum_freeze_min_age | 50000000 | Minimum age at which VACUUM should freeze a table row.autovacuum_freeze_max_age | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound.vacuum_freeze_table_age | 150000000 | Age at which VACUUM should scan whole table to freeze tuples.(3 行记录)
1. vacuum_freeze_min_age:每个元组距离上次freeze操作后需要多少txid后才能重新freeze,默认是5000万。
2. autovacuum_freeze_max_age:txid_current - t_xmin >= autovacuum_freeze_max_age,则元组对应的表会进行autovacuum(即使关闭了autovacuum),默认为2亿。
3. vacuum_freeze_table_age:freeze过程中需要对所有可见且未被all-frozen的数据页进行扫描,这个扫描过程称为迫切冻结(aggressive vacuum)。这个参数用来决定迫切扫描的周期。
本文,我们采用pg_resetwal工具模拟事务ID推进。
pg_resetwal工具支持在系统数据损坏的时候进行重新标识日志的位置,让系统可以重启运行并且重置pg_control文件。
# pg_resetwal用法如下:pg_resetwal --helppg_resetwal resets the PostgreSQL write-ahead log.Usage:pg_resetwal [OPTION]... DATADIROptions:-c, --commit-timestamp-ids=XID,XIDset oldest and newest transactions bearingcommit timestamp (zero means no change)[-D, --pgdata=]DATADIR data directory-e, --epoch=XIDEPOCH set next transaction ID epoch-f, --force force update to be done-l, --next-wal-file=WALFILE set minimum starting location for new WAL-m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID-n, --dry-run no update, just show what would be done-o, --next-oid=OID set next OID-O, --multixact-offset=OFFSET set next multitransaction offset-V, --version output version information, then exit-x, --next-transaction-id=XID set next transaction ID--wal-segsize=SIZE size of WAL segments, in megabytes-?, --help show this help, then exitReport bugs to <pgsql-bugs@lists.postgresql.org>.
本文,我们主要用到 -x 参数,用于推进事务ID,下面,我们开始实验:
1. 创建数据库和表
postgres=# create database test;postgres=# \c test;test=# create table t1 (id int, name varchar(10));test=# insert into t1 values (1,'t1'),(2,'t2'),(3,'t3'),(4,'t4'),(5,'t5');INSERT 0 5test=# update t1 set name = 't22' where id = 2;UPDATE 1test=# update t1 set name = 't33' where id = 3;UPDATE 1test=# delete from t1 where id = 4;DELETE 1test=# select ctid, xmin, xmax, cmin, cmax, id, name from t1;ctid | xmin | xmax | cmin | cmax | id | name-------+------+------+------+------+----+------(0,1) | 571 | 0 | 0 | 0 | 1 | t1(0,5) | 571 | 0 | 0 | 0 | 5 | t5(0,6) | 572 | 0 | 0 | 0 | 2 | t22(0,7) | 573 | 0 | 0 | 0 | 3 | t33(4 行记录)test=# SELECT lp, t_ctid AS ctid,test-# t_xmin AS xmin,test-# t_xmax AS xmax,test-# to_hex(t_infomask) AS infomask,test-# to_hex(t_infomask2) AS infomask2,test-# t_attrs AS attrstest-# FROM heap_page_item_attrs(get_raw_page('t1', 0), 't1')test-# LIMIT 10;lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------+------+----------+-----------+-------------------------------1 | (0,1) | 571 | 0 | 902 | 2 | {"\\x01000000","\\x077431"}2 | (0,6) | 571 | 572 | 502 | 4002 | {"\\x02000000","\\x077432"}3 | (0,7) | 571 | 573 | 502 | 4002 | {"\\x03000000","\\x077433"}4 | (0,4) | 571 | 574 | 502 | 2002 | {"\\x04000000","\\x077434"}5 | (0,5) | 571 | 0 | 902 | 2 | {"\\x05000000","\\x077435"}6 | (0,6) | 572 | 0 | 2902 | 8002 | {"\\x02000000","\\x09743232"}7 | (0,7) | 573 | 0 | 2902 | 8002 | {"\\x03000000","\\x09743333"}(7 行记录)
2. 查看事务年龄
test=# select oid, relname, relfrozenxid, age(relfrozenxid) age, txid_current() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> 0 and relname in ('t1');oid | relname | relfrozenxid | age | txid_current-------+---------+--------------+-----+--------------16413 | t1 | 570 | 5 | 575(1 行记录)test=# SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as agetest-# from pg_class c left join pg_class t on c.reltoastrelid = t.oidtest-# where c.relkind in ('r', 'm') and c.relname like '%t1%'test-# order by age desc ;table_name | age------------+-----t1 | 6(1 行记录)## 由于PG默认开启自动提交,所以每执行一个事务,age就自动加1。
3. 第一次推进事务ID,取值小于2^32
pg_ctl stoppg_resetwal -x 2097483648 -D $PGDATApg_ctl start## 启动报错如下2024-07-16 15:35:48.891 CST [4286] LOG: database system was shut down at 2024-07-16 15:35:04 CST2024-07-16 15:35:48.893 CST [4286] FATAL: could not access status of transaction 20974836482024-07-16 15:35:48.893 CST [4286] DETAIL: Could not read from file "pg_xact/07D0" at offset 81920: Success.2024-07-16 15:35:48.894 CST [4284] LOG: startup process (PID 4286) exited with exit code 12024-07-16 15:35:48.894 CST [4284] LOG: aborting startup due to startup process failure2024-07-16 15:35:48.896 CST [4284] LOG: database system is shut down## pg_xact目录下确实不存在07D0这个文件,因为我们的事务是直接修改过来的,不是靠事务推进的。## 所以我们需要手动去伪造一下这个文件,以便让pg数据库正常启动。dd if=/dev/zero of=$PGDATA/pg_xact/07D0 bs=8192 count=1pg_ctl start
4. 观察现象(表年龄、冻结事务ID情况等)
1. autovacuum生效,表的死元组全部被情况了test=# SELECT lp, t_ctid AS ctid,t_xmin AS xmin,t_xmax AS xmax,to_hex(t_infomask) AS infomask,to_hex(t_infomask2) AS infomask2,t_attrs AS attrsFROM heap_page_item_attrs(get_raw_page('t1', 0), 't1')LIMIT 10;lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------+------+----------+-----------+-------------------------------1 | (0,1) | 571 | 0 | b02 | 2 | {"\\x01000000","\\x077431"}2 | | | | | |3 | | | | | |4 | | | | | |5 | (0,5) | 571 | 0 | b02 | 2 | {"\\x05000000","\\x077435"}6 | (0,6) | 572 | 0 | 2b02 | 8002 | {"\\x02000000","\\x09743232"}7 | (0,7) | 573 | 0 | 2b02 | 8002 | {"\\x03000000","\\x09743333"}(7 行记录)2. 表的年龄推进到了50000000test=# SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as agefrom pg_class c left join pg_class t on c.reltoastrelid = t.oidwhere c.relkind in ('r', 'm') and c.relname like '%t1%'order by age desc ;table_name | age------------+----------t1 | 50000000(1 行记录)3. 表的事务冻结ID变成了当前事务IDtest=# select oid, relname, relfrozenxid, age(relfrozenxid) age, txid_current() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> 0 and relname in ('t1');oid | relname | relfrozenxid | age | txid_current-------+---------+--------------+----------+--------------16413 | t1 | 2047483648 | 50000000 | 2097483648(1 行记录)4. 从日志也可以观察到数据库启动不久后就开始做autovacuum了tail -100f $PGDATA/pg_log/postgresql-Tue.log...2024-08-20 15:52:27.057 CST [7380] LOG: automatic aggressive vacuum of table "mytestdb.public.t": index scans: 0pages: 0 removed, 1 remain, 0 skipped due to pins, 0 skipped frozentuples: 0 removed, 7 remain, 0 are dead but not yet removable, oldest xmin: 2097483648buffer usage: 47 hits, 15 misses, 10 dirtiedavg read rate: 3.177 MB/s, avg write rate: 2.118 MB/ssystem usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.03 s2024-08-20 15:52:27.121 CST [7380] LOG: automatic aggressive vacuum of table "mytestdb.pg_catalog.pg_statistic": index scans: 1pages: 0 removed, 17 remain, 0 skipped due to pins, 0 skipped frozentuples: 24 removed, 398 remain, 0 are dead but not yet removable, oldest xmin: 2097483648buffer usage: 29 hits, 25 misses, 12 dirtiedavg read rate: 3.659 MB/s, avg write rate: 1.756 MB/ssystem usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.05 s...
4. 继续插入数据进行测试
test=# insert into t1 values (4,'t4');INSERT 0 1test=# insert into t1 values (6,'t6');INSERT 0 1test=# select xmin, xmax, cmin, cmax, ctid from t1;xmin | xmax | cmin | cmax | ctid------------+------+------+------+-------571 | 0 | 0 | 0 | (0,1)2097483650 | 0 | 0 | 0 | (0,4)571 | 0 | 0 | 0 | (0,5)572 | 0 | 0 | 0 | (0,6)573 | 0 | 0 | 0 | (0,7)2097483651 | 0 | 0 | 0 | (0,8)(6 行记录)test=# SELECT lp, t_ctid AS ctid,test-# t_xmin AS xmin,test-# t_xmax AS xmax,test-# to_hex(t_infomask) AS infomask,test-# to_hex(t_infomask2) AS infomask2,test-# t_attrs AS attrstest-# FROM heap_page_item_attrs(get_raw_page('t1', 0), 't1')test-# LIMIT 10;lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------------+------+----------+-----------+-------------------------------1 | (0,1) | 571 | 0 | b02 | 2 | {"\\x01000000","\\x077431"}2 | | | | | |3 | | | | | |4 | (0,4) | 2097483650 | 0 | 902 | 2 | {"\\x04000000","\\x077434"}5 | (0,5) | 571 | 0 | b02 | 2 | {"\\x05000000","\\x077435"}6 | (0,6) | 572 | 0 | 2b02 | 8002 | {"\\x02000000","\\x09743232"}7 | (0,7) | 573 | 0 | 2b02 | 8002 | {"\\x03000000","\\x09743333"}8 | (0,8) | 2097483651 | 0 | 902 | 2 | {"\\x06000000","\\x077436"}(8 行记录)## 从这里也可以看到,PG插入新数据时会复用之前删除时候保留的空间。
5. 第二次推进事务,取值为2^31
pg_ctl stoppg_resetwal -x 2147483648 -D $PGDATApg_ctl start
6. 观察现象(表年龄、冻结事务ID情况等)
1. vacuum_freeze_table_age设定为150000000,暂无达到触发vacuum freeze条件test=# SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as agetest-# from pg_class c left join pg_class t on c.reltoastrelid = t.oidtest-# where c.relkind in ('r', 'm') and c.relname like '%t1%'test-# order by age desc ;table_name | age------------+-----------t1 | 100000000(1 行记录)test=# select oid, relname, relfrozenxid, age(relfrozenxid) age, txid_current() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> 0 and relname in ('t1');oid | relname | relfrozenxid | age | txid_current-------+---------+--------------+-----------+--------------16413 | t1 | 2047483648 | 100000000 | 2147483648(1 行记录)
7. 第三次推进事务,取值为2^32
pg_ctl stoppg_resetwal -x 4092967296 -D $PGDATApg_ctl startdd if=/dev/zero of=$PGDATA/pg_xact/0F3F bs=8192 count=12pg_ctl start
8. 观察现象(表年龄、冻结事务ID情况等)
1. 表的age触发了vacuum freeze阈值,age降低到50000000test=# SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as agetest-# from pg_class c left join pg_class t on c.reltoastrelid = t.oidtest-# where c.relkind in ('r', 'm') and c.relname like '%t1%'test-# order by age desc ;table_name | age------------+----------t1 | 50000000(1 行记录)test=# select oid, relname, relfrozenxid, age(relfrozenxid) age, txid_current() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> 0 and relname in ('t1');oid | relname | relfrozenxid | age | txid_current-------+---------+--------------+----------+--------------16413 | t1 | 4042967296 | 50000000 | 4092967296(1 行记录)
9. 第四次推进事务,取值为3
pg_ctl stoppg_resetwal -x 3 -D $PGDATApg_ctl startdd if=/dev/zero of=$PGDATA/pg_xact/0000 bs=8192 count=12
10. 观察现象(表年龄、冻结事务ID情况等)
1. 表的事务ID回卷了test=# SELECT lp, t_ctid AS ctid,test-# t_xmin AS xmin,test-# t_xmax AS xmax,test-# to_hex(t_infomask) AS infomask,test-# to_hex(t_infomask2) AS infomask2,test-# t_attrs AS attrstest-# FROM heap_page_item_attrs(get_raw_page('t1', 0), 't1')test-# LIMIT 10;lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------------+------+----------+-----------+-------------------------------1 | (0,1) | 571 | 0 | b02 | 2 | {"\\x01000000","\\x077431"}2 | | | | | |3 | | | | | |4 | (0,4) | 2097483650 | 0 | b02 | 2 | {"\\x04000000","\\x077434"}5 | (0,5) | 571 | 0 | b02 | 2 | {"\\x05000000","\\x077435"}6 | (0,6) | 572 | 0 | 2b02 | 8002 | {"\\x02000000","\\x09743232"}7 | (0,7) | 573 | 0 | 2b02 | 8002 | {"\\x03000000","\\x09743333"}8 | (0,8) | 2097483651 | 0 | b02 | 2 | {"\\x06000000","\\x077436"}(8 行记录)test=# select oid, relname, relfrozenxid, age(relfrozenxid) age, txid_current() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> 0 and relname in ('t1');oid | relname | relfrozenxid | age | txid_current-------+---------+--------------+----------+--------------16413 | t1 | 4244967299 | 50000000 | 3(1 行记录)test=# SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as agetest-# from pg_class c left join pg_class t on c.reltoastrelid = t.oidtest-# where c.relkind in ('r', 'm') and c.relname like '%t1%'test-# order by age desc ;table_name | age------------+----------t1 | 50000001(1 行记录)test=# select ctid, xmin, xmax, cmin, cmax, id, name from t1;ctid | xmin | xmax | cmin | cmax | id | name--------+------------+------+------+------+----+------(0,1) | 571 | 0 | 0 | 0 | 1 | t1(0,5) | 571 | 0 | 0 | 0 | 5 | t5(0,6) | 572 | 0 | 0 | 0 | 2 | t22(0,7) | 573 | 0 | 0 | 0 | 3 | t33(0,8) | 2097483651 | 0 | 0 | 0 | 6 | t6(0,9) | 4 | 0 | 0 | 0 | 7 | t7(0,10) | 5 | 0 | 0 | 0 | 4 | t44(7 行记录)2. t1表可以正常使用,正常插入数据test=# insert into t1 values (7,'t7');INSERT 0 1test=# update t1 set name = 't44' where id = 4;UPDATE 1test=# SELECT lp, t_ctid AS ctid,test-# t_xmin AS xmin,test-# t_xmax AS xmax,test-# to_hex(t_infomask) AS infomask,test-# to_hex(t_infomask2) AS infomask2,test-# t_attrs AS attrstest-# FROM heap_page_item_attrs(get_raw_page('t1', 0), 't1')test-# LIMIT 10;lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+--------+------------+------+----------+-----------+-------------------------------1 | (0,1) | 571 | 0 | b02 | 2 | {"\\x01000000","\\x077431"}2 | | | | | |3 | | | | | |4 | (0,10) | 2097483650 | 5 | 702 | 4002 | {"\\x04000000","\\x077434"}5 | (0,5) | 571 | 0 | b02 | 2 | {"\\x05000000","\\x077435"}6 | (0,6) | 572 | 0 | 2b02 | 8002 | {"\\x02000000","\\x09743232"}7 | (0,7) | 573 | 0 | 2b02 | 8002 | {"\\x03000000","\\x09743333"}8 | (0,8) | 2097483651 | 0 | b02 | 2 | {"\\x06000000","\\x077436"}9 | (0,9) | 4 | 0 | 902 | 2 | {"\\x07000000","\\x077437"}10 | (0,10) | 5 | 0 | 2902 | 8002 | {"\\x04000000","\\x09743434"}(10 行记录)
结论:PostgreSQL的事务ID随着业务读写不断推进,同时PostgreSQL内部的vacuum机制也会在适当的时候触发旧事务进行冻结回收。
一般情况下不会出现事务冻结失败的情况,但是事实上在客户环境确实也遇到过这种问题,提示报错如下:
ERROR: database is not accepting commands to avoid wraparound data loss in database "szdb"HINT: Stop the postmaster and vacuum that database in single-user mode.
那么,冻结失败一般是由于短事务的TPS和只读查询的QPS非常高,事务ID消耗非常快,同时表的数据量非常大,vacuum速度非常慢,并且由于某些原因报错中断了,可能的原因如下
1. 日志文件损坏了;
2. 长事务一直锁表不提交;
3. 二阶段提交中的未决prepare transaction
下面我们通过模拟二阶段提交的未决事务引发事务冻结失败的场景:
1. 查看当前事务IDtest=# begin;BEGINtest=# select txid_current();txid_current--------------9(1 行记录)2. 开启二阶段提交事务test=# lock t1 in share update exclusive mode;LOCK TABLEtest=# PREPARE TRANSACTION 't1';PREPARE TRANSACTION3. 推进事务小于2^31pg_ctl stoppg_resetwal -x 2137483648 -D $PGDATAdd if=/dev/zero of=$PGDATA/pg_xact/07F6 bs=8192 count=15pg_ctl start## 启动后数据库持续报错如下,提示我们要关闭打开的未决事务2024-07-16 21:46:28.525 CST [9787] HINT: Close open transactions soon to avoid wraparound problems.You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
注意:
当数据库事务ID剩余量小于1000万时,只能进入单用户模式进行操作;
当数据库事务ID剩余量大于1000万时,可以进入数据库手工进行vacuum操作。
1. 查看表的年龄,负数表明存在问题test=# SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as agefrom pg_class c left join pg_class t on c.reltoastrelid = t.oidwhere c.relkind in ('r', 'm') and c.relname like '%t1%'order by age desc ;table_name | age------------+-------------t1 | -2107483651(1 行记录)2. 查看事务ID老化情况test=# WITH max_age AS (test(# SELECT 2000000000 as max_old_xidtest(# , setting AS autovacuum_freeze_max_agetest(# FROM pg_catalog.pg_settingstest(# WHERE name = 'autovacuum_freeze_max_age' )test-# , per_database_stats AS (test(# SELECT datnametest(# , m.max_old_xid::inttest(# , m.autovacuum_freeze_max_age::inttest(# , age(d.datfrozenxid) AS oldest_current_xidtest(# FROM pg_catalog.pg_database dtest(# JOIN max_age m ON (true)test(# WHERE d.datallowconn )test-# select max(oldest_current_xid) AS oldest_current_xidtest-# , max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparoundtest-# , max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovactest-# FROM per_database_statstest-# ;oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac--------------------+----------------------------+-----------------------------------2137483645 | 107 | 1069(1 行记录)3. 我们尝试插入数据,发现无法插入test=# insert into t1 values(8,'t8');ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"提示: Stop the postmaster and vacuum that database in single-user mode.You might also need to commit or roll back old prepared transactions, or drop stale replication slots.4. 手工执行vacuum freeze t1表,此时会话hung住test=# vacuum freeze t1;5. 新开一个会话,查看数据库阻塞情况postgres=# SELECT blocked_locks.pid AS blocked_pid,postgres-# blocked_activity.usename AS blocked_user,postgres-# blocking_locks.pid AS blocking_pid,postgres-# blocking_activity.usename AS blocking_user,postgres-# blocked_activity.query AS blocked_statement,postgres-# blocking_activity.query AS current_statement_in_blocking_processpostgres-# FROM pg_catalog.pg_locks blocked_lockspostgres-# JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pidpostgres-# JOIN pg_catalog.pg_locks blocking_lockspostgres-# ON blocking_locks.locktype = blocked_locks.locktypepostgres-# AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.databasepostgres-# AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relationpostgres-# AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.pagepostgres-# AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuplepostgres-# AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxidpostgres-# AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionidpostgres-# AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classidpostgres-# AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objidpostgres-# AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubidpostgres-# AND blocking_locks.pid != blocked_locks.pidpostgres-# JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pidpostgres-# WHERE NOT blocked_locks.granted;blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement | current_statement_in_blocking_process-------------+--------------+--------------+---------------+------------------------------------------------------+------------------------------------------------------15881 | | 15879 | postgres | autovacuum: VACUUM public.t1 (to prevent wraparound) | vacuum freeze t1;15879 | postgres | 15881 | | vacuum freeze t1; | autovacuum: VACUUM public.t1 (to prevent wraparound)(2 行记录)postgres=# select locktype, database , virtualtransaction , pid, mode from pg_locks where database = (select oid from pg_database where datname = 'test') and mode like '%Exclusive%';locktype | database | virtualtransaction | pid | mode----------+----------+--------------------+-------+--------------------------relation | 16384 | -1/9 | | ShareUpdateExclusiveLockrelation | 16384 | 3/13 | 15879 | ShareUpdateExclusiveLockrelation | 16384 | 4/3 | 15881 | ShareUpdateExclusiveLock(3 行记录)结论:发现二阶段提交事务无法直接在锁视图中定位到!!6. 直接查看二阶段提交事务视图postgres=# select * from pg_prepared_xacts;transaction | gid | prepared | owner | database-------------+-----+-------------------------------+----------+----------9 | t1 | 2024-07-16 20:58:56.399914+08 | postgres | test(1 行记录)# 操作系统上检查ls -ltr $PGDATA/pg_twophase7. 回滚二阶段提交事务test=# rollback prepared 't1';ROLLBACK PREPARED## 观察数据库日志,此时数据库会自动vacuum2024-07-16 22:40:08.913 CST [15879] WARNING: database "test" must be vacuumed within 10000000 transactions2024-07-16 22:40:08.913 CST [15879] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.You might also need to commit or roll back old prepared transactions, or drop stale replication slots.2024-07-16 22:40:10.043 CST [15881] WARNING: database "postgres" must be vacuumed within 10000000 transactions2024-07-16 22:40:10.043 CST [15881] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.You might also need to commit or roll back old prepared transactions, or drop stale replication slots.2024-07-16 22:40:14.545 CST [21394] WARNING: database "template1" must be vacuumed within 10000000 transactions2024-07-16 22:40:14.545 CST [21394] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.You might also need to commit or roll back old prepared transactions, or drop stale replication slots.2024-07-16 22:40:29.540 CST [21434] WARNING: database "template0" must be vacuumed within 10000000 transactions2024-07-16 22:40:29.540 CST [21434] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.You might also need to commit or roll back old prepared transactions, or drop stale replication slots.8. 查看数据库age,此时已经降到5000万test=# WITH max_age AS (SELECT 2000000000 as max_old_xid, setting AS autovacuum_freeze_max_ageFROM pg_catalog.pg_settingsWHERE name = 'autovacuum_freeze_max_age' ), per_database_stats AS (SELECT datname, m.max_old_xid::int, m.autovacuum_freeze_max_age::int, age(d.datfrozenxid) AS oldest_current_xidFROM pg_catalog.pg_database dJOIN max_age m ON (true)WHERE d.datallowconn )select max(oldest_current_xid) AS oldest_current_xid, max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound, max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovacFROM per_database_stats;oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac--------------------+----------------------------+-----------------------------------50000000 | 2 | 25(1 行记录)9. 查看表的年龄,已经置0test=# SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as agefrom pg_class c left join pg_class t on c.reltoastrelid = t.oidwhere c.relkind in ('r', 'm') and c.relname like '%t1%'order by age desc ;table_name | age------------+-----t1 | 0(1 行记录)
结论:
1. 关注和监控数据库中的长事务,避免表膨胀和事务冻结失败的问题产生;
2. 谨慎使用二阶段提交,如果使用了一定记得提交或回滚事务。
本文内容就到这啦,阅读完本篇,相信你对PostgreSQL的事务回卷和冻结的相关知识有了一定的认识了吧!我们下篇再见!





