我们知道默认情况下表在条件达到autovacuum_vacuum_scale_factor和autovacuum_vacuum_threshold的限制时会触发autovacuum,比如基于1000W的表,需要有20W以上数据进行更改时才触发autovacuum,但是同时对于有锁的事务比如长事务、大量频繁的更新操作,autovacuum会跳过,由于时间限制不能完成表清理。
那么我们怎么判断是哪些原因导致表不能清理呢?我们做一个测试说明长事务导致表不能清理的情况:
为了方便测试我们把所有autovacuum执行过程写入日志。postgres=# show log_autovacuum_min_duration ;log_autovacuum_min_duration------------------------------1(1 row)postgres=# alter system set log_autovacuum_min_duration to 0;ALTER SYSTEMpostgres=# select pg_reload_conf();pg_reload_conf----------------t(1 row)postgres=# show log_autovacuum_min_duration ;log_autovacuum_min_duration-----------------------------0(1 row)postgres=#
创建测试表和测试数据,为了方便测试我把表tbl_test_autovacuum的autovacuum触发条件改为autovacuum_vacuum_scale_factor=0, autovacuum_vacuum_threshold=1postgres=# create table tbl_test_autovacuum(id int ,name varchar);CREATE TABLEpostgres=# alter table tbl_test_autovacuum set (autovacuum_vacuum_scale_factor=0, autovacuum_vacuum_threshold=1);ALTER TABLEpostgres=# insert into tbl_test_autovacuum values (1,'hl'),(2,'hl'),(3,'hl'),(4,'hl'),(5,'hl');INSERT 0 5postgres=#查看数据postgres=# select relname,relpages,reltuples,pg_stat_get_live_tuples(oid),pg_stat_get_dead_tuples(oid) from pg_class where relname='tbl_test_autovacuum';relname | relpages | reltuples | pg_stat_get_live_tuples | pg_stat_get_dead_tuples---------------------+----------+-----------+-------------------------+-------------------------tbl_test_autovacuum | 1 | 5 | 5 | 0(1 row)postgres=#
session1 开启一个事务,更新一条数据,不提交
postgres=# begin;BEGINpostgres=*# update tbl_test_autovacuum set name='test' where id=1;UPDATE 1postgres=*#
session2 删除部分数据触发autovacuum
postgres=# delete from tbl_test_autovacuum where id > 2;DELETE 3postgres=#
再次查看数据分布情况,产生了三个死元组postgres=# select relname,relpages,reltuples,pg_stat_get_live_tuples(oid),pg_stat_get_dead_tuples(oid) from pg_class where relname='tbl_test_autovacuum';relname | relpages | reltuples | pg_stat_get_live_tuples | pg_stat_get_dead_tuples---------------------+----------+-----------+-------------------------+-------------------------tbl_test_autovacuum | 1 | 5 | 2 | 3(1 row)postgres=#
查看数据库日志
autovacuum 触发了,但是有些行无法移除,在查询中说明涉及到的行是会被保护的,提示oldest xmin: 1703。该 xmin 是造成不能 vacuum 的事务 id,根据提示信息,有尚未提交的事务,导致autovacuum 进程不能 vacuum。2022-06-24 02:30:42.575 EDT [9425] LOG: automatic vacuum of table "postgres.public.tbl_test_autovacuum": index scans: 0pages: 0 removed, 1 remain, 0 skipped due to pins, 0 skipped frozentuples: 0 removed, 5 remain, 3 are dead but not yet removable, oldest xmin: 1703index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removedavg read rate: 0.000 MB/s, avg write rate: 0.000 MB/sbuffer usage: 32 hits, 0 misses, 0 dirtiedWAL usage: 0 records, 0 full page images, 0 bytessystem usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
查看1703是什么事务,通过pg_stat_activity视图可以看到就是session1执行的未提交事务。
postgres=# \xExpanded display is on.postgres=# select * from pg_stat_activity where backend_xid=1703;-[ RECORD 1 ]----+-------------------------------------------------------datid | 13892datname | postgrespid | 8926leader_pid |usesysid | 10usename | atlasdbapplication_name | psqlclient_addr |client_hostname |client_port | -1backend_start | 2022-06-24 02:22:43.805632-04xact_start | 2022-06-24 02:27:55.184073-04query_start | 2022-06-24 02:27:57.200055-04state_change | 2022-06-24 02:27:57.200932-04wait_event_type | Clientwait_event | ClientReadstate | idle in transactionbackend_xid | 1703backend_xmin |query_id | 3783348657976902404query | update tbl_test_autovacuum set name='test' where id=1;backend_type | client backendpostgres=#
此时提交session1中的事务
postgres=# begin;BEGINpostgres=*# update tbl_test_autovacuum set name='test' where id=1;UPDATE 1postgres=*# end;COMMITpostgres=查看数据元组情况,由于autovacuum_naptime参数设置不能立刻触发autovacuumpostgres=# select relname,relpages,reltuples,pg_stat_get_live_tuples(oid),pg_stat_get_dead_tuples(oid) from pg_class where relname='tbl_test_autovacuum';relname | relpages | reltuples | pg_stat_get_live_tuples | pg_stat_get_dead_tuples---------------------+----------+-----------+-------------------------+-------------------------tbl_test_autovacuum | 1 | 2 | 2 | 4(1 row)postgres=#
继续观察数据库日志,可以看到死元组被清理了。
2022-06-24 02:38:42.677 EDT [9841] LOG: automatic vacuum of table "postgres.public.tbl_test_autovacuum": index scans: 0pages: 0 removed, 1 remain, 0 skipped due to pins, 0 skipped frozentuples: 4 removed, 2 remain, 0 are dead but not yet removable, oldest xmin: 1705index scan not needed: 1 pages from table (100.00% of total) had 3 dead item identifiers removedavg read rate: 0.000 MB/s, avg write rate: 0.000 MB/sbuffer usage: 32 hits, 0 misses, 0 dirtiedWAL usage: 2 records, 0 full page images, 118 bytessystem usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
再次查看数据分布情况,已经没有死元组了
postgres=# select relname,relpages,reltuples,pg_stat_get_live_tuples(oid),pg_stat_get_dead_tuples(oid) from pg_class where relname='tbl_test_autovacuum';relname | relpages | reltuples | pg_stat_get_live_tuples | pg_stat_get_dead_tuples---------------------+----------+-----------+-------------------------+-------------------------tbl_test_autovacuum | 1 | 2 | 2 | 0(1 row)postgres=#
总结:
1. 在存在长事务的时候,触发autovacuum后死元组是不被清理的;
2. 当长事务提交,但是条件不够触发autovacuum时,可以使用vacuum手动进行数据清理;
3. 如果使用vacuum清理的时候如果长事务未提交,也是不能清理死元组的。
文章转载自PostgreSQL数据库工作学习随笔,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




