暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

记一次PG备库长事务影响主库vacuum案例

原创 不想值班啊 2024-09-20
133

以下生产案例,将表名等都做了替换:

业务反应sql执行慢,查看执行计划为全表扫描,count表行数很少,表膨胀比较大所以比较慢

查看表自动vacuum已经做了,但是还是有很多死元组

testdb=# select * from pg_stat_all_tables where relname='test';
-[ RECORD 1 ]-------+------------------------------
relid               | 23620
schemaname          | sch1
relname             | test
seq_scan            | 69510137
seq_tup_read        | 8810306691
idx_scan            | 3883729
idx_tup_fetch       | 3883855
n_tup_ins           | 1
n_tup_upd           | 36157173
n_tup_del           | 0
n_tup_hot_upd       | 3375745
n_live_tup          | 127
n_dead_tup          | 3298921
n_mod_since_analyze | 0
last_vacuum         | 
last_autovacuum     | 2024-09-03 15:16:36.754106+08
last_analyze        | 
last_autoanalyze    | 2024-09-03 15:15:12.431527+08
vacuum_count        | 0
autovacuum_count    | 141814
analyze_count       | 0
autoanalyze_count   | 131518

手动vacuum也未能清理死元组

testdb=# vacuum verbose sch1.test;
INFO:  vacuuming "sch1.test"
INFO:  "test": found 0 removable, 3299567 nonremovable row versions in 101152 out of 101155 pages
DETAIL:  3299177 dead row versions cannot be removed yet, oldest xmin: 2539223913
There were 6095 unused item identifiers.
Skipped 3 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.58 s, system: 0.00 s, elapsed: 0.93 s.
VACUUM

检查是否是长事务引起,发现备库有几个idle in transaction状态会话已经持续了21天

postgres=# select usename,pid,client_addr,state,wait_event,now()::timestamp(0)-query_start::timestamp(0) exec_time,SUBSTR(query,1,80) as query from pg_stat_activity where state <> 'idle' and pid <> pg_backend_pid() order by query_start;
  usename  |  pid  |  client_addr  |        state        | wait_event |    exec_time     |                                      query                                       
-----------+-------+---------------+---------------------+------------+------------------+----------------------------------------------------------------------------------
 xx_userxx | 28411 | xxx.xxx.xx.xx | idle in transaction | ClientRead | 21 days 04:12:46 | /* dynamic native SQL query */ select t.xxxxxxxx from xxxxxxxx t where t.xxxxxx 
 xx_userxx | 18125 | xxx.xxx.xx.xx | idle in transaction | ClientRead | 21 days 04:03:32 | /* dynamic native SQL query */ select t.xxxxxxxx from xxxxxxxx t where t.xxxxxx 
 xx_userxx | 33239 | xxx.xxx.xx.xx | idle in transaction | ClientRead | 21 days 04:01:26 | /* dynamic native SQL query */ select t.xxxxxxxx from xxxxxxxx t where t.xxxxxx 
 xx_userxx | 25032 | xxx.xxx.xx.xx | idle in transaction | ClientRead | 20 days 23:53:40 | /* dynamic native SQL query */ select t.xxxxxxxx from xxxxxxxx t where t.xxxxxx 
(4 rows)

把上面查出来的几个会话杀掉,再次手动执行vacuum,可以看到死元组被清理了

testdb=# vacuum verbose sch1.test;
INFO:  vacuuming "sch1.test"
INFO:  scanned index "pk_test" to remove 65699 row versions
DETAIL:  CPU: user: 0.45 s, system: 0.01 s, elapsed: 0.76 s
INFO:  "test": removed 65699 row versions in 2145 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  index "pk_test" now contains 2742283 row versions in 19284 pages
DETAIL:  65699 index row versions were removed.
430 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "test": found 89 removable, 3229345 nonremovable row versions in 101164 out of 101165 pages
DETAIL:  3228979 dead row versions cannot be removed yet, oldest xmin: 2539442736
There were 10985 unused item identifiers.
Skipped 1 page due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 1.08 s, system: 0.02 s, elapsed: 1.58 s.
VACUUM

testdb=# vacuum verbose sch1.test;
INFO:  vacuuming "sch1.test"
INFO:  scanned index "pk_test" to remove 2518051 row versions
DETAIL:  CPU: user: 1.78 s, system: 0.02 s, elapsed: 1.82 s
INFO:  "test": removed 2518051 row versions in 90577 pages
DETAIL:  CPU: user: 0.28 s, system: 0.00 s, elapsed: 0.29 s
INFO:  index "pk_test" now contains 224239 row versions in 19284 pages
DETAIL:  2518051 index row versions were removed.
12739 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "test": found 7 removable, 715 nonremovable row versions in 90600 out of 101165 pages
DETAIL:  324 dead row versions cannot be removed yet, oldest xmin: 2566902560
There were 437287 unused item identifiers.
Skipped 8426 pages due to buffer pins, 2139 frozen pages.
0 pages are entirely empty.
CPU: user: 2.22 s, system: 0.05 s, elapsed: 2.30 s.
VACUUM

如何避免这个问题的再次发生,设置idle_in_transaction_session_timeout参数,idle_in_transaction状态会话超过时长自动被杀掉

postgres=# show idle_in_transaction_session_timeout;
 idle_in_transaction_session_timeout 
-------------------------------------
 0
(1 row)

postgres=# alter system set idle_in_transaction_session_timeout='12h';  --idle_in_transaction状态超过时长12小时自动被杀掉
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

postgres=# show idle_in_transaction_session_timeout;
 idle_in_transaction_session_timeout 
-------------------------------------
 12h
(1 row)


「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论