暂无图片
暂无图片
3
暂无图片
暂无图片
暂无图片

postgresql分区表DDL操作时导致pg_dump等待

原创 仙人掌 2024-08-14
546
问题概述

分区表在执行DDL的时候,pg_dump执行导出任意表都会等待

问题复现

会话1开启一个事务,查询分区表test_list

postgres=# begin; BEGIN postgres=*# select * from test_list limit 1; id | name ----+------ 1 | 1 (1 row) postgres=*#

会话2执行truncate分区表test_list

postgres=# truncate test_list;

会话3执行pg_dump导出a表定义,执行后并未输出任何信息

[postgres@mydb1a src]$ /app/pg12_7/bin/pg_dump -p 5431 -t a -s

会话4查看活动会话和锁信息,可以看到pid:1496559(pg_dump) -> 等待pid:1496329(truncate) -> 等待pid:1494903(select)

postgres=# select pid,application_name,xact_start,state,wait_event,wait_event_type,query from pg_stat_activity where state<>'idle' order by xact_start; pid | application_name | xact_start | state | wait_event | wait_event_type |query -------------------------- 1494903 | psql | 2024-08-14 14:26:49.79773+08 | idle in transaction | ClientRead | Client | select * from test_list limit 1; 1496329 | psql | 2024-08-14 14:29:13.82338+08 | active | relation | Lock | truncate test_list; 1496559 | pg_dump | 2024-08-14 14:29:21.555196+08 | active | relation | Lock | SELECT c.tableoid, c.oid, c.relname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(CASE WHEN c.relkind = 'S' THEN 's' ELSE 'r' END::"char",c.relowner))) WITH ORDINALITY AS perm(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog .acldefault(CASE WHEN c.relkind = 'S' THEN 's' ELSE 'r' END::"char",c.relowner))) AS init(init_acl) WHERE acl = init_acl)) as foo) AS relacl, (SELECT pg_catalo g.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(CASE WHEN c.relkind = 'S' THEN 's' ELSE 'r' END::"char",c.relowner))) WITH ORDINALITY AS initp(acl,row_n) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefau lt(CASE WHEN c.relkind = 'S' THEN 's' ELSE 'r' END::"char",c.relowner))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo) as rrelacl, NULL AS initrelacl, NULL as initrrelacl, c.relkin postgres=# select relation::regclass,pid,pg_blocking_pids(pid),mode from pg_locks where granted='f'; relation | pid | pg_blocking_pids | mode -----------+---------+------------------+--------------------- test_list | 1496559 | {1496329} | AccessShareLock test_list | 1496329 | {1494903} | AccessExclusiveLock (2 rows)
问题分析

此时test_list表是有锁的,这毫无疑问,但是pg_dump导出a表时为什么也要等待?

首先要拿到pg_dump执行的这个完整SQL是什么,设置参数log_statement=all,再次重复测试,在日志中找到完整SQL

只展示部分SQL

SELECT c.tableoid ...... pg_get_partkeydef(c.oid) AS partkeydef, c.relispartition AS ispartition, pg_get_expr(c.relpartbound, c.oid) AS partbound FROM pg_class c LEFT JOIN pg_depend d ON (c.relkind = 'S' AND d.classid = c.tableoid AND d.objid = c.oid AND d.objsubid = 0 AND d.refclassid = c.tableoid AND d.deptype IN ('a', 'i')) LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid AND c.relkind <> 'p') LEFT JOIN pg_am am ON (c.relam = am.oid) LEFT JOIN pg_init_privs pip ON (c.oid = pip.objoid AND pip.classoid = 'pg_class'::regclass AND pip.objsubid = 0) WHERE c.relkind in ('r', 'S', 'v', 'c', 'm', 'f', 'p') ORDER BY c.oid

可以看到SQL中的两个函数pg_get_partkeydef、pg_get_expr,问题就在这,表test_list正在执行DDL,表上会加最高级别锁AccessExclusiveLock,此时表test_list是不可访问,分区函数也不可以执行,pg_dump就会在此处等待。

测试pg15及之后的版本已不存在该问题

pg15源码"bin/pg_dump/pg_dump.c"注释中可以看到相关说明

/* * Find all the tables and table-like objects. * * We must fetch all tables in this phase because otherwise we cannot * correctly identify inherited columns, owned sequences, etc. * * We include system catalogs, so that we can work if a user table is * defined to inherit from a system catalog (pretty weird, but...) * * Note: in this phase we should collect only a minimal amount of * information about each table, basically just enough to decide if it is * interesting. In particular, since we do not yet have lock on any user * table, we MUST NOT invoke any server-side data collection functions * (for instance, pg_get_partkeydef()). Those are likely to fail or give * wrong answers if any concurrent DDL is happening. */
问题总结

1.分区表正在执行DDL时,执行pg_dump导出任意表都会卡住,pg15及之后版本已优化该问题可以正常执行

2.普通表a正在执行DDL时,执行pg_dump导出其他表没问题

导出a表卡住,可以在pg_dump时使用选项–lock-wait-timeout=TIMEOUT ,设置超时时间,超时后pg_dump退出

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论