暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 14 preview - log_recovery_conflict_waits - standby query&startup process conflict 恢复冲突 超时(timeout)配置,日志打印

digoal 2021-01-01
713

作者

digoal

日期

2021-01-08

标签

PostgreSQL , 冲突 , standby , deadlock_timeout


背景

standby需要replay wal日志, 回放和用户的查询可能存在冲突, 例如查询快照比较老, 但是正在回放的wal可能要做vacuum操作清理老快照可能要查询的数据, 更多详情参考如下:

《PostgreSQL 流复制冲突分类讲解以及对应解决方案 - DEALING WITH STREAMING REPLICATION CONFLICTS IN POSTGRESQL - 特别是lock confict(vacuum truncate suffix free page引起的)》

《PostgreSQL standby conflict replay分析和解决方案》

《PostgreSQL源码分析 备库查询冲突 - User was holding shared buffer pin for too long》

《PostgreSQL 物理流复制从库 - 冲突判定, 谁堵塞了wal replay, 等了多久》

PostgreSQL 14可以支持冲突日志打印了, 超过deadlock_time配置的冲突等待都会被打印到日志中(log_recovery_conflict_waits=on 时), 包括冲突对应的锁类型, 什么backend pid导致的等等. 方便我们观察冲突发生的历史事件, 分析问题.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0650ff23038bc3eb8d8fd851744db837d921e285

```
Add GUC to log long wait times on recovery conflicts.
author Fujii Masao fujii@postgresql.org
Thu, 7 Jan 2021 15:47:03 +0000 (00:47 +0900)
committer Fujii Masao fujii@postgresql.org
Thu, 7 Jan 2021 15:47:03 +0000 (00:47 +0900)
commit 0650ff23038bc3eb8d8fd851744db837d921e285
tree 1df2b5ccd7a20ee538c93339b115ca4f274a5aa6 tree | snapshot
parent f7a1a805cb178653ea2a6c8991ad73b035af953e commit | diff
Add GUC to log long wait times on recovery conflicts.

This commit adds GUC log_recovery_conflict_waits that controls whether
a log message is produced when the startup process is waiting longer than
deadlock_timeout for recovery conflicts. This is useful in determining
if recovery conflicts prevent the recovery from applying WAL.

Note that currently a log message is produced only when recovery conflict
has not been resolved yet even after deadlock_timeout passes, i.e.,
only when the startup process is still waiting for recovery conflict
even after deadlock_timeout.

Author: Bertrand Drouvot, Masahiko Sawada
Reviewed-by: Alvaro Herrera, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
```

+ <varlistentry id="guc-log-recovery-conflict-waits" xreflabel="log_recovery_conflict_waits"> + <term><varname>log_recovery_conflict_waits</varname> (<type>boolean</type>) + <indexterm> + <primary><varname>log_recovery_conflict_waits</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Controls whether a log message is produced when the startup process + is waiting longer than <varname>deadlock_timeout</varname> + for recovery conflicts. This is useful in determining if recovery + conflicts prevent the recovery from applying WAL. + </para> + + <para> + The default is <literal>off</literal>. This parameter can only be set + in the <filename>postgresql.conf</filename> file or on the server + command line. + </para> + </listitem> + </varlistentry>

+/* + * Log the recovery conflict. + * + * wait_start is the timestamp when the caller started to wait. + * now is the timestamp when this function has been called. + * wait_list is the list of virtual transaction ids assigned to + * conflicting processes. + */ +void +LogRecoveryConflict(ProcSignalReason reason, TimestampTz wait_start, + TimestampTz now, VirtualTransactionId *wait_list) +{ + long secs; + int usecs; + long msecs; + StringInfoData buf; + int nprocs = 0; + + TimestampDifference(wait_start, now, &secs, &usecs); + msecs = secs * 1000 + usecs / 1000; + usecs = usecs % 1000; + + if (wait_list) + { + VirtualTransactionId *vxids; + + /* Construct a string of list of the conflicting processes */ + vxids = wait_list; + while (VirtualTransactionIdIsValid(*vxids)) + { + PGPROC *proc = BackendIdGetProc(vxids->backendId); + + /* proc can be NULL if the target backend is not active */ + if (proc) + { + if (nprocs == 0) + { + initStringInfo(&buf); + appendStringInfo(&buf, "%d", proc->pid); + } + else + appendStringInfo(&buf, ", %d", proc->pid); + + nprocs++; + } + + vxids++; + } + } + + /* + * If wait_list is specified, report the list of PIDs of active + * conflicting backends in a detail message. Note that if all the backends + * in the list are not active, no detail message is logged. + */ + ereport(LOG, + errmsg("recovery still waiting after %ld.%03d ms: %s", + msecs, usecs, _(get_recovery_conflict_desc(reason))), + nprocs > 0 ? errdetail_log_plural("Conflicting process: %s.", + "Conflicting processes: %s.", + nprocs, buf.data) : 0); + + if (nprocs > 0) + pfree(buf.data); +}

+/* Return the description of recovery conflict */ +static const char * +get_recovery_conflict_desc(ProcSignalReason reason) +{ + const char *reasonDesc = gettext_noop("unknown reason"); + + switch (reason) + { + case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN: + reasonDesc = gettext_noop("recovery conflict on buffer pin"); + break; + case PROCSIG_RECOVERY_CONFLICT_LOCK: + reasonDesc = gettext_noop("recovery conflict on lock"); + break; + case PROCSIG_RECOVERY_CONFLICT_TABLESPACE: + reasonDesc = gettext_noop("recovery conflict on tablespace"); + break; + case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT: + reasonDesc = gettext_noop("recovery conflict on snapshot"); + break; + case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK: + reasonDesc = gettext_noop("recovery conflict on buffer deadlock"); + break; + case PROCSIG_RECOVERY_CONFLICT_DATABASE: + reasonDesc = gettext_noop("recovery conflict on database"); + break; + default: + break; + } + + return reasonDesc; +}

冲突结束后,如果整个冲突时间超过deadlock_timeout,也会被记录下来,用于判断总共等待了多长时间。

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=39b03690b529935a3c33024ee68f08e2d347cf4f

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论