Author
digoal
Date
2020-10-10
Tags
PostgreSQL, WAL receive optimization
Background
https://www.postgresql.org/message-id/flat/CANXE4Tc3FNvZ_xAimempJWv_RH9pCvsZH7Yq93o1VuNLjUT-mQ@mail.gmail.com
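In earlier versions, when a standby starts, the startup process must finish replaying every WAL file already present in the local WAL directory before the WAL receiver process is started to receive WAL from the upstream server. If this replay takes a long time, the WAL that the standby needs next may already have been recycled upstream (when no replication slot is used, or when wal_keep_segments, renamed wal_keep_size in PG 13, is set too small).
The patch discussed in the thread above, targeted at PG 14, addresses this problem: the WAL receiver no longer has to wait for the startup process to replay all WAL files in the WAL directory.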
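To make the risk concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not PostgreSQL code: the function name and all of its parameters (replay duration, WAL generation rate, number of retained segments) are hypothetical inputs chosen for illustration. It assumes no replication slot is in use, so the primary keeps only a fixed number of recent WAL segments.

```python
def standby_can_resume_streaming(replay_seconds: float,
                                 wal_segments_per_second: float,
                                 retained_segments: int) -> bool:
    """Toy model: is the WAL the standby needs still on the primary
    once the standby has finished replaying its local WAL?

    Assumption (hypothetical): the primary keeps only the most recent
    `retained_segments` segments and recycles everything older.
    """
    # Segments the primary produces while the standby is busy replaying
    # and has not yet started its WAL receiver.
    segments_generated = replay_seconds * wal_segments_per_second
    # Streaming can resume only if the primary has not generated more
    # segments than it retains during that window.
    return segments_generated <= retained_segments


# One minute of replay at 0.5 segments/s produces 30 segments: still retained.
print(standby_can_resume_streaming(60, 0.5, 64))
# Ten minutes of replay produces 300 segments: the needed WAL is gone.
print(standby_can_resume_streaming(600, 0.5, 64))
```

Starting the WAL receiver before replay shrinks `replay_seconds` in this model to roughly the reconnect time, which is why the window for losing needed WAL becomes much smaller.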
```
Hi
Standby does not start walreceiver process until startup process
finishes WAL replay. The more WAL there is to replay, longer is the
delay in starting streaming replication. If replication connection is
temporarily disconnected, this delay becomes a major problem and we
are proposing a solution to avoid the delay.
WAL replay is likely to fall behind when master is processing
write-heavy workload, because WAL is generated by concurrently running
backends on master while only one startup process on standby replays WAL
records in sequence as new WAL is received from master.
Replication connection between walsender and walreceiver may break due
to reasons such as transient network issue, standby going through
restart, etc. The delay in resuming replication connection leads to
lack of high availability - only one copy of WAL is available during
this period.
The problem worsens when the replication is configured to be
synchronous. Commits on master must wait until the WAL replay is
finished on standby, walreceiver is then started and it confirms flush
of WAL up to the commit LSN. If synchronous_commit GUC is set to
remote_write, this behavior is equivalent to tacitly changing it to
remote_apply until the replication connection is re-established!
Has anyone encountered such a problem with streaming replication?
We propose to address this by starting walreceiver without waiting for
startup process to finish replay of WAL. Please see attached
patchset. It can be summarized as follows:
0001 - TAP test to demonstrate the problem.
0002 - The standby startup sequence is changed such that
walreceiver is started by startup process before it begins
to replay WAL.
0003 - Postmaster starts walreceiver if it finds that a
walreceiver process is no longer running and the state
indicates that it is operating as a standby.
This is a POC, we are looking for early feedback on whether the
problem is worth solving and if it makes sense to solve it along this
route.
Hao and Asim
```
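The synchronous-replication point in the email can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not code from the patch: the function and parameter names are hypothetical, and it only covers the case the email describes, where synchronous_commit is remote_write and the commit needs an acknowledgement from the walreceiver rather than from replay.

```python
def commit_wait_seconds(replay_backlog_seconds: float,
                        reconnect_seconds: float,
                        early_walreceiver: bool) -> float:
    """Toy estimate of how long a synchronous commit on the primary
    blocks after the standby restarts or the connection drops.

    All inputs are hypothetical illustration values.
    """
    if early_walreceiver:
        # Proposed behavior: walreceiver is started before replay begins,
        # so the commit only waits for the connection to come back.
        return reconnect_seconds
    # Old behavior: walreceiver starts only after the startup process has
    # replayed the local WAL backlog, so the commit waits for both.
    return replay_backlog_seconds + reconnect_seconds


# Five minutes of replay backlog, two seconds to reconnect.
print(commit_wait_seconds(300, 2, early_walreceiver=False))  # 302
print(commit_wait_seconds(300, 2, early_walreceiver=True))   # 2
```

In the old sequence the wait grows with the replay backlog, which is what the email means by remote_write tacitly behaving like remote_apply until the connection is re-established.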