[postgresql] Pausing repmgr services for cluster maintenance (repmgr service pause)

Original post by 手机用户0512, 2023-06-05

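Staying available as much as possible is a core design goal of a repmgr cluster. Maintenance is still unavoidable at times, for example when the repmgr extension itself needs to be upgraded.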

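repmgr service pause instructs every repmgrd instance in the replication cluster to suspend failover operations. Run it while all cluster nodes are reachable; if a node is unavailable, fix that problem first.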

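The command can be run on any active node in the cluster. While the cluster is paused, repmgrd takes no action when it detects a failover event. This is useful for maintenance operations (such as a switchover) that could otherwise trigger a failover while repmgrd is running normally.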

First, confirm the node status:

[pg12@node1 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2
 3  | node3 | standby |   running | node1    | default  | 100      | 1        | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2

Next, rehearse the pause with repmgr service pause --dry-run -f /home/pg12/conf/repmgr.conf:


# The --dry-run option rehearses the operation without executing it. It also re-checks that every node is reachable.
[pg12@node2 ~]$ repmgr service pause --dry-run  -f /home/pg12/conf/repmgr.conf
INFO: would pause node 1 (node1)
INFO: would pause node 2 (node2)
INFO: would pause node 3 (node3)

# Suppose we shut down one node
[pg12@node3 ~]$ pg_ctl -D /home/pg12/data  stop
waiting for server to shut down....2023-06-01 07:04:50.262 CST [10612] LOG:  received fast shutdown request
2023-06-01 07:04:50.263 CST [10612] LOG:  aborting any active transactions
2023-06-01 07:04:50.263 CST [10910] FATAL:  terminating walreceiver process due to administrator command
2023-06-01 07:04:50.264 CST [10614] LOG:  shutting down
2023-06-01 07:04:50.266 CST [10612] LOG:  database system is shut down
 done
server stopped

# The --dry-run rehearsal now reports the problem
[pg12@node2 ~]$ repmgr service pause --dry-run  -f /home/pg12/conf/repmgr.conf
INFO: would pause node 1 (node1)
INFO: would pause node 2 (node2)
WARNING: unable to connect to node 3
ERROR: unable to pause 1 node(s)
HINT: execute "repmgr service status" to view current status
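
The two dry runs above differ only in their WARNING and ERROR lines, so a script can gate the real pause on that text. The following is a minimal sketch, not part of repmgr: `dry_run_ok` is a hypothetical helper name, and it assumes the message prefixes look exactly like the output shown in this post.

```shell
#!/bin/sh
# dry_run_ok: hypothetical helper (not a repmgr command).
# Reads the captured text of "repmgr service pause --dry-run" on stdin.
# Succeeds only if at least one node reports "would pause" and no
# WARNING or ERROR line is present.
dry_run_ok() {
    awk '
        /^(WARNING|ERROR):/        { bad = 1 }   # e.g. "WARNING: unable to connect to node 3"
        /^INFO: would pause node / { ok++ }      # one line per reachable node
        END { exit (bad || ok == 0) ? 1 : 0 }
    '
}

# Hypothetical usage on a cluster node. The 2>&1 is an assumption that
# repmgr writes these messages to stderr:
#   if repmgr service pause --dry-run -f /home/pg12/conf/repmgr.conf 2>&1 | dry_run_ok; then
#       repmgr service pause -f /home/pg12/conf/repmgr.conf
#   fi
```

Empty input is treated as a failure on purpose, so that a dry run that never started is not mistaken for a clean one.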

Now run it for real, after bringing node3 back up:

[pg12@node3 ~]$ pg_ctl -D /home/pg12/data  start
waiting for server to start....2023-06-01 07:06:45.713 CST [11280] LOG:  starting PostgreSQL 12.15 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
2023-06-01 07:06:45.713 CST [11280] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-06-01 07:06:45.713 CST [11280] LOG:  listening on IPv6 address "::", port 5432
2023-06-01 07:06:45.714 CST [11280] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-06-01 07:06:45.721 CST [11281] LOG:  database system was shut down in recovery at 2023-06-01 07:04:50 CST
2023-06-01 07:06:45.721 CST [11281] LOG:  entering standby mode
2023-06-01 07:06:45.721 CST [11281] LOG:  redo starts at 0/100000A0
2023-06-01 07:06:45.721 CST [11281] LOG:  consistent recovery state reached at 0/100000D8
2023-06-01 07:06:45.721 CST [11281] LOG:  record with incorrect prev-link 0/E000060 at 0/100000D8
2023-06-01 07:06:45.722 CST [11280] LOG:  database system is ready to accept read only connections
2023-06-01 07:06:45.725 CST [11285] LOG:  started streaming WAL from primary at 0/10000000 on timeline 1
 done
server started
[pg12@node3 ~]$ repmgr service pause --dry-run  -f /home/pg12/conf/repmgr.conf
INFO: would pause node 1 (node1)
INFO: would pause node 2 (node2)
INFO: would pause node 3 (node3)
[pg12@node3 ~]$ repmgr service pause  -f /home/pg12/conf/repmgr.conf
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused

You can then carry out the planned maintenance.

Once maintenance is complete, leave the paused state with repmgr service unpause.
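
The pause, maintain, unpause sequence can be wrapped so that the unpause step is not forgotten when the maintenance command fails. This is only a sketch under assumptions: `with_repmgr_paused` is a hypothetical helper name, and it assumes repmgr returns a non-zero exit status when a pause or unpause does not succeed on every node.

```shell
#!/bin/sh
# with_repmgr_paused: hypothetical helper (not a repmgr command).
# Usage: with_repmgr_paused CONF COMMAND [ARGS...]
# Pauses repmgrd failover handling, runs COMMAND, then unpauses.
with_repmgr_paused() {
    conf=$1
    shift
    # Give up before touching anything if the cluster cannot be paused.
    repmgr service pause -f "$conf" || return 1
    "$@"
    maint_rc=$?
    # Always attempt to unpause, even when the maintenance command failed.
    repmgr service unpause -f "$conf" || return 1
    return "$maint_rc"
}

# Hypothetical usage (do_maintenance.sh is a placeholder for your own script):
#   with_repmgr_paused /home/pg12/conf/repmgr.conf ./do_maintenance.sh
```

The helper returns the maintenance command's own exit status, so callers can still tell whether the maintenance itself succeeded.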

The following session demonstrates unpausing:


[pg12@node1 ~]$ repmgr service status  -f /home/pg12/conf/repmgr.conf
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 11573 | yes     | n/a
 2  | node2 | standby |   running | node1    | running | 12482 | yes     | 0 second(s) ago
 3  | node3 | standby |   running | node1    | running | 14240 | yes     | 0 second(s) ago
[pg12@node1 ~]$ repmgr service unpause  -f /home/pg12/conf/repmgr.conf
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused
[pg12@node1 ~]$ repmgr service status  -f /home/pg12/conf/repmgr.conf
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 11573 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 12482 | no      | 1 second(s) ago
 3  | node3 | standby |   running | node1    | running | 14240 | no      | 0 second(s) ago
[pg12@node1 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf
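
The "Paused?" column of repmgr service status can also be checked from a script. The sketch below reads the table text on stdin. `all_paused` is a hypothetical helper name, and the column position (eighth field when split on `|`) is an assumption taken from the output shown in this post, so it may differ in other repmgr versions.

```shell
#!/bin/sh
# all_paused: hypothetical helper (not a repmgr command).
# Reads the table printed by "repmgr service status" on stdin and
# succeeds only if every node row shows "yes" in the "Paused?" column.
all_paused() {
    awk -F'|' '
        # Node rows start with a numeric ID; this skips the header and the
        # "----+----" separator line.
        $1 ~ /^ *[0-9]+ *$/ && NF >= 8 {
            rows++
            gsub(/ /, "", $8)            # strip padding around the cell
            if ($8 != "yes") notpaused++
        }
        END { exit (rows > 0 && notpaused == 0) ? 0 : 1 }
    '
}

# Hypothetical usage on a cluster node:
#   repmgr service status -f /home/pg12/conf/repmgr.conf | all_paused \
#       && echo "all repmgrd instances are paused"
```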

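On a node that has just been restarted, it takes a second or two before its status is updated. After restarting PostgreSQL on any node, wait a few seconds before running repmgr service pause/unpause.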
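
Instead of a fixed wait, a script could retry a check a few times with a short delay. This is a generic sketch, not a repmgr feature: `retry` is a hypothetical helper, and the attempt count and delay are values you would choose yourself.

```shell
#!/bin/sh
# retry: hypothetical helper (not a repmgr command).
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping
# DELAY_SECONDS between attempts. Returns 0 on the first success,
# 1 if every attempt failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Hypothetical usage after restarting PostgreSQL on a node: rehearse the
# pause up to 5 times, 2 seconds apart, and only then pause for real.
#   retry 5 2 repmgr service pause --dry-run -f /home/pg12/conf/repmgr.conf \
#       && repmgr service pause -f /home/pg12/conf/repmgr.conf
```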
