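Staying available as much as possible is a key design goal of a repmgr cluster. Maintenance is still unavoidable at times, for example when the repmgr extension itself needs to be upgraded.
repmgr service pause instructs every repmgrd instance in the replication cluster to suspend failover operations. Run it while all cluster nodes are available (if a node is unavailable, fix that problem first).
The command can be run on any active node in the cluster. Its effect is that repmgrd takes no action when it detects a failover event. This is very useful for maintenance operations (such as a switchover) that could otherwise trigger a failover while repmgrd is running normally.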
First, confirm the node status:
[pg12@node1 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2
 3  | node3 | standby |   running | node1    | default  | 100      | 1        | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2
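Next, rehearse the operation with repmgr service pause --dry-run -f /home/pg12/conf/repmgr.conf
# The --dry-run option rehearses the operation without actually executing it. It also checks once more that every node is reachable.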
[pg12@node2 ~]$ repmgr service pause --dry-run -f /home/pg12/conf/repmgr.conf
INFO: would pause node 1 (node1)
INFO: would pause node 2 (node2)
INFO: would pause node 3 (node3)
# Suppose we shut down one node
[pg12@node3 ~]$ pg_ctl -D /home/pg12/data stop
waiting for server to shut down....2023-06-01 07:04:50.262 CST [10612] LOG: received fast shutdown request
2023-06-01 07:04:50.263 CST [10612] LOG: aborting any active transactions
2023-06-01 07:04:50.263 CST [10910] FATAL: terminating walreceiver process due to administrator command
2023-06-01 07:04:50.264 CST [10614] LOG: shutting down
2023-06-01 07:04:50.266 CST [10612] LOG: database system is shut down
done
server stopped
# The --dry-run rehearsal now reports the unreachable node
[pg12@node2 ~]$ repmgr service pause --dry-run -f /home/pg12/conf/repmgr.conf
INFO: would pause node 1 (node1)
INFO: would pause node 2 (node2)
WARNING: unable to connect to node 3
ERROR: unable to pause 1 node(s)
HINT: execute "repmgr service status" to view current status
Now start node3 again, repeat the rehearsal, and run the command for real:
[pg12@node3 ~]$ pg_ctl -D /home/pg12/data start
waiting for server to start....2023-06-01 07:06:45.713 CST [11280] LOG: starting PostgreSQL 12.15 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
2023-06-01 07:06:45.713 CST [11280] LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-06-01 07:06:45.713 CST [11280] LOG: listening on IPv6 address "::", port 5432
2023-06-01 07:06:45.714 CST [11280] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-06-01 07:06:45.721 CST [11281] LOG: database system was shut down in recovery at 2023-06-01 07:04:50 CST
2023-06-01 07:06:45.721 CST [11281] LOG: entering standby mode
2023-06-01 07:06:45.721 CST [11281] LOG: redo starts at 0/100000A0
2023-06-01 07:06:45.721 CST [11281] LOG: consistent recovery state reached at 0/100000D8
2023-06-01 07:06:45.721 CST [11281] LOG: record with incorrect prev-link 0/E000060 at 0/100000D8
2023-06-01 07:06:45.722 CST [11280] LOG: database system is ready to accept read only connections
2023-06-01 07:06:45.725 CST [11285] LOG: started streaming WAL from primary at 0/10000000 on timeline 1
done
server started
[pg12@node3 ~]$ repmgr service pause --dry-run -f /home/pg12/conf/repmgr.conf
INFO: would pause node 1 (node1)
INFO: would pause node 2 (node2)
INFO: would pause node 3 (node3)
[pg12@node3 ~]$ repmgr service pause -f /home/pg12/conf/repmgr.conf
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused
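The sequence above (rehearse with --dry-run, then pause only when the rehearsal is clean) could be wrapped in a small shell function. This is a minimal sketch, not something repmgr provides: safe_pause and REPMGR_CONF are hypothetical names, and it assumes repmgr exits with a non-zero status when the dry run reports a node it cannot reach.

```shell
# Assumed variable; the default is the configuration path used in this article.
REPMGR_CONF="${REPMGR_CONF:-/home/pg12/conf/repmgr.conf}"

# Hypothetical helper: pause repmgrd cluster-wide only if the dry run succeeds.
safe_pause() {
    # Rehearse first; this relies on a non-zero exit status
    # when any node cannot be reached.
    if ! repmgr service pause --dry-run -f "$REPMGR_CONF"; then
        echo "dry run failed; not pausing" >&2
        return 1
    fi
    # The rehearsal was clean, so pause for real.
    repmgr service pause -f "$REPMGR_CONF"
}
```

Calling safe_pause on any active node would then either pause all repmgrd instances or leave the cluster untouched.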
The planned maintenance can then be carried out.
Once maintenance is complete, leave the paused state with repmgr service unpause.
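One possible way to make sure the unpause step is not forgotten is to wrap the maintenance command between the two repmgr calls. The following is only a sketch under assumptions: with_repmgrd_paused and REPMGR_CONF are hypothetical names, and the maintenance command is whatever you pass in.

```shell
# Assumed variable; the default is the configuration path used in this article.
REPMGR_CONF="${REPMGR_CONF:-/home/pg12/conf/repmgr.conf}"

# Hypothetical helper: run a maintenance command while repmgrd is paused.
with_repmgrd_paused() {
    # Pause failover handling; give up if the pause itself fails.
    repmgr service pause -f "$REPMGR_CONF" || return 1

    # Run the maintenance command given as arguments and remember its status.
    "$@"
    maintenance_rc=$?

    # Try to unpause even if the maintenance command failed.
    repmgr service unpause -f "$REPMGR_CONF" || return 1
    return "$maintenance_rc"
}

# Example with a hypothetical maintenance script:
#   with_repmgrd_paused ./do_maintenance.sh
```

The function returns the exit status of the maintenance command, so a failed maintenance step remains visible to the caller after the cluster has been unpaused.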
The following session checks the status, unpauses the cluster, and checks the status again:
[pg12@node1 ~]$ repmgr service status -f /home/pg12/conf/repmgr.conf
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 11573 | yes     | n/a
 2  | node2 | standby |   running | node1    | running | 12482 | yes     | 0 second(s) ago
 3  | node3 | standby |   running | node1    | running | 14240 | yes     | 0 second(s) ago
[pg12@node1 ~]$ repmgr service unpause -f /home/pg12/conf/repmgr.conf
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused
[pg12@node1 ~]$ repmgr service status -f /home/pg12/conf/repmgr.conf
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 11573 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 12482 | no      | 1 second(s) ago
 3  | node3 | standby |   running | node1    | running | 14240 | no      | 0 second(s) ago
[pg12@node1 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf
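Rather than reading the Paused? column by eye, the check could be scripted. The sketch below assumes the table layout shown in this article, with Paused? as the eighth pipe-separated column. The function name paused_column_is is hypothetical, and the layout may differ in other repmgr versions.

```shell
# Hypothetical helper: reads `repmgr service status` table output on stdin and
# succeeds only if every node row has the expected value ("yes" or "no")
# in the "Paused?" column (assumed to be column 8).
paused_column_is() {
    awk -F'|' -v want="$1" '
        # Data rows are the ones whose first column is a numeric node ID.
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            rows++
            paused = $8
            gsub(/[ \t]/, "", paused)
            if (paused != want) bad++
        }
        # Fail when no node rows were seen or any row did not match.
        END { exit (rows > 0 && bad == 0) ? 0 : 1 }
    '
}

# Example:
#   repmgr service status -f /home/pg12/conf/repmgr.conf | paused_column_is yes
```

Treating an empty table as a failure is a deliberate choice here, so that a failed repmgr call upstream in the pipeline is not mistaken for "all nodes paused".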
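On a node that has just been restarted, repmgrd needs a second or two before it has updated its status. After restarting PostgreSQL on any node, wait a few seconds before running repmgr service pause or repmgr service unpause.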
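Instead of a fixed wait, one option is to retry the dry run a few times with a short delay. This is only a rough sketch: wait_until_pausable and REPMGR_CONF are hypothetical names, the retry count and delay are arbitrary defaults, and it assumes the dry run returns a non-zero status while a node is still unreachable. The dry run checks reachability only, so a short fixed sleep after a restart may still be the simpler choice.

```shell
# Assumed variable; the default is the configuration path used in this article.
REPMGR_CONF="${REPMGR_CONF:-/home/pg12/conf/repmgr.conf}"

# Hypothetical helper: retry the dry run until it succeeds or attempts run out.
# Usage: wait_until_pausable [attempts] [delay_seconds]
wait_until_pausable() {
    attempts="${1:-5}"
    delay="${2:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # A clean dry run means every node could be reached.
        if repmgr service pause --dry-run -f "$REPMGR_CONF" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example:
#   wait_until_pausable 5 2 && repmgr service pause -f "$REPMGR_CONF"
```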




