[postgresql] repmgr集群添加新节点后发现异常,信息也不同步。
[pg12@node3 ~]$ repmgr -f /home/pg12/conf/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2 3 | node3 | standby | running | ? node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2
- 尝试standby follow
按照文章
“[postgresql] repmgr集群 状态显示 - node “node*” (ID: %n) is not attached to its upstream node”
尝试repmgr v失败
[pg12@node3 ~]$ repmgr standby follow -f /home/pg12/conf/repmgr.conf ERROR: unable to retrieve record for local node 3 [pg12@node3 ~]$
[pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2 3 | node3 | standby | running | ? node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2 [pg12@node3 ~]$
显然无法使用standby follow达成同步。
- 尝试关闭节点,重新rejoin
如下即是尝试 关闭问题node,然后尝试使用repmgr node rejoin解决问题。
[pg12@node3 ~]$repmgr -f /home/pg12/conf/repmgr.conf node service --action=stop DETAIL: executing server command "/home/pg12/soft/bin/pg_ctl -D '/home/pg12/data' -W -m fast stop" [pg12@node3 ~]$ [pg12@node3 ~]$ repmgr node rejoin -f /home/pg12/conf/repmgr.conf -d 'host=192.168.5.101 dbname=repmgr user=repmgr' --config-fil es=postgresql.local.conf,postgresql.conf --verbose --force-rewind NOTICE: using provided configuration file "/home/pg12/conf/repmgr.conf" ERROR: unable to retrieve node record for the local node HINT: check the local node is registered with the current primary "node1" (ID: 1) Segmentation fault (core dumped) [pg12@node3 ~]$ [pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2 3 | node3 | standby | running | ? node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2
显然无法实现恢复。
- 最后尝试注销,重新备考建设
注销node3节点的standby注册,然后重新克隆clone并注册standby:
repmgr -f /home/pg12/conf/repmgr.conf standby unregister
repmgr -f /home/pg12/conf/repmgr.conf cluster show
[pg12@node3 ~]$ repmgr -f /home/pg12/conf/repmgr.conf standby unregister INFO: connecting to local standby INFO: connecting to primary database NOTICE: unregistering node 3 INFO: standby unregistration complete [pg12@node3 ~]$ repmgr -f /home/pg12/conf/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2 2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2
确认取消注册后清除目前数据库,进行克隆(repmgr standby clone):
rm -rf /home/pg12/data
repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data --dry-run
repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data
touch $PGDATA/standby.signal
此处$PGDATA 为设置环境变量, 此处就是/home/pg12/data。
[pg12@node3 ~]$ rm -rf /home/pg12/data
#--dry-run 选项为预检查,可以发现潜在一些问题
[pg12@node3 ~]$ repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data --dry-run
NOTICE: destination directory "/home/pg12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.5.101 user=repmgr dbname=repmgr
DETAIL: current installation size is 33 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 9 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met
#--dry-run 运行后没有什么报错,正式进行克隆
[pg12@node3 ~]$ repmgr -h 192.168.5.101 -U repmgr -d repmgr standby clone -D /home/pg12/data
NOTICE: destination directory "/home/pg12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.5.101 user=repmgr dbname=repmgr
DETAIL: current installation size is 33 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/home/pg12/data"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
pg_basebackup -l "repmgr base backup" -D /home/pg12/data -h 192.168.5.101 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /home/pg12/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
# 根据主库primary克隆备库后先不启动备库,在数据库目录创建空文件$PGDATA/standby.signal(限定于postgresql 12版本,这个识别文件作用是启动时使数据库判定自己为standby库)
[pg12@node3 ~]$ touch $PGDATA/standby.signal
[pg12@node3 ~]$ pg_ctl -D $PGDATA start
# 查看pg_is_in_recovery(),判断是否以备库身份启动。
[pg12@node3 ~]$ psql -h 192.168.5.103 -U repmgr repmgr -c 'select * from pg_is_in_recovery();'
pg_is_in_recovery
-------------------
t
(1 row)
检查备库正常启动后,注册备库(repmgr standby register):
[pg12@node3 ~]$ repmgr standby register -f /home/pg12/conf/repmgr.conf
INFO: connecting to local node "node3" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
ERROR: local node not attached to primary node 1
HINT: specify the actual upstream node id with --upstream-node-id, or use -F/--force to continue anyway
[pg12@node3 ~]$ repmgr standby register -f /home/pg12/conf/repmgr.conf --force
INFO: connecting to local node "node3" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
WARNING: local node not attached to primary node 1
NOTICE: -F/--force supplied, continuing anyway
INFO: standby registration complete
NOTICE: standby node "node3" (ID: 3) successfully registered
[pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | ! node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- node "node3" (ID: 3) is not attached to its upstream node "node1" (ID: 1)
#确定成功组成,然后处理问题not attached to its upstream node
[pg12@node3 ~]$ repmgr standby follow -f /home/pg12/conf/repmgr.conf
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 0/F00A488, follow target lsn is 0/F00A488
NOTICE: setting node 3's upstream to node 1
NOTICE: stopping server using "/home/pg12/soft/bin/pg_ctl -D '/home/pg12/data' -w -m fast stop"
2023-05-26 21:56:37.093 CST [33220] LOG: received fast shutdown request
2023-05-26 21:56:37.093 CST [33220] LOG: aborting any active transactions
2023-05-26 21:56:37.093 CST [33225] FATAL: terminating walreceiver process due to administrator command
2023-05-26 21:56:37.094 CST [33222] LOG: shutting down
2023-05-26 21:56:37.096 CST [33220] LOG: database system is shut down
NOTICE: starting server using "/home/pg12/soft/bin/pg_ctl -w -D '/home/pg12/data' start"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "node1" (ID: 1)
[pg12@node3 ~]$ repmgr cluster show -f /home/pg12/conf/repmgr.conf
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.5.101 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.5.102 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | node1 | default | 100 | 1 | host=192.168.5.103 user=repmgr dbname=repmgr connect_timeout=2
[pg12@node3 ~]$
最后发现集群信息显示完美解决。
同时建表t1进行测试:
[pg12@node3 ~]$ psql -h 192.168.5.103 -U repmgr postgres -c 'select * from t1;' id | comm ----+-------- 1 | zhang3 (1 row) [pg12@node3 ~]$ psql -h 192.168.5.101 -U repmgr postgres -c 'insert into t1 values(2,'li4');' INSERT 0 1 [pg12@node3 ~]$ psql -h 192.168.5.103 -U repmgr postgres -c 'select * from t1;' id | comm ----+-------- 1 | zhang3 2 | li4 (2 rows)
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




