5M作者搭建了一套postgresql集群,Pacemaker的PostgreSQL一主多从。在在做主节点网卡测试停启的过程中出现问题,作者node4为主节点,node5 node6 为从节点,node4在做了ifconfig ens33 down以后,发现node5为主节点了,node6也会首先down下然后启动,这种情况下,node4 ifconfig ens33 up 以后,对其进行cls_repair_by_pg_rewind 恢复,发现恢复不成功,出现以下问题
[root@node4 data]# cls_repair_by_pg_rewind
resource msPostgresql is NOT running
resource msPostgresql is NOT running
resource msPostgresql is NOT running
connected to server
could not fetch remote file “global/pg_control”: ERROR: permission denied for function pg_read_binary_file
Failure, exiting
failed to call “/usr/pgsql/bin/pg_rewind -D /data/pgsql/data --source-server=‘host=10.10.10.19’ port=5432 -P”
从问题上看,是权限除了问题,由于作者刚刚学习这个,不太明白怎么解决。请大神帮忙指导指导。
以下是各节点的状态
[root@node4 data]# cls_status
resource msPostgresql is NOT running
Stack: corosync
Current DC: node5 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Thu Mar 12 12:40:34 2020
Last change: Thu Mar 12 12:39:54 2020 by root via cibadmin on node4
3 nodes configured
9 resources configured
Online: [ node4 node5 node6 ]
Full list of resources:
vip-master (ocf:💓IPaddr2): Started node5
vip-slave (ocf:💓IPaddr2): Started node6
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node5 ]
Slaves: [ node6 ]
Stopped: [ node4 ]
lvsdr (ocf:💓lvsdr): Started node6
Clone Set: lvsdr-realsvr-clone [lvsdr-realsvr]
Started: [ node6 ]
Stopped: [ node4 node5 ]
Node Attributes:
- Node node4:
- master-pgsql : -INFINITY
- pgsql-data-status : DISCONNECT
- pgsql-status : STOP
- Node node5:
- master-pgsql : 1000
- pgsql-data-status : LATEST
- pgsql-master-baseline : 00000000050001E0
- pgsql-status : PRI
- Node node6:
- pgsql-data-status : STREAMING|SYNC
- pgsql-status : HS:sync
Migration Summary:
- Node node6:
- Node node5:
- Node node4:
pgsql: migration-threshold=3 fail-count=1000000 last-failure=‘Thu Mar 12 12:29:32 2020’
Failed Actions:
- pgsql_start_0 on node4 ‘unknown error’ (1): call=55, status=complete, exitreason=‘The master’s timeline forked off current database system timeline 2 before latest checkpoint location 00000000050001E0, REPL_INF’,
last-rc-change=‘Thu Mar 12 12:29:31 2020’, queued=0ms, exec=267ms
pgsql_REPL_INFO:node5|2|00000000050001E0
评论
有用 0经过测试,我将/data/pgsql/data目录给删除掉,然后在root用户下执行cls_rebuild_slave,node4进行恢复了。但是rewind这种方法仍然没有解决。
评论
有用 0[root@node5 pha4pgsql]# cls_repair_by_pg_rewind
resource msPostgresql is NOT running
resource msPostgresql is NOT running
resource msPostgresql is NOT running
connected to server
could not fetch remote file “global/pg_control”: ERROR: permission denied for function pg_read_binary_file
Failure, exiting
failed to call “/usr/pgsql/bin/pg_rewind -D /data/pgsql/data --source-server=‘host=10.10.10.19’ port=5432 -P”
[root@node5 pha4pgsql]# rm /data/pgsql/data/ -rf
[root@node5 pha4pgsql]# cls_
cls_cleanup cls_master cls_repair_by_pg_rewind cls_start cls_unmaintenance
cls_maintenance cls_online_switch cls_reset_master cls_status cls_unmaintenance_node
cls_maintenance_node cls_rebuild_slave cls_standby_node cls_stop cls_unstandby_node
[root@node5 pha4pgsql]# cls_rebuild_slave
resource msPostgresql is NOT running
resource msPostgresql is NOT running
resource msPostgresql is NOT running
24559/24559 kB (100%), 1/1 tablespace
pg_basebackup complete!
resource msPostgresql is NOT running
resource msPostgresql is NOT running
Cleaned up all resources on all nodes
Waiting for 1 replies from the CRMd. OK
wait for recovery complete
…
slave recovery of node5 successed
[root@node5 pha4pgsql]# su - postgres
上一次登录:四 3月 12 16:42:55 CST 2020pts/0 上
postgres@node5-> pg_controldata|grep cluster
Database cluster state: in archive recovery
评论
有用 0对主节点进行了killall postgres 发现并没有出现掉线的状态,在显示日志出现了报错。可能与配置文件中的migration-threshold=3这个参数设置有关,出现三次的话,就可能会切换了。
[root@node6 tmp]# cls_status
Stack: corosync
Current DC: node5 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Fri Mar 13 09:35:33 2020
Last change: Thu Mar 12 23:43:35 2020 by root via crm_attribute on node4
3 nodes configured
9 resources configured
Online: [ node4 node5 node6 ]
Full list of resources:
vip-master (ocf:💓IPaddr2): Started node4
vip-slave (ocf:💓IPaddr2): Started node5
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node4 ]
Slaves: [ node5 node6 ]
lvsdr (ocf:💓lvsdr): Started node5
Clone Set: lvsdr-realsvr-clone [lvsdr-realsvr]
Started: [ node5 node6 ]
Stopped: [ node4 ]
Node Attributes:
- Node node4:
- master-pgsql : 1000
- pgsql-data-status : LATEST
- pgsql-master-baseline : 000000000B000258
- pgsql-status : PRI
- Node node5:
- master-pgsql : 100
- pgsql-data-status : STREAMING|SYNC
- pgsql-status : HS:sync
- Node node6:
- master-pgsql : -INFINITY
- pgsql-data-status : STREAMING|ASYNC
- pgsql-status : HS:async
Migration Summary:
- Node node6:
- Node node5:
- Node node4:
pgsql: migration-threshold=3 fail-count=2 last-failure=‘Thu Mar 12 23:43:24 2020’
Failed Actions:
- pgsql_monitor_3000 on node4 ‘not running’ (7): call=161, status=complete, exitreason=’’,
last-rc-change=‘Thu Mar 12 23:43:24 2020’, queued=0ms, exec=0ms
pgsql_REPL_INFO:node4|4|0000000009034248
[root@node6 tmp]# pcs resource cleanup msPostgresql
Cleaned up pgsql:0 on node6
Cleaned up pgsql:1 on node5
Cleaned up pgsql:2 on node4
Waiting for 1 replies from the CRMd. OK
[root@node6 tmp]# cls_status
Stack: corosync
Current DC: node5 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Fri Mar 13 09:37:05 2020
Last change: Fri Mar 13 09:37:02 2020 by hacluster via crmd on node4
3 nodes configured
9 resources configured
Online: [ node4 node5 node6 ]
Full list of resources:
vip-master (ocf:💓IPaddr2): Started node4
vip-slave (ocf:💓IPaddr2): Started node5
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node4 ]
Slaves: [ node5 node6 ]
lvsdr (ocf:💓lvsdr): Started node5
Clone Set: lvsdr-realsvr-clone [lvsdr-realsvr]
Started: [ node5 node6 ]
Stopped: [ node4 ]
Node Attributes:
- Node node4:
- master-pgsql : 1000
- pgsql-data-status : LATEST
- pgsql-master-baseline : 000000000B000258
- pgsql-status : PRI
- Node node5:
- master-pgsql : 100
- pgsql-data-status : STREAMING|SYNC
- pgsql-status : HS:sync
- Node node6:
- master-pgsql : -INFINITY
- pgsql-data-status : STREAMING|ASYNC
- pgsql-status : HS:async
Migration Summary:
- Node node6:
- Node node5:
- Node node4:
pgsql_REPL_INFO:node4|4|0000000009034248
[root@node6 tmp]#
评论
有用 0跟pger朋友聊天,关闭主节点,主节点启动不起来,可以试着采用以下方法
pcs resource cleanup master-group
pcs resource cleanup msPostgresql
rm /var/lib/pgsql/tmp/PGSQL.lock
https://blog.csdn.net/joshua1830/article/details/51943999
评论
有用 0
墨值悬赏

