暂无图片
Pacemaker的PostgreSQL一主多从读
我来答
分享
李石岩
2020-03-12
Pacemaker的PostgreSQL一主多从读

作者搭建了一套postgresql集群,Pacemaker的PostgreSQL一主多从。在在做主节点网卡测试停启的过程中出现问题,作者node4为主节点,node5 node6 为从节点,node4在做了ifconfig ens33 down以后,发现node5为主节点了,node6也会首先down下然后启动,这种情况下,node4 ifconfig ens33 up 以后,对其进行cls_repair_by_pg_rewind 恢复,发现恢复不成功,出现以下问题
[root@node4 data]# cls_repair_by_pg_rewind
resource msPostgresql is NOT running
resource msPostgresql is NOT running
resource msPostgresql is NOT running
connected to server

could not fetch remote file “global/pg_control”: ERROR: permission denied for function pg_read_binary_file
Failure, exiting
failed to call “/usr/pgsql/bin/pg_rewind -D /data/pgsql/data --source-server=‘host=10.10.10.19’ port=5432 -P”

从问题上看,是权限除了问题,由于作者刚刚学习这个,不太明白怎么解决。请大神帮忙指导指导。

我来答
添加附件
收藏
分享
问题补充
6条回答
默认
最新
李石岩

以下是各节点的状态
[root@node4 data]# cls_status
resource msPostgresql is NOT running
Stack: corosync
Current DC: node5 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Thu Mar 12 12:40:34 2020
Last change: Thu Mar 12 12:39:54 2020 by root via cibadmin on node4

3 nodes configured
9 resources configured

Online: [ node4 node5 node6 ]

Full list of resources:

vip-master (ocf:💓IPaddr2): Started node5
vip-slave (ocf:💓IPaddr2): Started node6
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node5 ]
Slaves: [ node6 ]
Stopped: [ node4 ]
lvsdr (ocf:💓lvsdr): Started node6
Clone Set: lvsdr-realsvr-clone [lvsdr-realsvr]
Started: [ node6 ]
Stopped: [ node4 node5 ]

Node Attributes:

  • Node node4:
    • master-pgsql : -INFINITY
    • pgsql-data-status : DISCONNECT
    • pgsql-status : STOP
  • Node node5:
    • master-pgsql : 1000
    • pgsql-data-status : LATEST
    • pgsql-master-baseline : 00000000050001E0
    • pgsql-status : PRI
  • Node node6:
    • pgsql-data-status : STREAMING|SYNC
    • pgsql-status : HS:sync

Migration Summary:

  • Node node6:
  • Node node5:
  • Node node4:
    pgsql: migration-threshold=3 fail-count=1000000 last-failure=‘Thu Mar 12 12:29:32 2020’

Failed Actions:

  • pgsql_start_0 on node4 ‘unknown error’ (1): call=55, status=complete, exitreason=‘The master’s timeline forked off current database system timeline 2 before latest checkpoint location 00000000050001E0, REPL_INF’,
    last-rc-change=‘Thu Mar 12 12:29:31 2020’, queued=0ms, exec=267ms

pgsql_REPL_INFO:node5|2|00000000050001E0

暂无图片 评论
暂无图片 有用 0
李石岩

经过测试,我将/data/pgsql/data目录给删除掉,然后在root用户下执行cls_rebuild_slave,node4进行恢复了。但是rewind这种方法仍然没有解决。

暂无图片 评论
暂无图片 有用 0
李石岩

[root@node5 pha4pgsql]# cls_repair_by_pg_rewind
resource msPostgresql is NOT running
resource msPostgresql is NOT running
resource msPostgresql is NOT running
connected to server

could not fetch remote file “global/pg_control”: ERROR: permission denied for function pg_read_binary_file
Failure, exiting
failed to call “/usr/pgsql/bin/pg_rewind -D /data/pgsql/data --source-server=‘host=10.10.10.19’ port=5432 -P”
[root@node5 pha4pgsql]# rm /data/pgsql/data/ -rf
[root@node5 pha4pgsql]# cls_
cls_cleanup cls_master cls_repair_by_pg_rewind cls_start cls_unmaintenance
cls_maintenance cls_online_switch cls_reset_master cls_status cls_unmaintenance_node
cls_maintenance_node cls_rebuild_slave cls_standby_node cls_stop cls_unstandby_node
[root@node5 pha4pgsql]# cls_rebuild_slave
resource msPostgresql is NOT running
resource msPostgresql is NOT running
resource msPostgresql is NOT running
24559/24559 kB (100%), 1/1 tablespace
pg_basebackup complete!
resource msPostgresql is NOT running
resource msPostgresql is NOT running
Cleaned up all resources on all nodes
Waiting for 1 replies from the CRMd. OK
wait for recovery complete

slave recovery of node5 successed
[root@node5 pha4pgsql]# su - postgres
上一次登录:四 3月 12 16:42:55 CST 2020pts/0 上
postgres@node5-> pg_controldata|grep cluster
Database cluster state: in archive recovery

暂无图片 评论
暂无图片 有用 0
李石岩

对主节点进行了killall postgres 发现并没有出现掉线的状态,在显示日志出现了报错。可能与配置文件中的migration-threshold=3这个参数设置有关,出现三次的话,就可能会切换了。
[root@node6 tmp]# cls_status
Stack: corosync
Current DC: node5 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Fri Mar 13 09:35:33 2020
Last change: Thu Mar 12 23:43:35 2020 by root via crm_attribute on node4

3 nodes configured
9 resources configured

Online: [ node4 node5 node6 ]

Full list of resources:

vip-master (ocf:💓IPaddr2): Started node4
vip-slave (ocf:💓IPaddr2): Started node5
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node4 ]
Slaves: [ node5 node6 ]
lvsdr (ocf:💓lvsdr): Started node5
Clone Set: lvsdr-realsvr-clone [lvsdr-realsvr]
Started: [ node5 node6 ]
Stopped: [ node4 ]

Node Attributes:

  • Node node4:
    • master-pgsql : 1000
    • pgsql-data-status : LATEST
    • pgsql-master-baseline : 000000000B000258
    • pgsql-status : PRI
  • Node node5:
    • master-pgsql : 100
    • pgsql-data-status : STREAMING|SYNC
    • pgsql-status : HS:sync
  • Node node6:
    • master-pgsql : -INFINITY
    • pgsql-data-status : STREAMING|ASYNC
    • pgsql-status : HS:async

Migration Summary:

  • Node node6:
  • Node node5:
  • Node node4:
    pgsql: migration-threshold=3 fail-count=2 last-failure=‘Thu Mar 12 23:43:24 2020’

Failed Actions:

  • pgsql_monitor_3000 on node4 ‘not running’ (7): call=161, status=complete, exitreason=’’,
    last-rc-change=‘Thu Mar 12 23:43:24 2020’, queued=0ms, exec=0ms

pgsql_REPL_INFO:node4|4|0000000009034248
[root@node6 tmp]# pcs resource cleanup msPostgresql
Cleaned up pgsql:0 on node6
Cleaned up pgsql:1 on node5
Cleaned up pgsql:2 on node4
Waiting for 1 replies from the CRMd. OK
[root@node6 tmp]# cls_status
Stack: corosync
Current DC: node5 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Fri Mar 13 09:37:05 2020
Last change: Fri Mar 13 09:37:02 2020 by hacluster via crmd on node4

3 nodes configured
9 resources configured

Online: [ node4 node5 node6 ]

Full list of resources:

vip-master (ocf:💓IPaddr2): Started node4
vip-slave (ocf:💓IPaddr2): Started node5
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node4 ]
Slaves: [ node5 node6 ]
lvsdr (ocf:💓lvsdr): Started node5
Clone Set: lvsdr-realsvr-clone [lvsdr-realsvr]
Started: [ node5 node6 ]
Stopped: [ node4 ]

Node Attributes:

  • Node node4:
    • master-pgsql : 1000
    • pgsql-data-status : LATEST
    • pgsql-master-baseline : 000000000B000258
    • pgsql-status : PRI
  • Node node5:
    • master-pgsql : 100
    • pgsql-data-status : STREAMING|SYNC
    • pgsql-status : HS:sync
  • Node node6:
    • master-pgsql : -INFINITY
    • pgsql-data-status : STREAMING|ASYNC
    • pgsql-status : HS:async

Migration Summary:

  • Node node6:
  • Node node5:
  • Node node4:

pgsql_REPL_INFO:node4|4|0000000009034248
[root@node6 tmp]#

暂无图片 评论
暂无图片 有用 0
李石岩

跟pger朋友聊天,关闭主节点,主节点启动不起来,可以试着采用以下方法
pcs resource cleanup master-group
pcs resource cleanup msPostgresql
rm /var/lib/pgsql/tmp/PGSQL.lock

https://blog.csdn.net/joshua1830/article/details/51943999

暂无图片 评论
暂无图片 有用 0
手机用户6325

大佬,这种cls_*命令在哪查文档

暂无图片 评论
暂无图片 有用 0
李石岩
题主
2025-02-25
这个你找找吧。
回答交流
提交
问题信息
请登录之后查看
邀请回答
暂无人订阅该标签,敬请期待~~
暂无图片墨值悬赏