db_ha Cluster: Automatic Failover After Installation and How to Check Cluster State After a Switchover

瀚高PG实验室 (HighGo PG Lab) 2023-03-16

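Environment

Platform: Linux x86-64, Red Hat Enterprise Linux 7

Version: 4.5.7

Purpose

This document describes how a db_ha cluster switches over automatically after installation and how to check the cluster state after a switchover.

Details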

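I. A db_ha cluster performs a primary/standby switchover when the primary loses its network connection or goes down. See the attachment for the simulated tests.

II. After the former primary server returns to a normal state, it is automatically demoted to a standby and rejoins the cluster.

III. Use the following methods to check the state of the cluster after a switchover.

1. Check streaming replication in the cluster

(1) On the primary, check for walsender processes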

ps -ef | grep -v grep | grep walsender
root     21466 19862  0 14:51 ?        00:00:00 postgres: walsender sysdba 192.168.80.230(36582) streaming 0/4053860
root     23017 19862  0 15:06 ?        00:00:00 postgres: walsender sysdba 192.168.80.228(47786) streaming 0/4053860

(2) On the standby, check for the walreceiver process

ps -ef | grep -v grep | grep walreceive
root     13489 13482  0 15:45 ?        00:00:06 postgres: walreceiver   streaming 0/6004460

(3) If neither walsender nor walreceiver processes are found and only the postgres processes are present, the database has dropped out of streaming replication and is running as a standalone instance.

ps -ef | grep post
root      4148     1  0 6月13 ?       00:04:36 /opt/HighGo4.5.7-see/bin/postgres -D /opt/HighGo4.5.7-see/data
root      4149  4148  0 6月13 ?       00:00:00 postgres: logger
root      4150  4148  1 6月13 ?       00:11:30 postgres: auditwriter
root      4152  4148  0 6月13 ?       00:00:00 postgres: checkpointer
root      4153  4148  0 6月13 ?       00:00:04 postgres: background writer
root      4154  4148  0 6月13 ?       00:00:54 postgres: stats collector
root      4156  4148  0 6月13 ?       00:00:00 postgres: audit archiver or cleanup
root      4314  4148  0 6月13 ?       00:00:04 postgres: walwriter
root      4315  4148  0 6月13 ?       00:00:01 postgres: autovacuum launcher
root      4316  4148  0 6月13 ?       00:00:00 postgres: archiver   last was 000000040000000000000008
root      4317  4148  0 6月13 ?       00:00:00 postgres: logical replication launche

(4) If no postgres processes are found either, the database is not running.
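
The four process checks above can be combined into one small helper. The following is a minimal sketch and not part of db_ha: it assumes the process titles contain walsender, walreceiver, or postgres as in the sample output above, and the function name classify_role is made up for illustration. The listing is read from stdin, so the logic can be tried on saved output as well as on a live node.

```shell
# Classify a node's replication role from a process listing read on stdin.
# Assumption: process titles look like the sample `ps -ef` output above.
# classify_role is a hypothetical helper name, not a db_ha command.
classify_role() {
    listing=$(cat)
    if printf '%s\n' "$listing" | grep -q 'walsender'; then
        echo "primary"        # at least one standby is streaming from this node
    elif printf '%s\n' "$listing" | grep -q 'walreceiver'; then
        echo "standby"        # this node is receiving WAL from a primary
    elif printf '%s\n' "$listing" | grep -q 'postgres'; then
        echo "standalone"     # database is up but outside streaming replication
    else
        echo "down"           # no database processes at all
    fi
}

# Usage on a live node:
#   ps -ef | grep -v grep | classify_role
```

A cascading standby runs both a walreceiver and walsender processes, so this sketch would report it as a primary. Treat the result as a hint and confirm it with the cluster command.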

2. Check with the cluster command

(1) Normal cluster state: every node shows healthy=t, and every node with nodetype=STANDBY shows streamingState=streaming.

/usr/local/db_ha/bin/db_ha select -f /usr/local/db_ha/conf/db_ha.conf
connect monitor success
cluster num = 3         secondary monitor is normal
nodeip=192.168.80.228,nodetype=PRIMARY,replicationName=ha228 streamingType=NONE streamingState=none healthy=t agentState=NORMAL
nodeip=192.168.80.229,nodetype=STANDBY,replicationName=ha229 streamingType=ASYNC streamingState=streaming healthy=t agentState=NORMAL
nodeip=192.168.80.230,nodetype=STANDBY,replicationName=ha230 streamingType=ASYNC streamingState=streaming healthy=t agentState=NORMAL

(2) Abnormal cluster state: node 228 reports healthy=f (with streamingState=none and agentState=UNUSUAL).

/usr/local/db_ha/bin/db_ha select -f /usr/local/db_ha/conf/db_ha.conf
connect monitor success
cluster num = 3         secondary monitor is normal
nodeip=192.168.80.229,nodetype=PRIMARY,replicationName=ha229 streamingType=NONE streamingState=none healthy=t agentState=NORMAL
nodeip=192.168.80.228,nodetype=PRIMARY,replicationName=ha228 streamingType=NONE streamingState=none healthy=f agentState=UNUSUAL
nodeip=192.168.80.230,nodetype=STANDBY,replicationName=ha230 streamingType=ASYNC streamingState=streaming healthy=t agentState=NORMAL
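
The rule for reading this output (every node healthy=t, every STANDBY node streamingState=streaming) can be applied mechanically. The following is a rough sketch under the assumption that the key=value line layout matches the sample output shown here. The function name unhealthy_nodes is hypothetical and is not a db_ha subcommand.

```shell
# Print the nodes that look unhealthy in `db_ha select` output read on stdin.
# Assumption: each node line starts with "nodeip=<ip>," and carries the
# nodetype, streamingState and healthy fields as in the sample output.
# unhealthy_nodes is a hypothetical helper name, not part of db_ha.
unhealthy_nodes() {
    awk '
        /^nodeip=/ {
            # Extract the IP address between "nodeip=" and the first comma.
            ip = $0
            sub(/^nodeip=/, "", ip)
            sub(/,.*/, "", ip)

            bad = ""
            if ($0 ~ /healthy=f/) bad = bad " healthy=f"
            if ($0 ~ /nodetype=STANDBY/ && $0 !~ /streamingState=streaming/) bad = bad " not-streaming"

            if (bad != "") print ip ":" bad
        }
    '
}

# Usage on a live node (empty output means no problem was detected):
#   /usr/local/db_ha/bin/db_ha select -f /usr/local/db_ha/conf/db_ha.conf | unhealthy_nodes
```

For the abnormal sample above, this prints a single line for node 192.168.80.228. It does not flag the case of two nodes both reporting nodetype=PRIMARY, which still needs a manual look.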

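3. Use pg_controldata to check the database timeline and state

Note: if the timeline IDs differ between database nodes, the cluster has a problem.

(1) The following output indicates a primary that is running normally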

export LANG="en_US.UTF-8"
pg_controldata | grep -E "TimeLineID|state"
Database cluster state:               in production
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4

(2) The following output indicates a standby that is running normally

export LANG="en_US.UTF-8"
pg_controldata | grep -E "TimeLineID|state"
Database cluster state:               in archive recovery
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4
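
Comparing timelines across nodes by eye is error-prone, so the check can be scripted. The sketch below is one possible approach and not a HighGo or db_ha tool. It assumes the English field labels shown here, which is why LANG is exported first (pg_controldata output is localized). The helper names controldata_summary and same_timeline are hypothetical.

```shell
# Reduce pg_controldata output (stdin) to one line: "<cluster state>|<timeline>".
# Assumption: English labels, as produced with LANG=en_US.UTF-8.
# Both function names below are hypothetical helpers.
controldata_summary() {
    awk -F': *' '
        /^Database cluster state:/        { state = $2 }
        # "." stands in for the apostrophe; PrevTimeLineID does not match.
        /^Latest checkpoint.s TimeLineID:/ { tli = $2 }
        END { print state "|" tli }
    '
}

# Succeed only if all summary lines on stdin carry the same timeline ID.
same_timeline() {
    awk -F'|' '
        { seen[$2] = 1 }
        END { n = 0; for (t in seen) n++; exit (n == 1) ? 0 : 1 }
    '
}

# Usage: run on every node, collect the lines, then compare them:
#   pg_controldata | controldata_summary        # e.g. on each node
#   cat summaries.txt | same_timeline && echo "timelines match"
```

How the per-node summaries are collected (for example over ssh into a file such as summaries.txt) is left open here and depends on the environment.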

4. Check for the standby marker file standby.signal

ll /opt/HighGo4.5.7-see/data/standby.signal
-rw------- 1 root root 18 6月  13 15:45 /opt/HighGo4.5.7-see/data/standby.signal
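
The file check can also be wrapped for use in scripts. This is a minimal sketch that assumes the data directory is passed as the first argument. The name is_standby_dir is made up for illustration.

```shell
# Succeed if the given data directory contains the standby.signal marker file.
# Assumption: $1 is the database data directory, e.g. /opt/HighGo4.5.7-see/data.
# is_standby_dir is a hypothetical helper name.
is_standby_dir() {
    [ -f "$1/standby.signal" ]
}

# Usage:
#   if is_standby_dir /opt/HighGo4.5.7-see/data; then
#       echo "standby marker present"
#   else
#       echo "no standby marker"
#   fi
```

The marker only shows how the instance is configured to start. Whether it is actually streaming should still be confirmed with the process and cluster checks in steps 1 and 2.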

