Scope

PostgreSQL 12 and later

Purpose

Configure repmgr primary/standby high availability, and perform switchover and failover.

Solution

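I. Introduction to repmgr

repmgr is an open-source tool suite for managing replication and failover in a cluster of PostgreSQL servers. It supports and enhances PostgreSQL's built-in streaming replication, which provides one read/write primary server and one or more read-only standbys holding a near-real-time copy of the primary's database. repmgr can be used to set up hot standby servers, monitor replication, and perform administrative tasks such as failover and manual switchover.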

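repmgr high-availability terminology

Replication cluster: a network of PostgreSQL servers connected by streaming replication.
Node: a single PostgreSQL server within a replication cluster.
Upstream node: the node a standby server connects to in order to receive streaming replication. This is either the primary server or, with cascading replication, another standby.
Failover: the action taken when the primary server fails and a suitable standby is promoted to become the new primary. The repmgrd daemon supports automatic failover to minimise downtime.
Switchover: in some situations, such as hardware or operating system maintenance, the primary server has to be taken offline. A controlled switchover is then needed, in which a suitable standby is promoted and the existing primary is removed from the replication cluster in a controlled manner. The repmgr command-line client provides this capability.
Fencing: in a failover, once a standby has been promoted to primary, it is essential that the previous primary does not unexpectedly come back online, which would cause a split-brain situation. To prevent this, the failed primary should be isolated from applications, that is, fenced.
Witness server: repmgr can set up a so-called "witness server" to help determine the new primary in a failover situation with more than one standby. The witness server is not itself part of the replication cluster, although it does hold a copy of the repmgr metadata schema.
The purpose of a witness server is to provide a "casting vote" when the servers of a replication cluster are split across more than one location. If connectivity between the locations is lost, the presence or absence of the witness server decides whether a server at that location is promoted to primary. This prevents a split-brain situation in which an isolated location interprets a network outage as a failure of the (remote) primary and promotes a (local) standby. A witness server only needs to be created when repmgrd is in use.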

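Overview of repmgr commands

repmgr primary register      Install the repmgr extension in PostgreSQL and register the node as the primary
repmgr primary unregister    Unregister an inactive primary node
repmgr standby clone         Copy data from another node to the standby
repmgr standby register      Register a standby (add its details to the repmgr metadata)
repmgr standby unregister    Remove a standby's details from the repmgr metadata
repmgr standby promote       Promote a standby to primary
repmgr standby follow        Make a standby follow the new primary
repmgr standby switchover    Promote a standby to primary and demote the primary to standby
repmgr witness register      Register a witness node
repmgr witness unregister    Remove a witness node
repmgr node status           Show basic information and replication status for a node
repmgr node check            Run health checks on a node from a replication perspective
repmgr node rejoin           Rejoin a failed node to the cluster
repmgr cluster show          Show information about all nodes registered in the cluster
repmgr cluster matrix        Run cluster show on every node and summarise the output
repmgr cluster crosscheck    Cross-check connections between each pair of nodes
repmgr cluster event         Output event records
repmgr cluster cleanup       Purge monitoring history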
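To show how these commands usually fit together, the following is a minimal sketch that only prints the typical setup order for each role rather than running anything. The helper name plan_setup is hypothetical, and the configuration path, primary address and data directory are taken from the rest of this article, so treat them as assumptions for your own environment.

```shell
#!/usr/bin/env bash
# Sketch only: prints the usual order of repmgr commands instead of running them.
# REPMGR_CONF, PRIMARY_HOST and the data directory mirror values used in this
# article; plan_setup is a hypothetical helper, not part of repmgr.
REPMGR_CONF="/etc/repmgr/12/repmgr.conf"
PRIMARY_HOST="192.168.56.50"

# plan_setup ROLE prints the commands to run on a node of the given role.
plan_setup() {
  case "$1" in
    primary)
      echo "repmgr -f $REPMGR_CONF primary register"
      ;;
    standby)
      echo "repmgr -h $PRIMARY_HOST -U repmgr -d repmgr -f $REPMGR_CONF standby clone"
      echo "pg_ctl -D /data/pgsql start"
      echo "repmgr -f $REPMGR_CONF standby register"
      ;;
    *)
      echo "unknown role: $1" >&2
      return 1
      ;;
  esac
  # On either role, finish by checking what the cluster metadata reports.
  echo "repmgr -f $REPMGR_CONF cluster show"
}

plan_setup primary
plan_setup standby
```

Printing the plan first makes it easy to review the order (register the primary, clone and start the standby, register the standby, then inspect the cluster) before running the commands by hand as the postgres user.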

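The two main tools, repmgr and repmgrd:

repmgr is a command-line tool for administrative tasks: setting up standby servers, switching the primary and standby roles, and displaying the status of the servers in the replication cluster.
repmgrd is a daemon that actively monitors the servers in the replication cluster and records replication performance. It performs automatic failover by detecting a failure of the primary and selecting the most suitable standby to promote.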

II. repmgr primary/standby high-availability configuration

1. Upload the repmgr installation package to the primary and the standby

[root@ora11g-node01 postgres]# ls -l repmgr-master.zip 
-rw-r--r-- 1 root root 561946 Jun 20 23:02 repmgr-master.zip
[root@ora11g-node01 postgres]# unzip repmgr-master.zip

Change into the extracted directory, then build and install:

[postgres@ora11g-node01 repmgr-master]$ ./configure --prefix=/usr/share/postgresql-12.5
[postgres@ora11g-node01 repmgr-master]$ make
[postgres@ora11g-node01 repmgr-master]$ make install

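2. Configure pg_hba.conf on the primary and the standby (streaming replication connects through the replication entries, and repmgr connects to the repmgr database for its management tasks)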

[postgres@ora11g-node01 12]$ more /data/pgsql/pg_hba.conf 
......
host    repmgr          repmgr          192.168.56.50/32         trust
host    repmgr          repmgr          192.168.56.51/32         trust
host    replication     repmgr          192.168.56.50/32         trust
host    replication     repmgr          192.168.56.51/32         trust
host    all             all             0.0.0.0/0                md5

[postgres@ora11g-node01 12]$ pg_ctl  reload
server signaled
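As a hedged sketch of how the repmgr-related entries above could be produced for any number of nodes, the helper below prints one repmgr line and one replication line per node address. The function name hba_entries is hypothetical and not part of repmgr or PostgreSQL; the user name, /32 mask and trust method simply mirror the entries shown above.

```shell
#!/usr/bin/env bash
# Sketch: emit the repmgr-related pg_hba.conf entries for a list of node IPs.
# hba_entries is a hypothetical helper, not part of repmgr or PostgreSQL.
hba_entries() {
  local db ip
  # One block for the repmgr database, one for the replication pseudo-database.
  for db in repmgr replication; do
    for ip in "$@"; do
      printf 'host    %-15s %-15s %-24s %s\n' "$db" repmgr "$ip/32" trust
    done
  done
}

# The two node addresses used in this article.
hba_entries 192.168.56.50 192.168.56.51
```

The output can be reviewed and then placed in pg_hba.conf ahead of the catch-all md5 line on every node, followed by pg_ctl reload as shown above.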

3. Create the repmgr user and the repmgr database on the primary

createuser -s repmgr
createdb repmgr -O repmgr
psql -c "ALTER USER repmgr SUPERUSER;"   # SQL, so run it through psql; optional, since createuser -s already grants superuser

4. Stop PostgreSQL on the standby and clone it from the primary with repmgr (skip the clone if the standby is already replicating from the primary)

repmgr -h 192.168.56.50 -Urepmgr standby clone -F -D /data/pgsql

5. Create repmgr.conf on the primary

[postgres@ora11g-node01 repmgr-master]$ mkdir -p /etc/repmgr/12
[postgres@ora11g-node01 repmgr-master]$ vi /etc/repmgr/12/repmgr.conf 
node_id=1
node_name=ora11g-node01
conninfo='host=192.168.56.50 port=5432 user=repmgr dbname=repmgr  connect_timeout=20'
data_directory='/data/pgsql'
pg_bindir='/usr/share/postgresql-12.5/bin'
replication_user='repmgr'
replication_type='physical'
use_replication_slots=yes
failover=automatic
reconnect_attempts=6
reconnect_interval=5
promote_command='repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/home/postgres/repmgr.log'
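Each node needs its own repmgr.conf with a unique node_id and node_name and a conninfo that points at that node. The following is a minimal sketch of generating such a file from the settings above. The helper write_repmgr_conf is hypothetical, the standby name ora11g-node02 is an assumption (only its address 192.168.56.51 appears in this article), and the file is written to a temporary directory so it can be reviewed before being copied into place.

```shell
#!/usr/bin/env bash
# Sketch: write a repmgr.conf for one node. write_repmgr_conf is a hypothetical
# helper; the settings are the ones used for the primary in this article.
# Usage: write_repmgr_conf NODE_ID NODE_NAME NODE_IP OUTPUT_FILE
write_repmgr_conf() {
  local node_id="$1" node_name="$2" node_ip="$3" out="$4"
  local conf="/etc/repmgr/12/repmgr.conf"
  cat > "$out" <<EOF
node_id=${node_id}
node_name=${node_name}
conninfo='host=${node_ip} port=5432 user=repmgr dbname=repmgr connect_timeout=20'
data_directory='/data/pgsql'
pg_bindir='/usr/share/postgresql-12.5/bin'
replication_user='repmgr'
replication_type='physical'
use_replication_slots=yes
failover=automatic
reconnect_attempts=6
reconnect_interval=5
promote_command='repmgr standby promote -f ${conf} --log-to-file'
follow_command='repmgr standby follow -f ${conf} --log-to-file --upstream-node-id=%n'
log_file='/home/postgres/repmgr.log'
EOF
}

# Assumed standby values: the node name is a guess, the IP comes from pg_hba.conf.
out_dir="$(mktemp -d)"
write_repmgr_conf 2 ora11g-node02 192.168.56.51 "$out_dir/repmgr.conf"
cat "$out_dir/repmgr.conf"
```

Only node_id, node_name and the host in conninfo differ between nodes here; everything else is kept identical so that either node can act as primary after a switchover or failover.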

6. Register the primary with repmgr

......