PostgreSQL主从集群中repmgr的管理操作

IT那活儿 2024-10-28

369

点击上方“IT那活儿”公众号--专注于企业全栈运维技术分享，不管IT什么活儿，干就完了！！！

基于repmgr搭建的PostgreSQL集群日常维护过程中需要对集群进行状态检查、主从切换以及集群配置（重置/新增从库）等操作。

本文主要介绍常用的repmgr维护操作。

什么是repmgr

repmgr是一个开源工具套件，用于管理PostgreSQL服务器集群中的复制和故障转移。它通过设置备用服务器、监视复制和执行管理任务（如故障转移或手动切换操作）的工具增强了PostgreSQL的内置热备份功能。

自从PostgreSQL的内置复制机制在9.0中引入以来，repmgr就为其提供了高级支持。当前的repmgr系列repmgr 5支持PostgreSQL 9.3以及后续版本的主从复制，如级联复制、时间线切换和通过复制协议进行基础备份。

Repmgr安装

2.1 编译安装

tar -zxvf repmgr-5.4.1.tar.gz

chown -R postgres:postgres /tmp/repmgr-5.4.1
su - postgres
cd /tmp/repmgr-5.4.1
./configure make install

验证安装成功：

repmgr --help

2.2 修改配置文件

node_id=1
node_name='pg1'
conninfo='host=pg1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/postgres/pgdata'
failover=automatic
promote_command='/postgres/pgsql-16/bin/repmgr standby promote -f etc/repmgr/repmgr.conf --log-to-file'
follow_command='/postgres/pgsql-16/bin/repmgr standby follow -f /etc/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
service_start_command  = 'pg_ctl start'
service_stop_command    = 'pg_ctl stop'
service_restart_command = 'pg_ctl restart'
service_reload_command  = 'pg_ctl reload -D /postgres/pgdata'
repmgrd_pid_file='/tmp/repmgrd.pid'
log_file='/tmp/repmgrd.log'
priority=100

2.3 注册数据库

repmgr primary register -f /etc/repmgr/repmgr.conf
repmgr standby register -f /etc/repmgr/repmgr.conf

2.4 查看集群节点状态

[postgres@pg2 ~]$ repmgr cluster show -f /etc/repmgr/repmgr.conf --verbose
NOTICE: using provided configuration file "/etc/repmgr/repmgr.conf"
INFO: connecting to database
ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------------------------
1  | pg1 | primary | * running | | default  | 100      | 1        | host=192.168.XXX.130 user=repmgr dbname=repmgr connect_timeout=2
2  | pg2 | standby | running | pg1 | default  | 100      | 1        | host=192.168.XXX.131 user=repmgr dbname=repmgr connect_timeout=2

可以看到数据库的node ID、节点名称、节点角色（主/从）、状态和连接串等信息。

提升从库为主库

主库宕机后，可手动将从库提升为主库进行业务接管，保证业务连续性。

[postgres@pg2 ~]$ repmgr cluster show -f /etc/repmgr/repmgr.conf --verbose
 NOTICE: using provided configuration file "/etc/repmgr/repmgr.conf" 
INFO: connecting to database ERROR: connection to database failed DETAIL: connection to server at "192.168.XXX.130", port 5432 failed: Connection refused Is the server running on that host and accepting TCP/IP connections? 
DETAIL: attempted to connect using: user=repmgr connect_timeout=2 dbname=repmgr host=192.168.XXX.130 fallback_application_name=repmgr options=-csearch_path= 
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string 
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------- 
1 | pg1 | primary | - failed | ? | default | 100 | | host=192.168.XXX.130 user=repmgr dbname=repmgr connect_timeout=2 2 | pg2 | primary | * running | | default | 100 | 2 | 
host=192.168.XXX.131 user=repmgr dbname=repmgr connect_timeout=2 WARNING: following issues were detected - when attempting to connect to node "pg1" (ID: 1), 
following error encountered : "connection to server at "192.168.XXX.130", port 5432 failed: Connection refused Is the server running on that host and accepting TCP/IP connections?"

克隆主库数据到从库

repmgr可以再从库把主库的数据拉取到从库的pgdata目录，之后起从库注册到集群即可完成集群配置。

4.1 测试不执行

/postgres/pgsql-16/bin/repmgr -h pg1 -U repmgr -d repmgr -f /etc/repmgr/repmgr.conf standby clone --dry-run

4.2 执行克隆操作

/postgres/pgsql-16/bin/repmgr -h pg1 -U repmgr -d repmgr -f /etc/repmgr/repmgr.conf standby clone

4.3 注册从库到集群

repmgr standby register -f /etc/repmgr/repmgr.conf

注册从库到集群之后，可以通过查看集群状态命令查看集群中各个节点的状态。

另外，目前版本的repmgr不提供VIP功能，可使用keepalived等工具配置VIP服务，判断PostgreSQL存活条件可以查看postgresql主进程的数量，为0则切换VIP到从库，为1则保持现状。

END

本文作者：高强(上海新炬中北团队）

本文来源：“IT那活儿”公众号

集群服务器集群技术 host postgresql

文章转载自IT那活儿，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。