PostgreSQL一主两备一监证节点自动切换主备

原创键盘丐 2022-09-01

2531

概述

本实验搭建PostgreSQL一主两备一监证节点群集，使用repmgrd监控主备并自动切换，使用PgBouncer连接池连接主库，便于用户在主备切换时无感知。

部署架构

服务器信息

服务	IP地址	主机名称	操作系统	安装服务
主节点	192.168.59.26	node1	CentOS 7.6	PostgreSQL 14.4、repmgr 5.3.1
备节点	192.168.59.27	node2	CentOS 7.6	PostgreSQL 14.4、repmgr 5.3.1
备节点	192.168.59.28	node3	CentOS 7.6	PostgreSQL 14.4、repmgr 5.3.1
监证节点	192.168.59.29	node4	CentOS 7.6	PostgreSQL 14.4、repmgr 5.3.1 Pgbouncer 1.17.0

安装过程

1.四台主机关闭防火墙操作

如下操作在node1、node2、node3、node4都执行并重启

[root@node1 ~]# systemctl disable firewalld.service
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@node1 ~]# systemctl stop firewalld.service 
[root@node1 ~]# systemctl status firewalld.service 
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

Aug 25 11:38:51 node1 systemd[1]: Starting firewalld - dynamic firewall daemon...
Aug 25 11:38:56 node1 systemd[1]: Started firewalld - dynamic firewall daemon.
Aug 25 11:54:09 node1 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Aug 25 11:54:09 node1 systemd[1]: Stopped firewalld - dynamic firewall daemon.
[root@node1 ~]# sed -i '/^SELINUX=/d' /etc/selinux/config 
[root@node1 ~]# echo "SELINUX=disabled" >> /etc/selinux/config 
[root@node1 ~]# cat /etc/selinux/config|grep "SELINUX=disabled" 
SELINUX=disabled

2.配置四台主机postgres账号ssh互信

根据此篇脚本配置：多台Linux主机一键配置SSH互信

3.规划安装目录

以下目录需四台主机都创建

mkdir -p /opt/postgres/logs #日志目录

mkdir -p /opt/postgres/pgsql/ #PostgreSQL软件安装目录

mkdir -p /opt/postgres/data #PostgreSQL数据库目录

mkdir -p /opt/postgres/pgsql/etc/ #配置文件目录

chown postgres:postgres /opt/postgres/ -R #修改目录所属用户

在监证节点node4上还需创建Pgbouncer安装目录

mkdir -p /opt/postgres/pgbouncer #pgbouncer安装目录

mkdir -p /opt/postgres/pgbouncer/etc #pgbouncer配置文件目录

chown postgres:postgres /opt/postgres/pgbouncer/ -R #修改目录所属用户

4.软件下载

PostgreSQL-14.4：https://ftp.postgresql.org/pub/source/v14.4/postgresql-14.4.tar.gz

Pgbouncer 1.17.0：http://www.pgbouncer.org/downloads/files/1.17.0/pgbouncer-1.17.0.tar.gz

repmgr-5.3.1：https://codeload.github.com/EnterpriseDB/repmgr/zip/refs/heads/master

5.软件安装

node1、node2、node3、node4节点如下步骤安装PostgreSQL-14.4和repmgr-5.3.1

PostgreSQL编译安装参考此篇：PostgreSQL在Linux下的两种安装方式

repmgr编译安装

使用root用户执行如下命令安装编译依赖包

yum install -y yum-utils openjade docbook-dtds docbook-dsssl docbook-style-l flex libselinux-devel libxml2-devel libxslt-devel openssl-devel pam-devel readline-devel unzip zlib-devel readline-devel python perl perl-ExtUtils-Embed gcc python-devel

编译安装repmgr过程

[postgres@node1 repmgr-master]$ ./configure 
checking for a sed that does not truncate output... /bin/sed
checking for pg_config... /opt/postgres/pgsql/bin/pg_config
configure: building against PostgreSQL 14.4
checking for gnused... no
checking for gsed... no
checking for sed... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating Makefile.global
config.status: creating config.h
[postgres@node1 repmgr-master]$ make && make install
Building against PostgreSQL 14
gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fPIC -std=gnu89 -I/opt/postgres/pgsql/include/internal -I/opt/postgres/pgsql/include -Wall -Wmissing-prototypes -Wmissing-declarations  -I. -I./ -I/opt/postgres/pgsql/include/server -I/opt/postgres/pgsql/include/internal  -D_GNU_SOURCE   -c -o repmgr.o repmgr.c

.
..
...
/bin/mkdir -p '/opt/postgres/pgsql/lib'
/bin/mkdir -p '/opt/postgres/pgsql/share/extension'
/bin/mkdir -p '/opt/postgres/pgsql/share/extension'
/bin/mkdir -p '/opt/postgres/pgsql/bin'
/bin/install -c -m 755  repmgr.so '/opt/postgres/pgsql/lib/repmgr.so'
/bin/install -c -m 644 .//repmgr.control '/opt/postgres/pgsql/share/extension/'
/bin/install -c -m 644 .//repmgr--unpackaged--4.0.sql .//repmgr--unpackaged--5.1.sql .//repmgr--unpackaged--5.2.sql .//repmgr--unpackaged--5.3.sql .//repmgr--4.0.sql .//repmgr--4.0--4.1.sql .//repmgr--4.1.sql .//repmgr--4.1--4.2.sql .//repmgr--4.2.sql .//repmgr--4.2--4.3.sql .//repmgr--4.3.sql .//repmgr--4.3--4.4.sql .//repmgr--4.4.sql .//repmgr--4.4--5.0.sql .//repmgr--5.0.sql .//repmgr--5.0--5.1.sql .//repmgr--5.1.sql .//repmgr--5.1--5.2.sql .//repmgr--5.2.sql .//repmgr--5.2--5.3.sql .//repmgr--5.3.sql .//repmgr--5.3--5.4.sql .//repmgr--5.4.sql  '/opt/postgres/pgsql/share/extension/'
/bin/install -c -m 755 repmgr repmgrd '/opt/postgres/pgsql/bin/'

6.配置主库

在node1节点上配置主库

创建用于主从复制的repmgr用户和数据库

[postgres@node1 postgres]$ initdb 
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /opt/postgres/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... PRC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /opt/postgres/data -l logfile start

[postgres@node1 postgres]$ pg_ctl start
waiting for server to start....2022-08-25 15:31:49.047 CST [9791] LOG:  starting PostgreSQL 14.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2022-08-25 15:31:49.047 CST [9791] LOG:  listening on IPv6 address "::1", port 5432
2022-08-25 15:31:49.047 CST [9791] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-08-25 15:31:49.049 CST [9791] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-08-25 15:31:49.051 CST [9792] LOG:  database system was shut down at 2022-08-25 15:31:43 CST
2022-08-25 15:31:49.053 CST [9791] LOG:  database system is ready to accept connections
 done
server started
[postgres@node1 postgres]$ psql
psql (14.4)
Type "help" for help.

postgres=# create user repmgr with superuser replication;
CREATE ROLE
postgres=# alter user repmgr with password 'repmgr';
ALTER ROLE
postgres=# create database repmgr owner repmgr;
CREATE DATABASE

修改pg_hba.conf文件

[postgres@node1 data]$ cat pg_hba.conf 
# PostgreSQL Client Authentication Configuration File
# ===================================================
.
..
...
# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    repmgr          all             192.168.59.26/32        trust
host    repmgr          all             192.168.59.27/32        trust
host    repmgr          all             192.168.59.28/32        trust
host    repmgr          all             192.168.59.29/32        trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
host    replication     all             192.168.59.26/32        trust
host    replication     all             192.168.59.27/32        trust
host    replication     all             192.168.59.28/32        trust
host    replication     all             192.168.59.29/32        trust

修改postgresql.conf的如下参数

listen_addresses = '*'
port = 5432
max_connections = 1000
wal_log_hints = on
logging_collector = on
log_destination = 'stderr'  
log_directory = '/opt/postgres/logs'
log_filename = 'postgresql-%Y-%m-%d.log'
shared_preload_libraries = 'repmgr'

重启PostgreSQL数据库使修改的参数生效

[postgres@node1 data]$ pg_ctl stop
waiting for server to shut down....2022-08-25 15:40:27.254 CST [9821] LOG:  received fast shutdown request
2022-08-25 15:40:27.255 CST [9821] LOG:  aborting any active transactions
2022-08-25 15:40:27.256 CST [9821] LOG:  background worker "logical replication launcher" (PID 9828) exited with exit code 1
2022-08-25 15:40:27.256 CST [9823] LOG:  shutting down
2022-08-25 15:40:27.266 CST [9821] LOG:  database system is shut down
 done
server stopped
[postgres@node1 data]$ pg_ctl start
waiting for server to start....2022-08-25 15:40:31.499 CST [9835] LOG:  redirecting log output to logging collector process
2022-08-25 15:40:31.499 CST [9835] HINT:  Future log output will appear in directory "/opt/postgres/logs".
 done
server started

生成repmgr.conf文件

[postgres@node1 postgres]$ cp /opt/postgres/repmgr-master/repmgr.conf.sample /opt/postgres/pgsql/etc/repmgr.conf

修改repmgr.conf的参数如下

###################################################
# repmgr sample configuration file
###################################################
node_id=1       # A unique integer greater than zero
node_name='node1'       # An arbitrary (but unique) string; we recommend
conninfo='host=node1 port=5432 dbname=repmgr user=repmgr connect_timeout=2'       # Database connection information as a conninfo string.
data_directory='/opt/postgres/data'     # The node's data directory. This is needed by repmgr
config_directory='/opt/postgres/data'     # If configuration files are located outside the data
log_level='INFO'     # Log level: possible values are DEBUG, INFO, NOTICE,
log_facility='STDERR'     # Logging facility: possible values are STDERR, or for
log_file='/opt/postgres/logs/repmgr-node1.log'       # STDERR can be redirected to an arbitrary file
#------------------------------------------------------------------------------
# Environment/command settings
#------------------------------------------------------------------------------
pg_bindir='/opt/postgres/pgsql/bin'        # Path to PostgreSQL binary directory (location
repmgr_bindir='/opt/postgres/pgsql/bin'      # Path to repmgr binary directory (location of the repmgr
#------------------------------------------------------------------------------
# external command options
#------------------------------------------------------------------------------
priority=3        # indicates a preferred priority for promoting nodes;
reconnect_attempts=6      # Number of attempts which will be made to reconnect to an unreachable
reconnect_interval=10      # Interval between attempts to reconnect to an unreachable

node1节点注册为主服务

[postgres@node1 etc]$ repmgr primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
[postgres@node1 etc]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 3        | 1        | host=node1 port=5432 dbname=repmgr user=repmgr connect_timeout=2

开启node1的repmgr主库守护进程并查看日志

[postgres@node1 etc]$ ps -uax|grep repmgrd
postgres  9862  0.0  0.0 112816   984 pts/2    R+   15:46   0:00 grep --color=auto repmgrd
[postgres@node1 etc]$ repmgrd -d
[2022-08-25 15:46:31] [NOTICE] redirecting logging output to "/opt/postgres/logs/repmgr-node1.log"

[postgres@node1 etc]$ cat /opt/postgres/logs/repmgr-node1.log 
[2022-08-25 15:46:31] [NOTICE] repmgrd (repmgrd 5.4dev) starting up
[2022-08-25 15:46:31] [INFO] connecting to database "host=node1 port=5432 dbname=repmgr user=repmgr connect_timeout=2"
INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2022-08-25 15:46:31] [NOTICE] starting monitoring of node "node1" (ID: 1)
[2022-08-25 15:46:31] [INFO] "connection_check_type" set to "ping"
[2022-08-25 15:46:31] [NOTICE] monitoring cluster primary "node1" (ID: 1)
[postgres@node1 etc]$ ps -uax|grep repmgrd
postgres  9866  0.0  0.0  13360  1056 ?        S    15:46   0:00 repmgrd -d
postgres  9884  0.0  0.0 112816   984 pts/2    S+   15:46   0:00 grep --color=auto repmgrd

7.配置备库

在node2、node3上配置备库，如下操作node2、node3节点都需要执行

修改/opt/postgres/data/pg_hba.conf文件

[postgres@node2 data]$ cat pg_hba.conf 
# PostgreSQL Client Authentication Configuration File
# ===================================================
.
..
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    repmgr          all             192.168.59.26/32        trust
host    repmgr          all             192.168.59.27/32        trust
host    repmgr          all             192.168.59.28/32        trust
host    repmgr          all             192.168.59.29/32        trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
host    replication     all             192.168.59.26/32        trust
host    replication     all             192.168.59.27/32        trust
host    replication     all             192.168.59.28/32        trust
host    replication     all             192.168.59.29/32        trust

修改/opt/postgres/data/postgresql.conf的如下参数

listen_addresses = '*'
port = 5432
max_connections = 1000
wal_log_hints = on
logging_collector = on
log_destination = 'stderr'  
log_directory = '/opt/postgres/logs'
log_filename = 'postgresql-%Y-%m-%d.log'
shared_preload_libraries = 'repmgr'

重启PostgreSQL数据库使修改的参数生效

[postgres@node2 data]$ pg_ctl stop
waiting for server to shut down.... done
server stopped
[postgres@node2 data]$ pg_ctl start
waiting for server to start....2022-08-25 15:54:09.434 CST [7777] LOG:  redirecting log output to logging collector process
2022-08-25 15:54:09.434 CST [7777] HINT:  Future log output will appear in directory "/opt/postgres/logs".
 done
server started

从样例中生成repmgr.conf文件

[postgres@node3 etc]$ cp /opt/postgres/repmgr-master/repmgr.conf.sample /opt/postgres/pgsql/etc/repmgr.conf

修改node2的repmgr.conf如下：

###################################################
# repmgr sample configuration file
###################################################
node_id=2   
node_name='node2'     
conninfo='host=node2 port=5432 dbname=repmgr user=repmgr connect_timeout=2'      
data_directory='/opt/postgres/data'    
config_directory='/opt/postgres/data'    
log_level='INFO'     
log_facility='STDERR'    
log_file='/opt/postgres/logs/repmgr-node2.log'  
#------------------------------------------------------------------------------
# Environment/command settings
#------------------------------------------------------------------------------
pg_bindir='/opt/postgres/pgsql/bin'      
repmgr_bindir='/opt/postgres/pgsql/bin'   
#------------------------------------------------------------------------------
# external command options
#------------------------------------------------------------------------------
priority=2  
reconnect_attempts=6  
reconnect_interval=10

修改node3节点repmgr.conf参数如下：

###################################################
# repmgr sample configuration file
###################################################
node_id=3   
node_name='node3'     
conninfo='host=node3 port=5432 dbname=repmgr user=repmgr connect_timeout=2'      
data_directory='/opt/postgres/data'    
config_directory='/opt/postgres/data'    
log_level='INFO'     
log_facility='STDERR'    
log_file='/opt/postgres/logs/repmgr-node3.log'  
#------------------------------------------------------------------------------
# Environment/command settings
#------------------------------------------------------------------------------
pg_bindir='/opt/postgres/pgsql/bin'      
repmgr_bindir='/opt/postgres/pgsql/bin'   
#------------------------------------------------------------------------------
# external command options
#------------------------------------------------------------------------------
priority=2  
reconnect_attempts=6  
reconnect_interval=10

node2、node3上分别关闭数据库

[postgres@node2 etc]$ pg_ctl stop
waiting for server to shut down.... done
server stopped

在node2、node3使用--dry-run选项检查待机是否可以克隆

[postgres@node2 etc]$ repmgr -h 192.168.59.26 -U repmgr -d repmgr standby clone --dry-run
WARNING: following problems with command line parameters detected:
  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/opt/postgres/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.59.26 user=repmgr dbname=repmgr
DETAIL: current installation size is 33 MB
INFO: "repmgr" extension is installed in database "repmgr"
WARNING: target data directory appears to be a PostgreSQL data directory
DETAIL: target data directory is "/opt/postgres/data"
HINT: use -F/--force to overwrite the existing data directory
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: would execute:
  /opt/postgres/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /opt/postgres/data -h 192.168.59.26 -p 5432 -U repmgr -X stream 
INFO: all prerequisites for "standby clone" are met

在node2、node3上执行如下操作克隆node1节点主数据库

[postgres@node2 etc]$ repmgr -h 192.168.59.26 -U repmgr -d repmgr standby clone -F
WARNING: following problems with command line parameters detected:
  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/opt/postgres/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.59.26 user=repmgr dbname=repmgr
DETAIL: current installation size is 33 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: directory "/opt/postgres/data" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/opt/postgres/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /opt/postgres/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /opt/postgres/data -h 192.168.59.26 -p 5432 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /opt/postgres/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"

在node2、node3上启动备库，并注册为standby

[postgres@node2 etc]$ pg_ctl start
waiting for server to start....2022-08-25 16:03:49.805 CST [7810] LOG:  redirecting log output to logging collector process
2022-08-25 16:03:49.805 CST [7810] HINT:  Future log output will appear in directory "/opt/postgres/logs".
 done
server started
[postgres@node2 etc]$ repmgr standby register
INFO: connecting to local node "node2" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "node2" (ID: 2) successfully registered

在node2上启动repmgr守护进程，查看主从信息

[postgres@node2 etc]$ ps -uax|grep repmgrd
postgres  7863  0.0  0.0 112816   980 pts/1    R+   16:06   0:00 grep --color=auto repmgrd
[postgres@node2 etc]$ repmgrd -d
[2022-08-25 16:06:15] [NOTICE] redirecting logging output to "/opt/postgres/logs/repmgr-node2.log"

[postgres@node2 etc]$ cat /opt/postgres/logs/repmgr-node2.log 
[2022-08-25 16:06:15] [NOTICE] repmgrd (repmgrd 5.4dev) starting up
[2022-08-25 16:06:15] [INFO] connecting to database "host=node2 port=5432 dbname=repmgr user=repmgr connect_timeout=2"
INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2022-08-25 16:06:15] [NOTICE] starting monitoring of node "node2" (ID: 2)
[2022-08-25 16:06:15] [INFO] "connection_check_type" set to "ping"
[2022-08-25 16:06:15] [INFO] monitoring connection to upstream node "node1" (ID: 1)
[postgres@node2 etc]$ ps -aux|grep repmgrd
postgres  7867  0.0  0.0  13360  1104 ?        S    16:06   0:00 repmgrd -d
postgres  7880  0.0  0.0 112816   984 pts/1    R+   16:06   0:00 grep --color=auto repmgrd
[postgres@node2 etc]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 3        | 1        | host=node1 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 2        | 1        | host=node2 port=5432 dbname=repmgr user=repmgr connect_timeout=2

node3上启动repmgr守护进程，查看主从信息

[postgres@node3 etc]$ pg_ctl start
waiting for server to start....2022-08-25 16:18:36.781 CST [2040] LOG:  redirecting log output to logging collector process
2022-08-25 16:18:36.781 CST [2040] HINT:  Future log output will appear in directory "/opt/postgres/logs".
 done
server started
[postgres@node3 etc]$ repmgr standby register
INFO: connecting to local node "node3" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "node3" (ID: 3) successfully registered
[postgres@node3 etc]$ ps -aux|grep repmgrd
postgres  2050  0.0  0.0 112816   984 pts/1    R+   16:18   0:00 grep --color=auto repmgrd
[postgres@node3 etc]$ repmgrd -d
[2022-08-25 16:19:01] [NOTICE] redirecting logging output to "/opt/postgres/logs/repmgr-node3.log"

[postgres@node3 etc]$ cat /opt/postgres/logs/repmgr-node3.log 
[2022-08-25 16:19:01] [NOTICE] repmgrd (repmgrd 5.4dev) starting up
[2022-08-25 16:19:01] [INFO] connecting to database "host=node3 port=5432 dbname=repmgr user=repmgr connect_timeout=2"
INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2022-08-25 16:19:01] [NOTICE] starting monitoring of node "node3" (ID: 3)
[2022-08-25 16:19:01] [INFO] "connection_check_type" set to "ping"
[2022-08-25 16:19:01] [INFO] monitoring connection to upstream node "node1" (ID: 1)
[postgres@node3 etc]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 3        | 1        | host=node1 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 2        | 1        | host=node2 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 3  | node3 | standby |   running | node1    | default  | 2        | 1        | host=node3 port=5432 dbname=repmgr user=repmgr connect_timeout=2
[postgres@node3 etc]$ ps -uax|grep repmgrd
postgres  2054  0.0  0.0  13360  1104 ?        S    16:19   0:00 repmgrd -d
postgres  2093  0.0  0.0 112816   984 pts/1    R+   16:20   0:00 grep --color=auto repmgrd

测主备同步

主库操作

[postgres@node1 etc]$ psql
psql (14.4)
Type "help" for help.

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 repmgr    | repmgr   | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)

postgres=# create database test;
CREATE DATABASE
postgres=# \c test;
You are now connected to database "test" as user "postgres".
test=# \d
Did not find any relations.
test=# create table t (id int,name varchar(50));
CREATE TABLE
test=# insert into t values (1,'deitylee');
INSERT 0 1
test=# \d
        List of relations
 Schema | Name | Type  |  Owner   
--------+------+-------+----------
 public | t    | table | postgres
(1 row)

test=# select * from t;
 id |   name   
----+----------
  1 | deitylee
(1 row)

test=#

备库可以查询到主库创建的表和写入的数据，主备同步建立成功

[postgres@node2 ~]$ psql
psql (14.4)
Type "help" for help.

postgres=# \d
Did not find any relations.
postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 repmgr    | repmgr   | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 test      | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
(5 rows)

postgres=# \c test;
You are now connected to database "test" as user "postgres".
test=# \d
        List of relations
 Schema | Name | Type  |  Owner   
--------+------+-------+----------
 public | t    | table | postgres
(1 row)

test=# select * from t;
 id |   name   
----+----------
  1 | deitylee
(1 row)

8.配置监证节点

node4上已经安装好PostgreSQL、repmgr软件，做如下配置修改

先初始化数据库

[postgres@node4 postgres]$ initdb 
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /opt/postgres/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... PRC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /opt/postgres/data -l logfile start

修改/opt/postgres/data/pg_hba.conf文件

[postgres@node4 data]$ cat pg_hba.conf 
# PostgreSQL Client Authentication Configuration File
# ===================================================
.
..
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    repmgr          all             192.168.59.26/32        trust
host    repmgr          all             192.168.59.27/32        trust
host    repmgr          all             192.168.59.28/32        trust
host    repmgr          all             192.168.59.29/32        trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
host    replication     all             192.168.59.26/32        trust
host    replication     all             192.168.59.27/32        trust
host    replication     all             192.168.59.28/32        trust
host    replication     all             192.168.59.29/32        trust

启动数据库服务

[postgres@node4 data]$ pg_ctl start
waiting for server to start....2022-08-22 15:18:40.035 CST [2873] LOG:  starting PostgreSQL 14.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2022-08-22 15:18:40.036 CST [2873] LOG:  listening on IPv6 address "::1", port 5432
2022-08-22 15:18:40.036 CST [2873] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-08-22 15:18:40.037 CST [2873] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-08-22 15:18:40.040 CST [2874] LOG:  database system was shut down at 2022-08-22 15:17:48 CST
2022-08-22 15:18:40.041 CST [2873] LOG:  database system is ready to accept connections
 done
server started

创建repmgr账号及数据库

[postgres@node4 data]$ psql 
psql (14.4)
Type "help" for help.

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

postgres=# create user repmgr with superuser replication;
CREATE ROLE
postgres=# create database repmgr owner repmgr
postgres-# ;
CREATE DATABASE

修改/opt/postgres/data/postgresql.conf的如下参数

listen_addresses = '*'
port = 5432
log_destination = 'stderr'
logging_collector = on
log_directory = '/opt/postgres/logs'
log_filename = 'postgresql-%Y-%m-%d.log'
shared_preload_libraries = 'repmgr

重启PostgreSQL数据库使修改的参数生效

[postgres@node4 data]$ pg_ctl stop
waiting for server to shut down....2022-08-25 16:30:55.359 CST [2367] LOG:  received fast shutdown request
2022-08-25 16:30:55.360 CST [2367] LOG:  aborting any active transactions
2022-08-25 16:30:55.361 CST [2367] LOG:  background worker "logical replication launcher" (PID 2374) exited with exit code 1
2022-08-25 16:30:55.361 CST [2369] LOG:  shutting down
2022-08-25 16:30:55.368 CST [2367] LOG:  database system is shut down
 done
server stopped
[postgres@node4 data]$ pg_ctl start
waiting for server to start....2022-08-25 16:30:59.518 CST [2384] LOG:  redirecting log output to logging collector process
2022-08-25 16:30:59.518 CST [2384] HINT:  Future log output will appear in directory "/opt/postgres/logs".
 done
server started

从样例中生成repmgr.conf文件

[postgres@node4 etc]$cp /opt/postgres/repmgr-master/repmgr.conf.sample /opt/postgres/pgsql/etc/repmgr.conf

修改node4的repmgr.conf如下：

###################################################
# repmgr sample configuration file
###################################################
node_id=4   
node_name='node4'     
conninfo='host=node4 port=5432 dbname=repmgr user=repmgr connect_timeout=2'      
data_directory='/opt/postgres/data'    
config_directory='/opt/postgres/data'    
log_level='INFO'     
log_facility='STDERR'    
log_file='/opt/postgres/logs/repmgr-node4.log'  
#------------------------------------------------------------------------------
# Environment/command settings
#------------------------------------------------------------------------------
pg_bindir='/opt/postgres/pgsql/bin' 
repmgr_bindir='/opt/postgres/pgsql/bin'      
#------------------------------------------------------------------------------
# external command options
#------------------------------------------------------------------------------
monitoring_history=true
primary_visibility_consensus=true

node4重启数据库服务

[postgres@node4 etc]$ pg_ctl stop
waiting for server to shut down.... done
server stopped
[postgres@node4 etc]$ pg_ctl start
waiting for server to start....2022-08-22 15:26:30.934 CST [2908] LOG:  redirecting log output to logging collector process
2022-08-22 15:26:30.934 CST [2908] HINT:  Future log output will appear in directory "/opt/postgres/logs".
 done
server started

注册node4节点为监证节点

[postgres@node4 etc]$ repmgr witness register -h 192.168.59.26 -d repmgr
INFO: connecting to witness node "node4" (ID: 4)
INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "node4" (ID: 4) successfully registered

在node4节点上查看群集信息

[postgres@node4 etc]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 3        | 1        | host=node1 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 2        | 1        | host=node2 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 3  | node3 | standby |   running | node1    | default  | 2        | 1        | host=node3 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 4  | node4 | witness | * running | node1    | default  | 0        | n/a      | host=node4 port=5432 dbname=repmgr user=repmgr connect_timeout=2

在node4上启动repmgr守护进程

[postgres@node4 etc]$ ps -uax|grep repmgrd
postgres  2474  0.0  0.0 112816   984 pts/1    R+   16:39   0:00 grep --color=auto repmgrd
[postgres@node4 etc]$ repmgrd -d
[2022-08-25 16:39:17] [NOTICE] redirecting logging output to "/opt/postgres/logs/repmgr-node4.log"

[postgres@node4 etc]$ cat /opt/postgres/logs/repmgr-node4.log 

[2022-08-25 16:39:17] [NOTICE] repmgrd (repmgrd 5.4dev) starting up
[2022-08-25 16:39:17] [INFO] connecting to database "host=node4 port=5432 dbname=repmgr user=repmgr connect_timeout=2"
INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2022-08-25 16:39:17] [NOTICE] starting monitoring of node "node4" (ID: 4)
[2022-08-25 16:39:17] [INFO] "connection_check_type" set to "ping"
[2022-08-25 16:39:17] [INFO] witness monitoring connection to primary node "node1" (ID: 1)
[postgres@node4 etc]$ ps -uax|grep repmgrd
postgres  2478  0.0  0.0  13360  1112 ?        S    16:39   0:00 repmgrd -d
postgres  2506  0.0  0.0 112816   984 pts/1    R+   16:39   0:00 grep --color=auto repmgrd

9.安装pgbouncer配置连接池

在node4上安装pgbouncer连接池

编译安装pgbouncer

[postgres@node4 postgres]$ tar -zxvf pgbouncer-1.17.0.tar.gz 
pgbouncer-1.17.0/
pgbouncer-1.17.0/win32/
.
..
...
pgbouncer-1.17.0/src/server.c
[postgres@node4 postgres]$ cd pgbouncer-1.17.0
[postgres@node4 pgbouncer-1.17.0]$ ls
AUTHORS  autogen.sh  config.guess  config.mak.in  config.sub  configure  configure.ac  COPYRIGHT  doc  etc  include  install-sh  lib  Makefile  NEWS.md  README.md  src  test  win32
[postgres@node4 pgbouncer-1.17.0]$ ./configure --prefix=/opt/postgres/pgbouncer
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target host type... unix
.
..
...
checking whether to fail on warnings... no
configure: creating ./config.status
config.status: creating config.mak
config.status: creating lib/usual/config.h

Results:
  adns    = c-ares
  pam     = no
  systemd = no
  tls     = yes
  
[postgres@node4 pgbouncer-1.17.0]$ make && make install
     CC       src/admin.c
     CC       src/client.c
     CC       src/dnslookup.c
src/dnslookup.c: In function ‘impl_launch_query’:
src/dnslookup.c:905:2: warning: #warning c-ares <=1.10 has buggy IPv6 support; this PgBouncer build will use IPv4 only. [-Wcpp]
 #warning c-ares <=1.10 has buggy IPv6 support; this PgBouncer build will use IPv4 only.
  ^
     CC       src/hba.c
    .
    ..
    ...
     INSTALL  pgbouncer /opt/postgres/pgbouncer/bin
     INSTALL  README.md /opt/postgres/pgbouncer/share/doc/pgbouncer
     INSTALL  NEWS.md /opt/postgres/pgbouncer/share/doc/pgbouncer
     INSTALL  etc/pgbouncer.ini /opt/postgres/pgbouncer/share/doc/pgbouncer
     INSTALL  etc/userlist.txt /opt/postgres/pgbouncer/share/doc/pgbouncer
     INSTALL  doc/pgbouncer.1 /opt/postgres/pgbouncer/share/man/man1
     INSTALL  doc/pgbouncer.5 /opt/postgres/pgbouncer/share/man/man5

从示例复制pgbouncer.ini和userlist.txt文件并修改权限

[postgres@node4 pgbouncer]$ cp /opt/postgres/pgbouncer/share/doc/pgbouncer/pgbouncer.ini /opt/postgres/pgbouncer/etc/
[postgres@node4 pgbouncer]$ cp /opt/postgres/pgbouncer/share/doc/pgbouncer/userlist.txt /opt/postgres/pgbouncer/etc/
[postgres@node4 pgbouncer]$ cd etc
[postgres@node4 etc]$ ls
pgbouncer.ini  userlist.txt
[postgres@node4 etc]$ vi pgbouncer.ini 
[postgres@node4 etc]$ chmod 700 userlist.txt

修改pgbouncer.ini参数如下

[databases]
masterdb = host=192.168.59.26 user=repmgr port=5432 dbname=repmgr
[pgbouncer]
logfile = /opt/postgres/logs/pgbouncer.log
pidfile = /opt/postgres/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 6432
auth_type = trust
auth_file = /opt/postgres/pgbouncer/etc/userlist.txt
admin_users = pgbouncer
pool_mode = session
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5

userlist.txt内容如下

"marko" "asdasd"
"postgres" "postgres"
"pgbouncer" "pgbouncer"

启动pgbouncer服务

[postgres@node4 bin]$ ./pgbouncer -d /opt/postgres/pgbouncer/etc/pgbouncer.ini 
[postgres@node4 bin]$ ps -uax|grep pgbouncer
postgres  9968  0.0  0.0  43452  1616 ?        S    16:00   0:00 ./pgbouncer -d /opt/postgres/pgbouncer/etc/pgbouncer.ini
postgres  9973  0.0  0.0 112816   980 pts/0    S+   16:00   0:00 grep --color=auto pgbouncer

连接测试连接的名称为pgbouncer.ini定义的masterdb名称

[postgres@node4 etc]$ psql -p 6432 -h 192.168.59.29 masterdb
psql (14.4)
Type "help" for help.

masterdb=# \c
You are now connected to database "masterdb" as user "postgres".
masterdb=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 repmgr    | repmgr   | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 vzoom     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
(5 rows)

masterdb=#  SELECT CURRENT_USER usr,inet_server_addr() host ,inet_server_port() port;
  usr   |     host      | port 
--------+---------------+------
 repmgr | 192.168.59.26 | 5432
(1 row)

10.配置主备自动切换，并使pgbouncer连接到新的主库

编写failover.sh脚本在主备切换时使用
脚本内容如下，其中MASTER_IP在node1、node2、node3上为本地对应IP地址。

#!/bin/bash
PGHOME=/opt/postgres/pgsql
PGBIN=$PGHOME/bin
PGDATA=/opt/postgres/data
PGPORT=5432
PGUSER=postgres
LOG_FILE=/opt/postgres/logs/failover.log
BOUN_SERVER=node4
BOUN_FILE=/opt/postgres/pgbouncer/etc/pgbouncer.ini
BOUN_LISTEN_PORT=6432
BOUN_ADMIN_USER=pgbouncer
STANDBY_IP=192.168.59.*
MASTER_IP=192.168.59.26
CONN_INFO="user=repmgr port=5432 dbname=repmgr"
TIME=`date '+%Y-%m-%d %H:%M:%S'`
echo "$TIME进入备机升级为主机操作" >> $LOG_FILE
$PGBIN/repmgr standby promote >> $LOG_FILE
echo "$?切换是否成功，0表示成功！" >> $LOG_FILE
if [ $? == 0 ];then 
    echo "$? 操作是否成功？0表示成功！" >> $LOG_FILE
    IF_RECOVERY=" t"
    while [ "$IF_RECOVERY" = " t" ];do
    TIME=`date '+%Y-%m-%d %H:%M:%S'`
    echo "$TIME 查看本机postgres数据库是否为主机！" >> $LOG_FILE
    IF_RECOVERY=`psql -c "select pg_is_in_recovery()" | sed -n '3,3p'`
    echo "查询执行完！结果:|$IF_RECOVERY|" >> $LOG_FILE
    if [ "$IF_RECOVERY" == " f" ];then
        echo "$TIME FAILOVER-INFO:切换成功！">>$LOG_FILE
        TIME=`date '+%Y-%m-%d %H:%M:%S'`
        echo "$TIME 替换鉴证节点pgbouncer.ini文件内所有连接地址为本地" >> $LOG_FILE
        echo "sed -i 's/host=$STANDBY_IP/host=$MASTER_IP $CONN_INFO/g' $BOUN_FILE" | ssh $PGUSER@$BOUN_SERVER bash
        echo "$?替换是否成功，0表示成功" >> $LOG_FILE
        if [ $? == 0 ];then 
            TIME=`date '+%Y-%m-%d %H:%M:%S'`
            echo "TIME FAILVOER-INFO:更新监证节点上pgbouncer.init成功！" >> $LOG_FILE
            ssh postgres@node4 "/opt/postgres/pgbouncer/bin/pgbouncer -R -d $BOUN_FILE" &>>$LOG_FILE
            if [ $? == 0 ];then
                TIME=`date '+%Y-%m-%d %H:%M:%S'`
                echo "#################################新的连接池信息#############################################################################">>$LOG_FILE
                $PGBIN/psql -h $BOUN_SERVER -p $BOUN_LISTEN_PORT -U $BOUN_ADMIN_USER pgbouncer -c "show databases">> $LOG_FILE
                echo "############################################################################################################################">>$LOG_FILE
            else
                echo "$TIME FAILOVER-ERROR:pgbouncer重启失败!" >> $LOG_FILE
            fi
        else
            echo "$TIME FAILOVER-ERROR:更新监证节点上pgbouncer.ini失败！" >> $LOG_FILE
        fi
    else
        echo "$TIME FAILOVER-ERROR:备库还在升级到主库，稍后再试！" >> $LOG_FILE
        echo "休息10秒！" >> $LOG_FILE
        sleep 10
        echo "休息完成！" >> $LOG_FILE
    fi
    done
else
   TIME=`date '+%Y-%m-%d %H:%M:%S'`
   echo "$TIME ERROR: 备库升级为主库失败！"
fi

修改node1、node2、node3上repmgr.conf如下对应参数

其中promote_command为failover.sh的全路径

failover='automatic'                  
promote_command='/opt/postgres/etc/failover.sh'          
follow_command='repmgr standby follow -f /opt/postgres/pgsql/etc/repmgr.conf --log-to-file --upstream-node-id=%n'

重启repmgr守护进程，node1、node2、node3都需要执行

[postgres@node1 etc]$ ps -aux|grep repmgrd
postgres  9866  0.0  0.0  13360  1152 ?        S    15:46   0:01 repmgrd -d
postgres 17982  0.0  0.0 112816   984 pts/0    R+   17:17   0:00 grep --color=auto repmgrd
[postgres@node1 etc]$ kill -9 9866
[postgres@node1 etc]$ ps -aux|grep repmgrd
postgres 17999  0.0  0.0 112816   984 pts/0    R+   17:17   0:00 grep --color=auto repmgrd
[postgres@node1 etc]$ repmgrd -d
[2022-08-25 17:17:47] [NOTICE] redirecting logging output to "/opt/postgres/logs/repmgr-node1.log"

[postgres@node1 etc]$ ps -aux|grep repmgrd
postgres 18008  0.0  0.0  13360  1096 ?        S    17:17   0:00 repmgrd -d
postgres 18019  0.0  0.0 112816   984 pts/0    S+   17:17   0:00 grep --color=auto repmgrd

11.验证主备自动切换

查看群集信息并且停止主服务

[postgres@node1 etc]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 3        | 7        | host=node1 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 2        | 7        | host=node2 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 3  | node3 | standby |   running | node1    | default  | 2        | 7        | host=node3 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 4  | node4 | witness | * running | node1    | default  | 0        | n/a      | host=node4 port=5432 dbname=repmgr user=repmgr connect_timeout=2
[postgres@node1 etc]$ pg_ctl stop
waiting for server to shut down.... done
server stopped

在备节点上查看群集状态

通过几次查询可以看出，节点node2已经提升为主节点。

[postgres@node3 etc]$ repmgr cluster show
 ID | Name  | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                               
----+-------+---------+---------------+----------+----------+----------+----------+------------------------------------------------------------------
 1  | node1 | primary | ? unreachable | ?        | default  | 3        |          | host=node1 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 2  | node2 | standby |   running     | ? node1  | default  | 2        | 7        | host=node2 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 3  | node3 | standby |   running     | ? node1  | default  | 2        | 7        | host=node3 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 4  | node4 | witness | * running     | ? node1  | default  | 0        | n/a      | host=node4 port=5432 dbname=repmgr user=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "node1" (ID: 1)
  - node "node1" (ID: 1) is registered as an active primary but is unreachable
  - unable to connect to node "node2" (ID: 2)'s upstream node "node1" (ID: 1)
  - unable to determine if node "node2" (ID: 2) is attached to its upstream node "node1" (ID: 1)
  - unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)
  - unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)
  - unable to connect to node "node4" (ID: 4)'s upstream node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages
[postgres@node3 etc]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------
 1  | node1 | primary | - failed  | ?        | default  | 3        |          | host=node1 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 2        | 8        | host=node2 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 3  | node3 | standby |   running | node2    | default  | 2        | 7        | host=node3 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 4  | node4 | witness | * running | node2    | default  | 0        | n/a      | host=node4 port=5432 dbname=repmgr user=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

查看node2节点的failover日志

2022-08-26 11:10:13进入备机升级为主机操作
0切换是否成功，0表示成功！
0 操作是否成功？0表示成功！
2022-08-26 11:10:14 查看本机postgres数据库是否为主机！
查询执行完！结果:| f|
2022-08-26 11:10:14 FAILOVER-INFO:切换成功！
2022-08-26 11:10:14 替换鉴证节点pgbouncer.ini文件内所有连接地址为本地
0替换是否成功，0表示成功
TIME FAILVOER-INFO:更新监证节点上pgbouncer.init成功！
2022-08-26 11:10:15.037 CST [19535] LOG takeover_init: launching connection
2022-08-26 11:10:15.037 CST [19535] LOG S-0x1048830: pgbouncer/pgbouncer@unix:6432 new connection to server
2022-08-26 11:10:15.037 CST [19535] LOG S-0x1048830: pgbouncer/pgbouncer@unix:6432 login OK, sending SUSPEND
2022-08-26 11:10:15.037 CST [19535] LOG SUSPEND finished, sending SHOW FDS
2022-08-26 11:10:15.038 CST [19535] LOG got pooler socket: 0.0.0.0:6432
2022-08-26 11:10:15.038 CST [19535] LOG got pooler socket: [::]:6432
2022-08-26 11:10:15.038 CST [19535] LOG got pooler socket: unix:6432
2022-08-26 11:10:15.038 CST [19535] LOG SHOW FDS finished
2022-08-26 11:10:15.038 CST [19535] LOG disko over, going background
#################################新的连接池信息#############################################################################
   name    |     host      | port | database  | force_user | pool_size | min_pool_size | reserve_pool | pool_mode | max_connections | current_connections | paused | disabled
-----------+---------------+------+-----------+------------+-----------+---------------+--------------+-----------+-----------------+---------------------+--------+----------
 masterdb  | 192.168.59.27 | 5432 | repmgr    | repmgr     |        25 |             0 |            0 |           |               0 |                   0 |      0 |        0
 pgbouncer |               | 6432 | pgbouncer | pgbouncer  |         2 |             0 |            0 | statement |               0 |                   0 |      0 |        0
(2 rows)

############################################################################################################################

验证连接池链接是否正常
连接信息不变，但实际已经连接到192.168.58.27数据库上。

[postgres@node4 etc]$ psql -p 6432 -h 192.168.59.29 masterdb
psql (14.4)
Type "help" for help.

masterdb=# \d
              List of relations
 Schema |        Name        | Type  | Owner  
--------+--------------------+-------+--------
 repmgr | events             | table | repmgr
 repmgr | monitoring_history | table | repmgr
 repmgr | nodes              | table | repmgr
 repmgr | replication_status | view  | repmgr
 repmgr | show_nodes         | view  | repmgr
 repmgr | voting_term        | table | repmgr
(6 rows)

masterdb=# SELECT CURRENT_USER usr,inet_server_addr() host ,inet_server_port() port;
  usr   |     host      | port 
--------+---------------+------
 repmgr | 192.168.59.27 | 5432
(1 row)

恢复故障主节点，并注册为备机

[postgres@node1 etc]$ repmgr node rejoin -d 'host=192.168.59.27 dbname=repmgr user=repmgr' --force-rewind
NOTICE: rejoin target is node "node2" (ID: 2)
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/opt/postgres/pgsql/bin/pg_rewind -D '/opt/postgres/data' --source-server='host=node2 port=5432 dbname=repmgr user=repmgr connect_timeout=2'"
NOTICE: 0 files copied to /opt/postgres/data
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=node1 port=5432 dbname=repmgr user=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/opt/postgres/pgsql/bin/pg_ctl  -w -D '/opt/postgres/data' start"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2

12.报错总结

故障主机恢复后作为备机加入群集时报错
报错信息：

[postgres@node1 logs]$ repmgr node rejoin -d 'host=192.168.59.27 dbname=repmgr user=repmgr' --force-rewind
NOTICE: rejoin target is node "node2" (ID: 2)
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/9008E50
ERROR: --force-rewind specified but pg_rewind cannot be used
DETAIL: "wal_log_hints" is set to "off" and data checksums are disabled

解决方案：

修改主备所有节点上/opt/postgres/data/postgres.conf中wal_log_hints为on

出现双主问题

通过failover.sh脚本切换后出现如下错误

[postgres@node1 etc]$ repmgr cluster show
 ID | Name  | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string                                               
----+-------+---------+----------------------+----------+----------+----------+----------+------------------------------------------------------------------
 1  | node1 | standby | ! running as primary | ? node2  | default  | 3        | 9        | host=node1 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 2  | node2 | primary | ? unreachable        | ?        | default  | 2        |          | host=node2 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 3  | node3 | standby |   running            | node1    | default  | 2        | 8        | host=node3 port=5432 dbname=repmgr user=repmgr connect_timeout=2
 4  | node4 | witness | * running            | node1    | default  | 0        | n/a      | host=node4 port=5432 dbname=repmgr user=repmgr connect_timeout=2

WARNING: following issues were detected
  - node "node1" (ID: 1) is registered as standby but running as primary
  - unable to connect to node "node1" (ID: 1)'s upstream node "node2" (ID: 2)
  - unable to determine if node "node1" (ID: 1) is attached to its upstream node "node2" (ID: 2)
  - unable to connect to node "node2" (ID: 2)
  - node "node2" (ID: 2) is registered as an active primary but is unreachable

HINT: execute with --verbose option to see connection error messages

查看repmgr日志发现repmgr守护进程已经中断

[2022-08-26 12:01:01] [INFO] checking state of node "node2" (ID: 2), 5 of 6 attempts
[2022-08-26 12:01:01] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=node2 port=5432 fallback_application_name=repmgr"
[2022-08-26 12:01:01] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2022-08-26 12:01:01] [INFO] sleeping up to 10 seconds until next reconnection attempt
[2022-08-26 12:01:11] [INFO] checking state of node "node2" (ID: 2), 6 of 6 attempts
[2022-08-26 12:01:11] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=node2 port=5432 fallback_application_name=repmgr"
[2022-08-26 12:01:11] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2022-08-26 12:01:11] [WARNING] unable to reconnect to node "node2" (ID: 2) after 6 attempts
[2022-08-26 12:01:11] [INFO] 2 active sibling nodes registered
[2022-08-26 12:01:11] [INFO] 4 total nodes registered
[2022-08-26 12:01:11] [INFO] primary node  "node2" (ID: 2) and this node have the same location ("default")
[2022-08-26 12:01:11] [INFO] local node's last receive lsn: 0/F016E80
[2022-08-26 12:01:11] [INFO] checking state of sibling node "node3" (ID: 3)
[2022-08-26 12:01:11] [INFO] node "node3" (ID: 3) reports its upstream is node 2, last seen 51 second(s) ago
[2022-08-26 12:01:11] [INFO] standby node "node3" (ID: 3) last saw primary node 51 second(s) ago
[2022-08-26 12:01:11] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/F016E80
[2022-08-26 12:01:11] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node1" (ID: 1)
[2022-08-26 12:01:11] [INFO] node "node3" (ID: 3) has lower priority (2) than current candidate "node1" (ID: 1) (3)
[2022-08-26 12:01:11] [INFO] checking state of sibling node "node4" (ID: 4)
[2022-08-26 12:01:11] [INFO] node "node4" (ID: 4) reports its upstream is node 2, last seen 50 second(s) ago
[2022-08-26 12:01:11] [INFO] witness node "node4" (ID: 4) last saw primary node 50 second(s) ago
[2022-08-26 12:01:11] [INFO] visible nodes: 3; total nodes: 3; no nodes have seen the primary within the last 4 seconds
[2022-08-26 12:01:11] [NOTICE] promotion candidate is "node1" (ID: 1)
[2022-08-26 12:01:11] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2022-08-26 12:01:11] [INFO] promote_command is:
  "/opt/postgres/etc/failover.sh"
[2022-08-26 12:01:12] [INFO] checking state of node 1, 1 of 6 attempts
[2022-08-26 12:01:12] [NOTICE] node 1 has recovered, reconnecting
[2022-08-26 12:01:12] [INFO] connection to node 1 succeeded
[2022-08-26 12:01:12] [INFO] original connection is still available
[2022-08-26 12:01:12] [INFO] 2 followers to notify
[2022-08-26 12:01:12] [NOTICE] notifying node "node3" (ID: 3) to follow node 1
INFO:  node 3 received notification to follow node 1
[2022-08-26 12:01:12] [NOTICE] notifying node "node4" (ID: 4) to follow node 1
INFO:  node 4 received notification to follow node 1
[2022-08-26 12:01:12] [INFO] switching to primary monitoring mode
[2022-08-26 12:01:12] [ERROR] connection to database failed
[2022-08-26 12:01:12] [DETAIL] 
connection to server at "node2" (192.168.59.27), port 5432 failed: Connection refused
  Is the server running on that host and accepting TCP/IP connections?

[2022-08-26 12:01:12] [DETAIL] attempted to connect using:
  user=repmgr connect_timeout=2 dbname=repmgr host=node2 port=5432 fallback_application_name=repmgr options=-csearch_path=
[2022-08-26 12:01:12] [ERROR] unable connect to upstream node (ID: 2), terminating
[2022-08-26 12:01:12] [HINT] upstream node must be running before repmgrd can start
[2022-08-26 12:01:12] [INFO] repmgrd terminating...

[postgres@node1 logs]$ ps -aux|grep repmgrd
postgres  3345  0.0  0.0 112816   980 pts/0    R+   12:05   0:00 grep --color=auto repmgrd
[postgres@node1 logs]$

解决方法：

修改failover.sh切换主备的命令由原来pg_ctl promote变为repmgr standby promote

墨力计划 postgresql

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

PostgreSQL一主两备一监证节点自动切换主备

评论