
PostgreSQL高可用-pg_auto_failover的简单使用介绍
pg_auto_failover相比其它高可用方案而言,搭建使用起来比较直观。可以很快上手。比较新版本的PG都支持multihost的连接,因此在生产环境当中使用也非常方便。Patroni, repmgr, pg_auto_failover感觉三种高可用实现架构方案相对来说,都不错。简单易用易诊断,必然会越来越受欢迎。
下边的内容就以PG14以及其对应的pg_auto_failover插件搭配安装过程及其简单验证做一个简要介绍。
基本环境:
| 主机名 | IP地址 | 基本角色 |
|---|---|---|
| centos3 | 192.168.0.20 | monitor |
| centos1 | 192.168.0.6 | 主节点 |
| centos2 | 192.168.0.10 | 从节点 |
实验步骤
1、禁用防火墙
所有节点上都执行
$ sudo setenforce 0
$ sudo sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
$ sudo systemctl disable firewalld.service
$ sudo systemctl stop firewalld.service
$ sudo iptables --flush
2、添加HOST信息
所有节点上都添加如下host信息,便于互相访问。
sudo vi /etc/hosts
192.168.0.20 centos3
192.168.0.6 centos1
192.168.0.10 centos2
3、安装PG 14
所有节点都安装上PG14。过程略。
yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install -y postgresql14 postgresql14-contrib postgresql14-libs postgresql14-server
4、安装pg_auto_failover插件
curl https://install.citusdata.com/community/rpm.sh | sudo bash
[18:04:07-root@centos1:/iihero/tmp]$ yum search pg-auto-failover
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.huaweicloud.com
* centos-sclo-rh: mirrors.huaweicloud.com
* centos-sclo-sclo: mirrors.huaweicloud.com
* epel: mirrors.tuna.tsinghua.edu.cn
* extras: mirrors.huaweicloud.com
* updates: mirrors.huaweicloud.com
================================================================================= N/S matched: pg-auto-failover ==================================================================================
pg-auto-failover10_10-debuginfo.x86_64 : Debug information for package pg-auto-failover10_10
pg-auto-failover10_11-debuginfo.x86_64 : Debug information for package pg-auto-failover10_11
.......
pg-auto-failover13_11.x86_64 : Postgres extension for automated failover and high-availability
pg-auto-failover13_12.x86_64 : Postgres extension for automated failover and high-availability
pg-auto-failover14_10.x86_64 : Postgres extension for automated failover and high-availability
pg-auto-failover14_11.x86_64 : Postgres extension for automated failover and high-availability
pg-auto-failover14_12.x86_64 : Postgres extension for automated failover and high-availability
pg-auto-failover14_13.x86_64 : Postgres extension for automated failover and high-availability
pg-auto-failover15_10.x86_64 : Postgres extension for automated failover and high-availability
pg-auto-failover15_11.x86_64 : Postgres extension for automated failover and high-availability
pg-auto-failover15_12.x86_64 : Postgres extension for automated failover and high-availability
pg-auto-failover15_13.x86_64 : Postgres extension for automated failover and high-availability
pg-auto-failover16_10.x86_64 : Postgres extension for automated failover and high-availability
pg-auto-failover16_14.x86_64 : Postgres extension for automated failover and high-availability
根据上边的结果,我们可以挑一个pg-auto-failover16_14.x86_64来进行安装。不要被前边的16迷惑。16_14后边的14才是对应的PG的主版本。
因为前边已经装了PG,相应的环境变量(bash file)记得保存。
sudo yum install pg-auto-failover16_14.x86_64
安装完之后,简单的验证一下,确保路径确实是正确的。
[18:13:35-postgres@centos1:/var/lib/pgsql]$ which pg_autoctl
/usr/pgsql-14/bin/pg_autoctl
5、初始化pg_auto_failover Monitor节点
安装之初,在安装PG14一节,其实相关环境变量应该已经设好了。这里的是:
PGDATA=/var/lib/pgsql/14/data
export PGDATA
export PATH=/usr/pgsql-14/bin:$PATH
export PGPORT=5555
export LD_LIBRARY_PATH=/usr/pgsql-14/lib:$LD_LIBRARY_PATH
我们在centos3上配置一下monitor。
pg_autoctl create monitor --pgdata /var/lib/pgsql/14/monitor \
--pgport 5678 \
--hostname centos3 \
--auth trust \
--ssl-self-signed
18:23:53 2192 INFO Using default --ssl-mode "require"
18:23:53 2192 INFO Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
18:23:53 2192 WARN Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
18:23:53 2192 WARN See https://www.postgresql.org/docs/current/libpq-ssl.html for details
18:23:53 2192 INFO Initialising a PostgreSQL cluster at "/var/lib/pgsql/14/monitor"
18:23:53 2192 INFO /usr/pgsql-14/bin/pg_ctl initdb -s -D /var/lib/pgsql/14/monitor --option '--auth=trust'
18:23:54 2192 INFO /usr/bin/openssl req -new -x509 -days 365 -nodes -text -out /var/lib/pgsql/14/monitor/server.crt -keyout /var/lib/pgsql/14/monitor/server.key -subj "/CN=centos3"
。。。。。。。。。。。。。
18:23:54 2208 INFO Postgres controller service received signal SIGTERM, terminating
18:23:54 2208 INFO Stopping pg_autoctl postgres service
18:23:54 2208 INFO /usr/pgsql-14/bin/pg_ctl --pgdata /var/lib/pgsql/14/monitor --wait stop --mode fast
18:23:55 2192 INFO Stop pg_autoctl
注意,我上边用的是端口5678,特意区分一下。
将它启动起来,验证一下库里头的情况:
[18:26:02-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_ctl start -D ./
waiting for server to start....2023-10-22 18:26:08.823 CST [2264] LOG: redirecting log output to logging collector process
2023-10-22 18:26:08.823 CST [2264] HINT: Future log output will appear in directory "log".
done
server started
[18:26:08-postgres@centos3:/var/lib/pgsql/14/monitor]$
[18:26:13-postgres@centos3:/var/lib/pgsql/14/monitor]$ psql -p 5678
psql (14.8)
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
------------------+----------+----------+-------------+-------------+-----------------------
pg_auto_failover | autoctl | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
postgres=# \c pg_auto_failover
You are now connected to database "pg_auto_failover" as user "postgres".
pg_auto_failover=# \dx
List of installed extensions
Name | Version | Schema | Description
----------------+---------+------------+-----------------------------------------------
btree_gist | 1.6 | public | support for indexing common datatypes in GiST
pgautofailover | 1.6 | public | pg_auto_failover
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(3 rows)
pg_auto_failover=# \du
List of roles
Role name | Attributes | Member of
--------------+------------------------------------------------------------+-----------
autoctl | | {}
autoctl_node | | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
[18:29:33-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_autoctl show uri --pgdata ./
Type | Name | Connection String
-------------+---------+-------------------------------
monitor | monitor | postgres://autoctl_node@centos3:5678/pg_auto_failover?sslmode=require
formation | default |
6、创建主节点(primary node)
centos1上边:
pg_autoctl create postgres \
--hostname centos1 \
--pgdata /var/lib/pgsql/14/autofailover \
--pgport 5678 \
--auth trust \
--ssl-self-signed \
--monitor 'postgres://autoctl_node@centos3:5678/pg_auto_failover?sslmode=require' \
--run &
这样跑完以后,也可以把它停了。
然后用命令:
[18:40:25-postgres@centos1:/var/lib/pgsql/14/autofailover]$ pg_autoctl run --pgdata ./ --pgport 5678
pg_autoctl run --pgdata ./ --pgport 5678
启动,这样日志会显示 :
18:40:43 2897 INFO New state for this node (node 1, "node_1") (centos1:5678): wait_primary ➜ wait_primary
我们来到Monitor节点看看状态:
[18:40:57-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_autoctl show state --pgdata ./
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
-------+-------+--------------+----------------+--------------+---------------------+--------------------
node_1 | 1 | centos1:5678 | 1: 0/20202A8 | read-write | wait_primary | wait_primary
7、安装从节点
centos2上:
pg_autoctl create postgres \
--hostname centos2 \
--pgdata /var/lib/pgsql/14/autofailover \
--pgport 5678 \
--auth trust \
--ssl-self-signed \
--monitor 'postgres://autoctl_node@centos3:5678/pg_auto_failover?sslmode=require' \
--run &
注意一下这里边的几个参数。pgdata, pgport之类的,它们与我们环境变量里头的值不太一样。只是为了显示更通用的情形。
我们再在monitor节点上看下整个集群下的分布情况:
[18:44:35-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_autoctl show state --pgdata ./
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
-------+-------+--------------+----------------+--------------+---------------------+--------------------
node_1 | 1 | centos1:5678 | 1: 0/4000148 | read-write | primary | primary
node_2 | 2 | centos2:5678 | 1: 0/4000148 | read-only | secondary | secondary
看起来,完全符合预期。centos1, centos2分别成为了主、从节点。
8、简单验证
上边的过程只是简单的把环境搭建起来。对于主从节点之间还没有进行防火墙的相关配置。我们可以分别简单的配置一下。
pg_hba.conf
hostnossl all all samenet md5
主从节点上都加上这一行。然后让其生效
[19:05:13-postgres@centos2:/var/lib/pgsql/14/autofailover]$ psql -p 5678
psql (14.8)
Type "help" for help.
postgres=#
postgres=#
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
连接验证下:(centos1访问centos2)
centos2是从节点,只能执行读操作。
[19:06:57-postgres@centos1:/var/lib/pgsql/14/autofailover]$ psql -h centos2 -p 5678
psql (14.8)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES128-GCM-SHA256, bits: 128, compression: off)
Type "help" for help.
postgres=#
postgres=# create table t(id int);
ERROR: cannot execute CREATE TABLE in a read-only transaction
postgres=#
我们更进一步,利用libpq的多主机连接特性,试一下:
通过monitor节点得到连接信息:
[18:51:11-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_autoctl show uri --formation default --pgdata ./
postgres://centos1:5678,centos2:5678/postgres?target_session_attrs=read-write&sslmode=require
我们可以利用这个串做连接信息即可:
psql postgres://centos1:5678,centos2:5678/postgres?target_session_attrs=read-write
ostgres=# create table t(id int);
CREATE TABLE
postgres=# insert into t values(1);
INSERT 0 1
postgres=#
看起来完全是透明的。
9、failover验证
模拟主节点上的PG进程停掉了
centos1:
[19:22:11-postgres@centos1:/var/lib/pgsql/14/autofailover]$ pkill postgres
[19:34:42-postgres@centos1:/var/lib/pgsql/14/autofailover]$
centos3上状态看下:
[19:35:12-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_autoctl show state --pgdata ./
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
-------+-------+--------------+----------------+--------------+---------------------+--------------------
node_1 | 1 | centos1:5678 | 5: 0/401B670 | read-write | primary | primary
node_2 | 2 | centos2:5678 | 5: 0/401B670 | read-only | secondary | secondary
这种情况下,它还是能够给你把相关进程拉起来。
停掉节点
如果我们将第一个节点停掉:
centos1: pg_autoctl stop --pgdata ./
centos3:
[19:22:13-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_autoctl show state --pgdata ./
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
-------+-------+--------------+----------------+--------------+---------------------+--------------------
node_1 | 1 | centos1:5678 | 1: 0/40165E8 | read-write ! | primary | demote_timeout
node_2 | 2 | centos2:5678 | 2: 0/4016828 | read-only | stop_replication | stop_replication
再将centos1拉起来:
[19:21:47-postgres@centos1:/var/lib/pgsql/14/autofailover]$ pg_autoctl run --pgdata ./
。。。。
19:23:32 8397 INFO New state for this node (node 1, "node_1") (centos1:5678): catchingup ➜ secondary
19:23:32 8397 INFO New state for this node (node 1, "node_1") (centos1:5678): secondary ➜ secondary
19:23:32 8397 INFO New state for node 2 "node_2" (centos2:5678): wait_primary ➜ primary
19:23:33 8397 INFO New state for node 2 "node_2" (centos2:5678): primary ➜ primary
centos3再看看状态:
[19:22:17-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_autoctl show state --pgdata ./
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
-------+-------+--------------+----------------+--------------+---------------------+--------------------
node_1 | 1 | centos1:5678 | 2: 0/4018748 | read-only | secondary | secondary
node_2 | 2 | centos2:5678 | 2: 0/4018748 | read-write | primary | primary
直接使用monitor节点进行failover切换
pg_auto_failover=> \q
[19:29:02-postgres@centos3:/var/lib/pgsql/14/monitor]$ psql postgres://autoctl_node@centos3:5678/pg_auto_failover
Type "help" for help.
pg_auto_failover=> select pgautofailover.perform_failover();
perform_failover
------------------
(1 row)
再看看状态:
[19:31:52-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_autoctl show state --pgdata ./
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
-------+-------+--------------+----------------+--------------+---------------------+--------------------
node_1 | 1 | centos1:5678 | 5: 0/401B4D8 | read-write | wait_primary | wait_primary
node_2 | 2 | centos2:5678 | 1: 0/401B310 | none | demoted | catchingup
[19:32:00-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_autoctl show state --pgdata ./
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
-------+-------+--------------+----------------+--------------+---------------------+--------------------
node_1 | 1 | centos1:5678 | 5: 0/401B588 | read-write | primary | primary
node_2 | 2 | centos2:5678 | 5: 0/401B588 | read-only | secondary | secondary
这种切换基本是非常优雅无缝的。
生成服务
前边都是使用长长的命令,如果你想把monitor以及各节点上的pg启动生成相应的服务。那么可以这样做:
monitor节点:
[19:35:14-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_autoctl -q show systemd --pgdata /var/lib/pgsql/14/monitor > paf.service
[19:38:31-postgres@centos3:/var/lib/pgsql/14/monitor]$ cat paf.service
[Unit]
Description = pg_auto_failover
[Service]
WorkingDirectory = /var/lib/pgsql
Environment = 'PGDATA=/var/lib/pgsql/14/monitor'
User = postgres
ExecStart = /usr/pgsql-14/bin/pg_autoctl run
Restart = always
StartLimitBurst = 0
ExecReload = /usr/pgsql-14/bin/pg_autoctl reload
[Install]
WantedBy = multi-user.target
将命令生成的paf.service文件部署到/usr/lib/systemd/system/下去。
[19:38:36-postgres@centos3:/var/lib/pgsql/14/monitor]$ sudo mv paf.service /usr/lib/systemd/system/
[19:40:01-postgres@centos3:/var/lib/pgsql/14/monitor]$ sudo systemctl daemon-reload
sudo systemctl start paf
sudo systemctl status paf
[19:43:26-postgres@centos3:/var/lib/pgsql/14/monitor]$ pg_autoctl show state --pgdata ./
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
-------+-------+--------------+----------------+--------------+---------------------+--------------------
node_1 | 1 | centos1:5678 | 5: 0/401D7E8 | read-write | primary | primary
node_2 | 2 | centos2:5678 | 5: 0/401D7E8 | read-only | secondary | secondary
主从节点:
步骤基本雷同:
19:30:04-postgres@centos2:/var/lib/pgsql]$ pg_autoctl -q show systemd --pgdata /var/lib/pgsql/14/autofailover/ > pafnode.service
[19:45:40-postgres@centos2:/var/lib/pgsql]$ cat pafnode.service
[Unit]
Description = pg_auto_failover
[Service]
WorkingDirectory = /var/lib/pgsql
Environment = 'PGDATA=/var/lib/pgsql/14/autofailover'
User = postgres
ExecStart = /usr/pgsql-14/bin/pg_autoctl run
Restart = always
StartLimitBurst = 0
ExecReload = /usr/pgsql-14/bin/pg_autoctl reload
[Install]
WantedBy = multi-user.target
后边不再缀述。
总结:
有关HA的开源架构方案确实有少,在了解的基础上,找自己最熟悉,同时又能满足实际需求的就是最好的。另外,可以看看该架构对应的开源库的关注程度(访问及更新频率),都能得到一个比较。
参考:
1、倾情打造PostgreSQL高可用系列(一):pg_auto_failover搭建 - 墨天轮 (modb.pro):https://www.modb.pro/db/48135
2、Installing pg_auto_failover: https://pg-auto-failover.readthedocs.io/en/main/install.html





