Patroni 是专为 PostgreSQL 设计的高可用性管理框架,基于 Python 开发,通过集成分布式配置存储(DCS)如 etcd、ZooKeeper 或 Consul 实现集群自动化管理。
- 分布式选举机制:采用 DCS 的原子操作实现主节点选举:当主库故障时,多个 Patroni 实例通过竞争更新 etcd 的 leader key完成选举;
- 流复制高级特性:除支持同步/异步复制外,还提供基于 WAL 位置的延迟故障切换和级联复制拓扑管理;
- 全生命周期自动化:支持故障自动切换(VIP自动漂移)和备库异常关闭后自动重启,另外可以通过patronictl 重建故障节点以及计划内的主备切换等。
Etcd 作为基于 Raft 协议的分布式键值存储,负责记录所有 PostgreSQL 节点的角色(Primary/Standby)、连接信息及健康状态等。每个Patroni都能读写ETCD的KEY,从而获取其他主机的PG数据库实例信息。另外在分布式选举中,通过监听 key 变更事件驱动状态机转换,节点故障时 TTL 过期触发选举流程。

增强型容灾设计
- Watchdog 防脑裂,通过看门狗(如 /dev/watchdog)监测 Patroni 进程状态,异常时强制重启节点;
- 多层检测,结合 TCP 端口检查与 REST API(默认 8008 端口)实现应用层健康状态验证。
部署环境
主机规划
| IP地址 | 节点名称 | 安装组件 |
|---|---|---|
| 192.168.1.101 | pg_patroni01 | etcd、postgres、patroni |
| 192.168.1.102 | pg_patroni02 | etcd、postgres、patroni |
| 192.168.1.103 | pg_patroni03 | etcd、postgres、patroni |
| 192.168.1.111 | VIP |
端口规划
| 组件名称 | 版本 | 端口 | 描述 |
|---|---|---|---|
| etcd(必选) | 3.5.0 | 2380、2379 | 服务发现、集群监控、集群选主 |
| postgres(必选) | 15.13 | 5432 | postgres实例 |
| patroni(必选) | 4.0.5 | 8008 | 集群软件,实现数据库切换 |
一、系统配置与内核优化
关闭RemoveIPC参数
某些Linux中,该参数会造成程序信号丢失等问题。
vi /etc/systemd/logind.conf
RemoveIPC=no #修改为no
#添加到自启服务
echo 'RemoveIPC=no' >> /usr/lib/systemd/system/systemd-logind.service
systemctl daemon-reload
systemctl stop systemd-logind
systemctl start systemd-logind
修改资源限制
vi /etc/security/limits.conf
#postgres新增下面参数----------
postgres soft nproc unlimited
postgres hard nproc unlimited
postgres soft stack unlimited
postgres hard stack unlimited
postgres soft core unlimited
postgres hard core unlimited
postgres soft memlock unlimited
postgres hard memlock unlimited
postgres soft nofile 102400
postgres hard nofile 102400
内核参数优化
######### 新增下面部分
#kernel
kernel.sem= 4096 2147483647 2147483646 512000
kernel.msgmni = 2048
#kernel.shmmax建议内存大小,单位字节,即乘了3个1024
kernel.shmmax = 17179869184 #for 16g, 16*1024*1024*1024
#kernel.shmmax/page_size, page size一般为4096
kernel.shmall = 4194304 #for 16g, 17179869184/4096
vm.overcommit_memory= 1
vm.swappiness=1
生效
sysctl -p
加载watchdog
modprobe softdog chown postgres:postgres /dev/watchdog chmod 666 /dev/watchdog
修改/etc/sudoers,启用sudo功能
vi /etc/sudoers
# 允许“postgres”用户组的成员运行所有命令,且运行时不需要输入密码
postgres ALL=(ALL) NOPASSWD:ALL
二、实例初始化
创建目录、用户
# 创建用户组
# userdel -r postgres
groupadd -g 3001 postgres
useradd -u 3001 -g postgres postgres
# 创建目录
mkdir -p /data/{pgsql,soft}
mkdir -p /data/pgsql/{data,archive,install}
mkdir -p /home/postgres/app
#赋权
chown -R postgres.postgres /data
chown -R postgres.postgres /home/postgres
安装依赖包
# 安装依赖包
yum -y install readline-devel
yum -y install gcc
yum -y install zlib
yum -y install openssl openssl-devel
yum -y install pam pam-devel
yum -y install libxml2 libxml2-devel
yum -y install libxslt libxslt-devel
yum -y install gcc bison gcc-c++ readline readline-devel zlib zlib-devel
yum -y install libssl
yum -y install systemd*
yum -y install e2fsprogs-devel uuid uuid-devel libuuid-devel
yum -y install libffi-devel
yum -y install watchdog
yum -y install psmisc
yum -y install libicu-devel
yum -y install flex
yum -y install python3-pip # arm下安装pip3
yum -y install fprintd-pam
yum -y install python3-devel
数据库编译
数据库安装,均在 postgres用户下操作。
1、配置环境变量
vi .bash_profile
#新增一下部分for pg
export PGPORT=5432
export PGUSER=postgres
export PGHOST=127.0.0.1
export LANG=en_US.utf8
export PGHOME=/data/pgsql/install
export PGDATA=/data/pgsql/data
export MANPATH=$PGHOME/share/man:$MANPATH
export LD_LIBRARY_PATH=$PGHOME/lib
export PATH=$PGHOME/bin:$PATH
export PGPASSFILE=/home/postgres/.pgpass
PATH=$PATH:$HOME/.local/bin:$PGHOME/bin
export PATH
2、编译安装
# 解压源码包
tar -zxvf /data/soft/postgresql-*.tar.gz -C /data/pgsql/install
# 预编译
cd /data/pgsql/install/postgresql-*
./configure --prefix=/data/pgsql/install --with-openssl --with-pam --with-systemd --with-libxml --without-ldap
# 编译
gmake world
gmake install-world #安装扩展包、文档等
# 查看版本
/data/pgsql/install/bin/postgres --version
3、初始化
初始化时使用–wal-segsize参数用来指定单个WAL文件的大小,建议64M。
/data/pgsql/install/bin/initdb --wal-segsize=64 -D /data/pgsql/data -E UTF8 --locale=en_US.utf8 -U postgres -W
#enter: postgres
### 启动/关闭
/data/pgsql/install/bin/pg_ctl -D /data/pgsql/data -l logfile start
/data/pgsql/install/bin/pg_ctl -D /data/pgsql/data -l logfile stop
4、修改数据库参数
# 编辑配置文件
vi /data/pgsql/data/postgresql.conf
略
# 启停服务
pg_ctl status|start|stop|restart
# 登录
su - postgres
>psql
三、部署流复制
主库配置
创建复制账号
# 复制用户需要LOGIN、REPLICATION权限
create role repl with LOGIN REPLICATION CONNECTION LIMIT 5 ENCRYPTED PASSWORD '123456';
配置免密登录
su - postgres
vi ~/.pgpass
#备库从主库同步WAL日志使用,主备倒换后,主库降备库,新备库使用,所以备库也配置
192.168.1.101:5432:replication:repl:123456
192.168.1.102:5432:replication:repl:123456
192.168.1.103:5432:replication:repl:123456
# 修改权限
chmod 600 ~/.pgpass
配置白名单
vi /data/pgsql/data/pg_hba.conf
# TYPE DATABASE USER ADDRESS METHOD
local all all trust #本地socket访问
host all all ::1/128 trust #本地IPV6访问
host all all 127.0.0.1/32 trust #本地IPV4访问
# 因为all不能匹配replication,所以单独配置
host replication all 0.0.0.0/0 trust
# 允许所有来源,通过任意用户访问任意数据库
host all all 0.0.0.0/0 md5
#加载
pg_ctl reload
备库配置
配置免密登录
su - postgres
vi ~/.pgpass
#备库从主库同步WAL日志使用,主备倒换后,主库降备库,新备库使用,所以备库也配置
192.168.1.101:5432:replication:repl:123456
192.168.1.102:5432:replication:repl:123456
192.168.1.103:5432:replication:repl:123456
# 修改权限
chmod 600 ~/.pgpass
使用pg_basebackup重建2个备库(192.168.1.102,192.168.1.103)
# 关闭备库
pg_ctl stop -m fast
# 清空数据
cd $PGDATA;rm -rf *
cd /data/pgsql/archive;rm -rf *
# 两台备库执行,会直接将主库数据备份到$PGDATA下(主库:192.168.1.101)
pg_basebackup -h 192.168.1.101-p 5432 -U repl -D $PGDATA -Fp -P -Xs -R -v
# 重启
pg_ctl start
主从同步检查(主库查看)
--主库查看wal日志发送状态
\x
select usename,application_name,client_addr,sync_state from pg_stat_replication;
select * from pg_stat_replication;
--从库查看wal日志接收状态
select * from pg_stat_wal_receiver;
--主库查询复制延迟
select
pid,
usename,
client_addr,
state,
pg_wal_lsn_diff(pg_current_wal_lsn(),write_lsn) write_delay,
pg_wal_lsn_diff(pg_current_wal_lsn(),flush_lsn) flush_delay,
pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) replay_delay
from pg_stat_replication;
四、部署etcd
下面所有操作三个节点均配置
安装etcd
tar xvzf etcd-v3.5.0-linux-*.tar.gz -C /home/postgres/app/ ln -s /home/postgres/app/etcd-v3.5.0-linux-arm64/etcd /home/postgres/app/etcd ln -s /home/postgres/app/etcd-v3.5.0-linux-arm64/etcdctl /home/postgres/app/etcdctl chown -R postgres.postgres /home/postgres/app/
配置启动脚本
示例:主库创建 start_etcd.sh 启动脚本(另外2个节点也需要配置)
#
cd /home/postgres/app/
#vi start_etcd.sh
/home/postgres/app/etcd --name etcd_01 \
--initial-advertise-peer-urls http://192.168.1.101:2380 \
--listen-peer-urls http://192.168.1.101:2380 \
--listen-client-urls http://192.168.1.101:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.1.101:2379 \
--initial-cluster-token etcd-cluster \
--initial-cluster etcd_01=http://192.168.1.101:2380,etcd_02=http://192.168.1.102:2380,etcd_03=http://192.168.1.103:2380 \
--initial-cluster-state new \
--enable-v2
#赋权
chmod +x /home/postgres/app/start_etcd.sh
配置 etcd 服务
vi /usr/lib/systemd/system/etcd.service
[Unit]
Description=etcd
After=network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
ExecStart=/bin/bash -c "/home/postgres/app/start_etcd.sh > /home/postgres/app/start_etcd.log 2>&1 &"
[Install]
WantedBy=multi-user.target
设置自启
chmod 644 /usr/lib/systemd/system/etcd.service
systemctl daemon-reload
systemctl enable etcd.service
systemctl etcd start
systemctl etcd status
查看etcd集群状态
/home/postgres/app/etcdctl member list -w table /home/postgres/app/etcdctl endpoint status --cluster -w table /home/postgres/app/etcdctl endpoint health --cluster -w table
五、部署patroni
在postgres用户下操作。
安装 patroni
# 两个依赖
pip3 install psycopg2-binary
pip3 install psycopg[binary]
#说明,4.0.3版本有bug,无法正常修改参数。
pip3 install 'patroni[etcd]==4.0.5'
# 如果内网无python镜像站,则找外网的相同系统环境,然后打包离线安装:
# 外网打包下载
pip3 download psycopg2-binary psycopg[binary] 'patroni[etcd]==4.0.5' -d .
# 内网离线安装
pip3 install --no-index --find-links=. psycopg2-binary psycopg[binary] 'patroni[etcd]==4.0.5'
#验证
patroni --version
patroni配置文件
创建 patroni 配置文件,三个节点均配置,写本机ip,下面为主库示例。
scope: pg_patroni
namespace: /service/
name: pg_patroni01
restapi:
# 监听本地8008端口,用于Patroni内部通信和健康检查
listen: 192.168.1.101:8008
# 其他节点访问本节点API的地址,需为真实IP
connect_address: 192.168.1.101:8008
etcd:
#Provide host to do the initial discovery of the cluster topology:
host: 192.168.1.101:2379
bootstrap:
# this section will be written into Etcd:/<namespace>/<scope>/config after initializing
# new cluster
# and all other cluster members will use it as a `global configuration`
dcs:
# 主节点每10秒更新一次etcd中的Leader锁,超时30秒触发故障转移。
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
master_start_timeout: 300
synchronous_mode: true
# 默认的同步备节点为1
synchronous_node_count: 1
# 允许无同步备节点时主库继续运行
synchronous_mode_strict: false
postgresql:
# 允许备节点通过pg_rewind快速修复分歧
use_pg_rewind: true
use_slots: true
parameters:
wal_level: logical
max_connections: 2000
max_wal_senders: 32
max_replication_slots: 10
wal_log_hints: "on"
archive_mode: "on"
hot_standby: "on"
archive_timeout: 1800s
postgresql:
listen: 0.0.0.0:5432
connect_address: 192.168.1.101:5432
data_dir: /data/pgsql/data
bin_dir: /data/pgsql/install/bin
config_dir: /data/pgsql/data
pgpass: /home/postgres/.pgpass
authentication:
replication:
username: repl
password: 123456
superuser:
username: postgres
password: postgres
callbacks:
on_start: /home/postgres/app/patroni_callback.sh
on_stop: /home/postgres/app/patroni_callback.sh
on_role_change: /home/postgres/app/patroni_callback.sh
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
watchdog:
mode: automatic
device: /dev/watchdog
safety_margin: 5
配置 patroni 服务
vi /usr/lib/systemd/system/patroni.service
[Unit]
Description=patroni - a high-availability PostgreSQL
Documentation=https://patroni.readthedocs.io/en/latest/index.html
After=syslog.target network.target etcd.target
Wants=network-online.target
[Service]
Type=simple
User=postgres
Group=postgres
PermissionsStartOnly=true
RuntimeDirectory=postgresql
RuntimeDirectoryMode=0750
ExecStart=/usr/local/bin/patroni /home/postgres/app/patroni.yml
StandardOutput=file:/home/postgres/app/patroni.log
StandardError=file:/home/postgres/app/patroni.log
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=process
RestartSec=30s
TimeoutSec=0
[Install]
WantedBy=multi-user.target
启服务
# 赋权
chown -R postgres:postgres /home/postgres/app/patroni.yml
chmod 755 /home/postgres/app/patroni.yml
chmod -R 755 /usr/bin/openssl
chmod -R 755 /usr/lib/python3.7/site-packages
chmod 644 /usr/lib/systemd/system/patroni.service
chown -R postgres:postgres /data/pgsql/data
chmod -R 750 /data/pgsql/data
##启动注册服务前,先验证配置文件有效性,root下
sudo -i -u postgres /usr/local/bin/patroni --validate-config /home/postgres/app/patroni.yml
# 开机自启动
systemctl daemon-reload
systemctl enable patroni.service
# 使用patroni管理postgres服务,所以启动前先手动停止pg服务。
systemctl stop patroni
systemctl start patroni
systemctl status patroni
查看 patroni 集群状态
# 查看集群状态
patronictl -c /home/postgres/app/patroni.yml list
[root@PGs]# patronictl -c /home/postgres/app/patroni.yml list
+ Cluster: pg_patroni ---------+--------------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------------+---------------+--------------+---------+----+-----------+
| pg_patroni01 | 192.168.1.101 | Leader | running | 1 | |
| pg_patroni02 | 192.168.1.102 | Sync Standby | running | 1 | 0 |
| pg_patroni03 | 192.168.1.103 | Replica | running | 1 | 0 |
+--------------+---------------+--------------+---------+----+-----------+
六、配置VIP
挂载vip
#添加
sudo /usr/sbin/ip addr add 192.168.1.111/24 dev eth0
#卸载
sudo /usr/sbin/ip addr del 192.168.1.111/24 dev eth0
配置回调脚本 patroni_callback.sh
#!/bin/bash
readonly cb_name=$1
readonly role=$2
readonly scope=$3
VIP="192.168.1.111"
vip_add_command="sudo /usr/sbin/ip addr add"
vip_del_command="sudo /usr/sbin/ip addr del"
arping="sudo /usr/bin/arping"
VIPNETMASKBIT=24
VIPDEV=eth0
function usage() {
echo "Usage: $0 <on_start|on_stop|on_role_change> <role> <scope>";
exit 1;
}
function addvip(){
echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` INFO: ${vip_add_command} ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV}" >> /home/postgres/app/switch_vip.log
${vip_add_command} ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV}
${arping} -q -A -c 1 -I ${VIPDEV} ${VIP}
}
function delvip(){
echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` INFO: ${vip_del_command} ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV} " >> /home/postgres/app/switch_vip.log
${vip_del_command} ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV}
${arping} -q -A -c 1 -I ${VIPDEV} ${VIP}
}
echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` WARNING: patroni callback $cb_name $role $scope"
case $cb_name in
on_stop)
delvip
;;
on_start)
;;
on_role_change)
# 4.0.0版本以后,patroni 角色被写入DCS时标记为 'primary' 而不是 'master'。
if [[ $role == 'primary' ]]; then
addvip
elif [[ $role == 'slave' ]]||[[ $role == 'replica' ]]||[[ $role == 'logical' ]]; then
delvip
fi
;;
*)
usage
;;
esac
脚本说明 :4.0.0版本以后,patroni 角色被写入DCS时标记为 ‘primary’ 而不是 ‘master’。所以关注一下变量role。

执行switchover,观察leader切换与vip漂移。手工切换时,Sync Standby角色才能成功切换为主库。
patronictl -c /home/postgres/app/patroni.yml switchover
#示例:
[postgres@pocdb-01 ~]$ patronictl -c /home/postgres/app/patroni.yml switchover
Current cluster topology
+ Cluster: pg_patroni (7507559757515153390) --+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------------+---------------+--------------+-----------+----+-----------+
| pg_patroni01 | 191.168.1.101 | Sync Standby | streaming | 17 | 0 |
| pg_patroni02 | 191.168.1.102 | Leader | running | 17 | |
| pg_patroni03 | 191.168.1.103 | Replica | streaming | 17 | 0 |
+--------------+---------------+--------------+-----------+----+-----------+
Primary [pg_patroni02]:
Candidate ['pg_patroni01', 'pg_patroni03'] []: pg_patroni01
When should the switchover take place (e.g. 2025-05-28T12:16 ) [now]:
Are you sure you want to switchover cluster pg_patroni, demoting current leader pg_patroni02? [y/N]: y
2025-05-28 11:16:58.09797 Successfully switched over to "pg_patroni01"
patroni集群中,不同角色的意义
- Leader(主节点):由etcd选举产生,可处理写请求,生成wal并同步到备节点;
- Sync Standby(同步备节点):实时接收并同步应用leader的wal日志,确保事务提交需等待该节点确认,实现强一致性。Leader故障时,Sync Standby优先被提升为新Leader;
- Replica(普通备节点):接收WAL日志但异步应用,可能存在短暂延迟(如用户集群中Lag in MB=0表示当前无延迟);
遇到的错误
pytroni 4.0.3存在缺陷,修改参数会报下面错误。
[postgres@pocdb data]$ patronictl -c /home/postgres/app/patroni.yml edit-config -s 'max_connections=2000'
Traceback (most recent call last):
File "/home/postgres/.local/bin/patronictl", line 8, in <module>
sys.exit(ctl())
File "/home/postgres/.local/lib/python3.7/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/home/postgres/.local/lib/python3.7/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/home/postgres/.local/lib/python3.7/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/postgres/.local/lib/python3.7/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/postgres/.local/lib/python3.7/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/home/postgres/.local/lib/python3.7/site-packages/patroni/ctl.py", line 2122, in edit_config
show_diff(before_editing, after_editing)
File "/home/postgres/.local/lib/python3.7/site-packages/patroni/ctl.py", line 1917, in show_diff
markup_to_pager(PatchStream(buf), opts)
File "/home/postgres/.local/lib/python3.7/site-packages/ydiff.py", line 639, in markup_to_pager
theme=opts.theme)
AttributeError: type object 'opts' has no attribute 'theme'




