暂无图片
暂无图片
3
暂无图片
暂无图片
暂无图片

通过patroni+etcd+vip 构建PostgreSQL高可用

原创 金同学 2025-06-03
646

Patroni 是专为 PostgreSQL 设计的高可用性管理框架,基于 Python 开发,通过集成分布式配置存储(DCS)如 etcd、ZooKeeper 或 Consul 实现集群自动化管理。

  • 分布式选举机制:采用 DCS 的原子操作实现主节点选举:当主库故障时,多个 Patroni 实例通过竞争更新 etcd 的 leader key完成选举;
  • 流复制高级特性:除支持同步/异步复制外,还提供基于 WAL 位置的延迟故障切换和级联复制拓扑管理;
  • 全生命周期自动化:支持故障自动切换(VIP自动漂移)和备库异常关闭后自动重启,另外可以通过patronictl 重建故障节点以及计划内的主备切换等。

Etcd 作为基于 Raft 协议的分布式键值存储,负责记录所有 PostgreSQL 节点的角色(Primary/Standby)、连接信息及健康状态等。每个Patroni都能读写ETCD的KEY,从而获取其他主机的PG数据库实例信息。另外在分布式选举中,通过监听 key 变更事件驱动状态机转换,节点故障时 TTL 过期触发选举流程。
image.png
增强型容灾设计

  • Watchdog 防脑裂​,通过看门狗(如 /dev/watchdog)监测 Patroni 进程状态,异常时强制重启节点;
  • 多层检测,结合 TCP 端口检查与 REST API(默认 8008 端口)实现应用层健康状态验证。

部署环境

主机规划

IP地址 节点名称 安装组件
192.168.1.101 pg_patroni01 etcd、postgres、patroni
192.168.1.102 pg_patroni02 etcd、postgres、patroni
192.168.1.103 pg_patroni03 etcd、postgres、patroni
192.168.1.111 VIP

端口规划

组件名称 版本 端口 描述
etcd(必选) 3.5.0 2380、2379 服务发现、集群监控、集群选主
postgres(必选) 15.13 5432 postgres实例
patroni(必选) 4.0.5 8008 集群软件,实现数据库切换

一、系统配置与内核优化

关闭RemoveIPC参数

某些Linux中,该参数会造成程序信号丢失等问题。

vi /etc/systemd/logind.conf RemoveIPC=no #修改为no #添加到自启服务 echo 'RemoveIPC=no' >> /usr/lib/systemd/system/systemd-logind.service systemctl daemon-reload systemctl stop systemd-logind systemctl start systemd-logind

修改资源限制

vi /etc/security/limits.conf #postgres新增下面参数---------- postgres soft nproc unlimited postgres hard nproc unlimited postgres soft stack unlimited postgres hard stack unlimited postgres soft core unlimited postgres hard core unlimited postgres soft memlock unlimited postgres hard memlock unlimited postgres soft nofile 102400 postgres hard nofile 102400

内核参数优化

######### 新增下面部分 #kernel kernel.sem= 4096 2147483647 2147483646 512000 kernel.msgmni = 2048 #kernel.shmmax建议内存大小,单位字节,即乘了3个1024 kernel.shmmax = 17179869184 #for 16g, 16*1024*1024*1024 #kernel.shmmax/page_size, page size一般为4096 kernel.shmall = 4194304 #for 16g, 17179869184/4096 vm.overcommit_memory= 1 vm.swappiness=1

生效

sysctl -p

加载watchdog

modprobe softdog chown postgres:postgres /dev/watchdog chmod 666 /dev/watchdog

修改/etc/sudoers,启用sudo功能

vi /etc/sudoers # 允许“postgres”用户组的成员运行所有命令,且运行时不需要输入密码 postgres ALL=(ALL) NOPASSWD:ALL

二、实例初始化

创建目录、用户

# 创建用户组 # userdel -r postgres groupadd -g 3001 postgres useradd -u 3001 -g postgres postgres # 创建目录 mkdir -p /data/{pgsql,soft} mkdir -p /data/pgsql/{data,archive,install} mkdir -p /home/postgres/app #赋权 chown -R postgres.postgres /data chown -R postgres.postgres /home/postgres

安装依赖包

# 安装依赖包 yum -y install readline-devel yum -y install gcc yum -y install zlib yum -y install openssl openssl-devel yum -y install pam pam-devel yum -y install libxml2 libxml2-devel yum -y install libxslt libxslt-devel yum -y install gcc bison gcc-c++ readline readline-devel zlib zlib-devel yum -y install libssl yum -y install systemd* yum -y install e2fsprogs-devel uuid uuid-devel libuuid-devel yum -y install libffi-devel yum -y install watchdog yum -y install psmisc yum -y install libicu-devel yum -y install flex yum -y install python3-pip # arm下安装pip3 yum -y install fprintd-pam yum -y install python3-devel

数据库编译

数据库安装,均在 postgres用户下操作。
1、配置环境变量

vi .bash_profile #新增一下部分for pg export PGPORT=5432 export PGUSER=postgres export PGHOST=127.0.0.1 export LANG=en_US.utf8 export PGHOME=/data/pgsql/install export PGDATA=/data/pgsql/data export MANPATH=$PGHOME/share/man:$MANPATH export LD_LIBRARY_PATH=$PGHOME/lib export PATH=$PGHOME/bin:$PATH export PGPASSFILE=/home/postgres/.pgpass PATH=$PATH:$HOME/.local/bin:$PGHOME/bin export PATH

2、编译安装

# 解压源码包 tar -zxvf /data/soft/postgresql-*.tar.gz -C /data/pgsql/install # 预编译 cd /data/pgsql/install/postgresql-* ./configure --prefix=/data/pgsql/install --with-openssl --with-pam --with-systemd --with-libxml --without-ldap # 编译 gmake world gmake install-world #安装扩展包、文档等 # 查看版本 /data/pgsql/install/bin/postgres --version

3、初始化
初始化时使用–wal-segsize参数用来指定单个WAL文件的大小,建议64M。

/data/pgsql/install/bin/initdb --wal-segsize=64 -D /data/pgsql/data -E UTF8 --locale=en_US.utf8 -U postgres -W #enter: postgres ### 启动/关闭 /data/pgsql/install/bin/pg_ctl -D /data/pgsql/data -l logfile start /data/pgsql/install/bin/pg_ctl -D /data/pgsql/data -l logfile stop

4、修改数据库参数

# 编辑配置文件 vi /data/pgsql/data/postgresql.conf 略 # 启停服务 pg_ctl status|start|stop|restart # 登录 su - postgres >psql

三、部署流复制

主库配置

创建复制账号

# 复制用户需要LOGIN、REPLICATION权限 create role repl with LOGIN REPLICATION CONNECTION LIMIT 5 ENCRYPTED PASSWORD '123456';

配置免密登录

su - postgres vi ~/.pgpass #备库从主库同步WAL日志使用,主备倒换后,主库降备库,新备库使用,所以备库也配置 192.168.1.101:5432:replication:repl:123456 192.168.1.102:5432:replication:repl:123456 192.168.1.103:5432:replication:repl:123456 # 修改权限 chmod 600 ~/.pgpass

配置白名单

vi /data/pgsql/data/pg_hba.conf # TYPE DATABASE USER ADDRESS METHOD local all all trust #本地socket访问 host all all ::1/128 trust #本地IPV6访问 host all all 127.0.0.1/32 trust #本地IPV4访问 # 因为all不能匹配replication,所以单独配置 host replication all 0.0.0.0/0 trust # 允许所有来源,通过任意用户访问任意数据库 host all all 0.0.0.0/0 md5 #加载 pg_ctl reload

备库配置

配置免密登录

su - postgres vi ~/.pgpass #备库从主库同步WAL日志使用,主备倒换后,主库降备库,新备库使用,所以备库也配置 192.168.1.101:5432:replication:repl:123456 192.168.1.102:5432:replication:repl:123456 192.168.1.103:5432:replication:repl:123456 # 修改权限 chmod 600 ~/.pgpass

使用pg_basebackup重建2个备库(192.168.1.102,192.168.1.103)

# 关闭备库 pg_ctl stop -m fast # 清空数据 cd $PGDATA;rm -rf * cd /data/pgsql/archive;rm -rf * # 两台备库执行,会直接将主库数据备份到$PGDATA下(主库:192.168.1.101) pg_basebackup -h 192.168.1.101-p 5432 -U repl -D $PGDATA -Fp -P -Xs -R -v # 重启 pg_ctl start

主从同步检查(主库查看)

--主库查看wal日志发送状态 \x select usename,application_name,client_addr,sync_state from pg_stat_replication; select * from pg_stat_replication; --从库查看wal日志接收状态 select * from pg_stat_wal_receiver; --主库查询复制延迟 select pid, usename, client_addr, state, pg_wal_lsn_diff(pg_current_wal_lsn(),write_lsn) write_delay, pg_wal_lsn_diff(pg_current_wal_lsn(),flush_lsn) flush_delay, pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) replay_delay from pg_stat_replication;

四、部署etcd

下面所有操作三个节点均配置

安装etcd

tar xvzf etcd-v3.5.0-linux-*.tar.gz -C /home/postgres/app/ ln -s /home/postgres/app/etcd-v3.5.0-linux-arm64/etcd /home/postgres/app/etcd ln -s /home/postgres/app/etcd-v3.5.0-linux-arm64/etcdctl /home/postgres/app/etcdctl chown -R postgres.postgres /home/postgres/app/

配置启动脚本

示例:主库创建 start_etcd.sh 启动脚本(另外2个节点也需要配置)

# cd /home/postgres/app/ #vi start_etcd.sh /home/postgres/app/etcd --name etcd_01 \ --initial-advertise-peer-urls http://192.168.1.101:2380 \ --listen-peer-urls http://192.168.1.101:2380 \ --listen-client-urls http://192.168.1.101:2379,http://127.0.0.1:2379 \ --advertise-client-urls http://192.168.1.101:2379 \ --initial-cluster-token etcd-cluster \ --initial-cluster etcd_01=http://192.168.1.101:2380,etcd_02=http://192.168.1.102:2380,etcd_03=http://192.168.1.103:2380 \ --initial-cluster-state new \ --enable-v2 #赋权 chmod +x /home/postgres/app/start_etcd.sh

配置 etcd 服务

vi /usr/lib/systemd/system/etcd.service

[Unit] Description=etcd After=network.target remote-fs.target nss-lookup.target [Service] Type=forking ExecStart=/bin/bash -c "/home/postgres/app/start_etcd.sh > /home/postgres/app/start_etcd.log 2>&1 &" [Install] WantedBy=multi-user.target

设置自启

chmod 644 /usr/lib/systemd/system/etcd.service systemctl daemon-reload systemctl enable etcd.service systemctl etcd start systemctl etcd status

查看etcd集群状态

/home/postgres/app/etcdctl member list -w table /home/postgres/app/etcdctl endpoint status --cluster -w table /home/postgres/app/etcdctl endpoint health --cluster -w table

五、部署patroni

在postgres用户下操作。

安装 patroni

# 两个依赖 pip3 install psycopg2-binary pip3 install psycopg[binary] #说明,4.0.3版本有bug,无法正常修改参数。 pip3 install 'patroni[etcd]==4.0.5' # 如果内网无python镜像站,则找外网的相同系统环境,然后打包离线安装: # 外网打包下载 pip3 download psycopg2-binary psycopg[binary] 'patroni[etcd]==4.0.5' -d . # 内网离线安装 pip3 install --no-index --find-links=. psycopg2-binary psycopg[binary] 'patroni[etcd]==4.0.5' #验证 patroni --version

patroni配置文件

创建 patroni 配置文件,三个节点均配置,写本机ip,下面为主库示例。

scope: pg_patroni namespace: /service/ name: pg_patroni01 restapi: # 监听本地8008端口,用于Patroni内部通信和健康检查 listen: 192.168.1.101:8008 # 其他节点访问本节点API的地址,需为真实IP connect_address: 192.168.1.101:8008 etcd: #Provide host to do the initial discovery of the cluster topology: host: 192.168.1.101:2379 bootstrap: # this section will be written into Etcd:/<namespace>/<scope>/config after initializing # new cluster # and all other cluster members will use it as a `global configuration` dcs: # 主节点每10秒更新一次etcd中的Leader锁,超时30秒触发故障转移。 ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 master_start_timeout: 300 synchronous_mode: true # 默认的同步备节点为1 synchronous_node_count: 1 # 允许无同步备节点时主库继续运行 synchronous_mode_strict: false postgresql: # 允许备节点通过pg_rewind快速修复分歧 use_pg_rewind: true use_slots: true parameters: wal_level: logical max_connections: 2000 max_wal_senders: 32 max_replication_slots: 10 wal_log_hints: "on" archive_mode: "on" hot_standby: "on" archive_timeout: 1800s postgresql: listen: 0.0.0.0:5432 connect_address: 192.168.1.101:5432 data_dir: /data/pgsql/data bin_dir: /data/pgsql/install/bin config_dir: /data/pgsql/data pgpass: /home/postgres/.pgpass authentication: replication: username: repl password: 123456 superuser: username: postgres password: postgres callbacks: on_start: /home/postgres/app/patroni_callback.sh on_stop: /home/postgres/app/patroni_callback.sh on_role_change: /home/postgres/app/patroni_callback.sh tags: nofailover: false noloadbalance: false clonefrom: false nosync: false watchdog: mode: automatic device: /dev/watchdog safety_margin: 5

配置 patroni 服务

vi /usr/lib/systemd/system/patroni.service

[Unit] Description=patroni - a high-availability PostgreSQL Documentation=https://patroni.readthedocs.io/en/latest/index.html After=syslog.target network.target etcd.target Wants=network-online.target [Service] Type=simple User=postgres Group=postgres PermissionsStartOnly=true RuntimeDirectory=postgresql RuntimeDirectoryMode=0750 ExecStart=/usr/local/bin/patroni /home/postgres/app/patroni.yml StandardOutput=file:/home/postgres/app/patroni.log StandardError=file:/home/postgres/app/patroni.log ExecReload=/bin/kill -s HUP $MAINPID KillMode=process RestartSec=30s TimeoutSec=0 [Install] WantedBy=multi-user.target

启服务

# 赋权 chown -R postgres:postgres /home/postgres/app/patroni.yml chmod 755 /home/postgres/app/patroni.yml chmod -R 755 /usr/bin/openssl chmod -R 755 /usr/lib/python3.7/site-packages chmod 644 /usr/lib/systemd/system/patroni.service chown -R postgres:postgres /data/pgsql/data chmod -R 750 /data/pgsql/data ##启动注册服务前,先验证配置文件有效性,root下 sudo -i -u postgres /usr/local/bin/patroni --validate-config /home/postgres/app/patroni.yml # 开机自启动 systemctl daemon-reload systemctl enable patroni.service # 使用patroni管理postgres服务,所以启动前先手动停止pg服务。 systemctl stop patroni systemctl start patroni systemctl status patroni

查看 patroni 集群状态

# 查看集群状态 patronictl -c /home/postgres/app/patroni.yml list [root@PGs]# patronictl -c /home/postgres/app/patroni.yml list + Cluster: pg_patroni ---------+--------------+---------+----+-----------+ | Member | Host | Role | State | TL | Lag in MB | +--------------+---------------+--------------+---------+----+-----------+ | pg_patroni01 | 192.168.1.101 | Leader | running | 1 | | | pg_patroni02 | 192.168.1.102 | Sync Standby | running | 1 | 0 | | pg_patroni03 | 192.168.1.103 | Replica | running | 1 | 0 | +--------------+---------------+--------------+---------+----+-----------+

六、配置VIP

挂载vip

#添加 sudo /usr/sbin/ip addr add 192.168.1.111/24 dev eth0 #卸载 sudo /usr/sbin/ip addr del 192.168.1.111/24 dev eth0

配置回调脚本 patroni_callback.sh

#!/bin/bash readonly cb_name=$1 readonly role=$2 readonly scope=$3 VIP="192.168.1.111" vip_add_command="sudo /usr/sbin/ip addr add" vip_del_command="sudo /usr/sbin/ip addr del" arping="sudo /usr/bin/arping" VIPNETMASKBIT=24 VIPDEV=eth0 function usage() { echo "Usage: $0 <on_start|on_stop|on_role_change> <role> <scope>"; exit 1; } function addvip(){ echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` INFO: ${vip_add_command} ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV}" >> /home/postgres/app/switch_vip.log ${vip_add_command} ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV} ${arping} -q -A -c 1 -I ${VIPDEV} ${VIP} } function delvip(){ echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` INFO: ${vip_del_command} ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV} " >> /home/postgres/app/switch_vip.log ${vip_del_command} ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV} ${arping} -q -A -c 1 -I ${VIPDEV} ${VIP} } echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` WARNING: patroni callback $cb_name $role $scope" case $cb_name in on_stop) delvip ;; on_start) ;; on_role_change) # 4.0.0版本以后,patroni 角色被写入DCS时标记为 'primary' 而不是 'master'。 if [[ $role == 'primary' ]]; then addvip elif [[ $role == 'slave' ]]||[[ $role == 'replica' ]]||[[ $role == 'logical' ]]; then delvip fi ;; *) usage ;; esac

脚本说明 :4.0.0版本以后,patroni 角色被写入DCS时标记为 ‘primary’ 而不是 ‘master’。所以关注一下变量role。

image.png

执行switchover,观察leader切换与vip漂移。手工切换时,Sync Standby角色才能成功切换为主库。


patronictl -c /home/postgres/app/patroni.yml switchover

#示例:
[postgres@pocdb-01 ~]$ patronictl -c /home/postgres/app/patroni.yml switchover
Current cluster topology
+ Cluster: pg_patroni (7507559757515153390) --+-----------+----+-----------+
| Member       | Host          | Role         | State     | TL | Lag in MB |
+--------------+---------------+--------------+-----------+----+-----------+
| pg_patroni01 | 191.168.1.101  | Sync Standby | streaming | 17 |         0 |
| pg_patroni02 | 191.168.1.102  | Leader       | running   | 17 |           |
| pg_patroni03 | 191.168.1.103  | Replica      | streaming | 17 |         0 |
+--------------+---------------+--------------+-----------+----+-----------+
Primary [pg_patroni02]: 
Candidate ['pg_patroni01', 'pg_patroni03'] []: pg_patroni01
When should the switchover take place (e.g. 2025-05-28T12:16 )  [now]: 
Are you sure you want to switchover cluster pg_patroni, demoting current leader pg_patroni02? [y/N]: y
2025-05-28 11:16:58.09797 Successfully switched over to "pg_patroni01"

patroni集群中,不同角色的意义

  • Leader(主节点):由etcd选举产生,可处理写请求,生成wal并同步到备节点;
  • Sync Standby(同步备节点):实时接收并同步应用leader的wal日志,确保事务提交需等待该节点确认,实现强一致性。Leader故障时,Sync Standby优先被提升为新Leader;
  • Replica(普通备节点):接收WAL日志但异步应用,可能存在短暂延迟(如用户集群中Lag in MB=0表示当前无延迟);

遇到的错误
pytroni 4.0.3存在缺陷,修改参数会报下面错误。

[postgres@pocdb data]$ patronictl -c /home/postgres/app/patroni.yml edit-config -s 'max_connections=2000' Traceback (most recent call last): File "/home/postgres/.local/bin/patronictl", line 8, in <module> sys.exit(ctl()) File "/home/postgres/.local/lib/python3.7/site-packages/click/core.py", line 1161, in __call__ return self.main(*args, **kwargs) File "/home/postgres/.local/lib/python3.7/site-packages/click/core.py", line 1082, in main rv = self.invoke(ctx) File "/home/postgres/.local/lib/python3.7/site-packages/click/core.py", line 1697, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/postgres/.local/lib/python3.7/site-packages/click/core.py", line 1443, in invoke return ctx.invoke(self.callback, **ctx.params) File "/home/postgres/.local/lib/python3.7/site-packages/click/core.py", line 788, in invoke return __callback(*args, **kwargs) File "/home/postgres/.local/lib/python3.7/site-packages/patroni/ctl.py", line 2122, in edit_config show_diff(before_editing, after_editing) File "/home/postgres/.local/lib/python3.7/site-packages/patroni/ctl.py", line 1917, in show_diff markup_to_pager(PatchStream(buf), opts) File "/home/postgres/.local/lib/python3.7/site-packages/ydiff.py", line 639, in markup_to_pager theme=opts.theme) AttributeError: type object 'opts' has no attribute 'theme'
最后修改时间:2025-09-11 17:05:23
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论