Whoami:5年+金融、政府、医疗领域工作经验的DBACertificate:OCP、PCPSkill:Oracle、Mysql、PostgreSQLPlatform:CSDN、墨天伦、公众号(呆呆的私房菜)
阅读本文可以了解基于Patroni + etcd + 流复制的方式来实现PostgreSQL数据库高可用的相关内容,包含Patroni高可用架构实现的基本原理、搭建部署及主备切换等。
Patroni 是一个用于自动化 PostgreSQL 数据库的设置、维护和高可用性管理的开源工具。
它通常与分布式键值存储系统(如 etcd、Consul、ZooKeeper 或 Kubernetes)一起使用,以确保数据库在主节点发生故障时能够持续提供服务。
Patroni 通过实现流复制、故障转移、配置变更和节点监控等功能,来增强 PostgreSQL 的高可用性。

Patroni的基本实现原理是利用分布式配置存储(DCS)来协调和管理PostgreSQL集群中的多个节点,确保集群的高可用性和故障转移。它通过监控每个节点的健康状况,自动进行故障转移,以及在主节点故障时选举新的主节点,从而实现数据库的持续可用性。
分布式配置存储(DCS)集成:Patroni与etcd、Consul、ZooKeeper或Kubernetes等DCS集成,使用它们来存储集群状态和配置信息,确保集群中的所有节点都能访问到最新的集群信息; 健康检查:Patroni定期对每个PostgreSQL实例进行健康检查,包括检查数据库是否在运行、是否能够连接、复制延迟等,以确保它们能够正常工作;
故障转移:当主节点出现故障时,Patroni会自动触发故障转移过程,通过DCS中的共识算法在剩余的从节点中选举出新的主节点,并更新DCS中的相关信息;
配置管理:Patroni允许管理员通过DCS更新集群配置,这些更改会自动同步到所有节点,确保集群配置的一致性;
服务发现:应用程序可以通过查询DCS来发现当前的主节点,从而实现对数据库的读写操作,而无需直接与Patroni交互;
自动故障恢复:Patroni能够在故障节点恢复后自动将其重新加入集群,根据配置决定其作为新的主节点或从节点;
备份和恢复支持:Patroni支持与备份工具(如pgBackRest、WAL-E等)集成,提供集群备份和恢复的功能;
REST API:Patroni提供了REST API,允许用户通过patronictl工具或其他客户端程序进行集群管理和监控;
防止脑裂:Patroni通过与Linux看门狗(watchdog)集成,确保在出现网络分区或其他问题时,不会有两个节点同时认为自己是主节点,从而避免脑裂问题;
级联复制和同步复制:Patroni支持级联复制,允许从节点从另一个从节点而不是主节点复制数据,以及同步复制,确保数据在主节点和至少一个从节点上都写入后才认为是成功。
高可用搭建环境IP规划:
| 系统类型 | 类型 | 主机名 | IP地址 |
| 在线库集群 | 主库 | pg01 | 192.168.56.11 |
| 从库 | pg02 | 192.168.56.22 | |
| 从库 | pg03 | 192.168.56.33 | |
| VIP | 192.168.56.66 |
高可用集群软件版本规划:
| 软件类型 | 软件版本 | 用途 |
| RHEL | 7.6 | 操作系统 |
| Patroni | 1.6 | 集群自动切换管理软件 |
| Etcd | 3.3.11 | 集群仲裁软件 |
| PostgreSQL | 13.4 | 数据库软件 |
1. PostgreSQL主备部署
# 如无特殊说明,下列命令所有节点都执行1. 主机名配置su - rootcat >> etc/hosts <<EOF192.168.56.11 pg01192.168.56.22 pg02192.168.56.33 pg03EOF2. 操作系统配置su - rootuseradd postgresecho "postgres#2024" | passwd --stdin postgrescat >> /etc/sysctl.conf <<EOFkernel.shmmax = 4294967296kernel.shmall = 2097152kernel.shmmni = 4096kernel.sem = 250 32000 100 128fs.aio-max-nr = 1048576net.ipv4.ip_local_port_range = 9000 65500net.core.rmem_default = 262144net.core.rmem_max = 4194304net.core.wmem_default = 262144net.core.wmem_max = 4194304fs.file-max = 6815744EOFcat >> etc/security/limits.conf <<EOFpostgres soft nproc unlimitedpostgres hard nproc unlimitedpostgres soft nofile 100000postgres hard nofile 100000postgres soft stack unlimitedpostgres hard stack unlimitedpostgres soft core unlimitedpostgres hard core unlimitedpostgres soft memlock unlimitedpostgres hard memlock unlimitedEOF3. 主备互信创建# 数据库wal需要使用scp命令集中归档至同一存储节点,因此需要对3个数据库节点的postgres用户之间配置互信;# 本次wal归档节点使用PostgreSQL主节点,即wal归档至 192.168.56.11 的/data/pgwal/archive_wals目录下。su - postgresssh-keygen -t rsassh-copy-id pg01ssh-copy-id pg02ssh-copy-id pg034. 创建数据目录su - rootmkdir -p pg/{pghome,patroni}mkdir -p data/{pgwal,pgdata,pgbak}mkdir -p data/pgwal/archive_walsmkdir -p data/pgdata/pg13chown -R postgres:postgres pgchown -R postgres:postgres data/{pgwal,pgdata,pgbak}chmod -R 700 pgchmod -R 700 data/{pgwal,pgdata,pgbak}5. 配置环境变量su - postgrescat >> /home/postgres/.bash_profileexport PGHOME=/pg/pghomeexport PGDATA=/data/pgdata/pg13export PGPORT=5432export PGDATABASE=postgresexport LD_LIBRARY_PATH=\$PGHOME/lib:\$LD_LIBRARY_PATHexport PATH=\$PGHOME/bin:\$PATHEOFsource .bashrc6. 安装依赖包yum -y install wget gcc gcc-c++ epel-release llvm5.0 llvm5.0-devel clang libicu-devel perl-ExtUtils-Embed readline readline-devel zlib zlib-devel openssl openssl-devel pam-devel libxml2-devel libxslt-devel openldap-devel systemd-devel tcl-devel python-devel7. 源码安装数据库# 上传PostgreSQL13.4源码安装包至/data目录,使用postgres用户解压并编译安装su - postgrescd datatar zxvf postgresql-13.4.tar.gzcd postgresql-13.4./configure --prefix=/pg/pghome --with-pgport=5432./configure --prefix=/pg/pghome --with-pgport=5432gmake worldgmake installcd contribmake && make instal8. 主库初始化并修改配置文件(主库执行)su - postgresinitdb -D data/pgdata/pg13 -E UTF8 --locale=zh_CN.UTF-8cat >> $PGDATA/pg_hba.conf <<EOFecho "host all all 0.0.0.0/0 md5"echo "host replication replicator 192.168.56.11/24 md5"echo "host replication replicator 192.168.56.22/24 md5"echo "host replication replicator 192.168.56.33/24 md5"EOFcat >> $PGDATA/postgresql.conf <<EOFarchive_command = 'scp %p 192.168.56.11:/data/pgwal/archive_wals/%f'archive_mode = 'on'archive_timeout = '1800s'cluster_name = 'PGCluster'hot_standby = 'on'listen_addresses = '0.0.0.0'max_connections = '2000'max_locks_per_transaction = '64'max_prepared_transactions = '0'max_replication_slots = '10'max_wal_senders = '50'max_worker_processes = '8'port = '5432'track_commit_timestamp = 'off'wal_keep_size = 16GBwal_level = 'replica'wal_log_hints = 'on'hba_file = '/data/pgdata/pg13/pg_hba.conf'ident_file = '/data/pgdata/pg13/pg_ident.conf'restore_command = 'scp 192.168.56.11:/data/pgwal/archive_wals/%f %p'recovery_target_timeline = 'latest'EOFpg_ctl startpsql -U postgres -c "ALTER USER postgres WITH PASSWORD 'postgres#2024';"psql -U postgres -c "create user replicator replication login encrypted password 'replicator#2024';"9. 创建备库(只需要在备库执行)su - postgrespg_basebackup -D $PGDATA -Fp -Xs -v -P -h 192.168.56.11 -p 5432 -U replicatorcat >> $PGDATA/standby.signal <<EOFstandby_mode = 'on'EOFcat >> $PGDATA/postgresql.auto.conf <<EOFprimary_conninfo='application_name=pg01 host=192.168.56.11 port=5432 user=replicator password=replicator'max_standby_streaming_delay = 30swal_receiver_status_interval = 10shot_standby_feedback = onrecovery_target_timeline = 'latest'EOFpg_ctl start10. 检查主备状态主库查看同步状态select * from pg_stat_replication;备库查看同步状态select * from pg_stat_wal_receiver;
2. Patroni安装部署
# 如无特殊说明,下列命令所有节点都执行1. 依赖安装,注意python版本要求2.7以上yum install -y haproxy postgresql-devel* watchdog libnfnetlink python-devel libevent-devel libdbi* openssl-devel bzip2-devel *bsddb* sqlite-devel gdbm-devel2. pip安装wget https://mirrors.aliyun.com/pypi/packages/ce/ea/9b445176a65ae4ba22dce1d93e4b5fe182f953df71a145f557cffaffc1bf/pip-19.3.1.tar.gztar -xf pip-19.3.1.tar.gzcd pip-19.3.1python setup.py install3. setuptools安装wget https://mirrors.aliyun.com/pypi/packages/54/28/c45d8b54c1339f9644b87663945e54a8503cfef59cf0f65b3ff5dd17cf64/setuptools-44.0.2-py2.py3-none-any.whlpip install setuptools-44.0.2-py2.py3-none-any.whl4. 下载所有patroni用到的pip包wget http://mirrors.163.com/pypi/packages/97/2a/b854019bcb9b925cd10ff245dbc9448a82fe7fdb40127e5cf1733ad0765c/psycopg2_binary-2.8.4-cp27-cp27mu-manylinux1_x86_64.whlwget https://mirrors.163.com/pypi/packages/b4/40/a9837291310ee1ccc242ceb6ebfd9eb21539649f193a7c8c86ba15b98539/urllib3-1.25.7-py2.py3-none-any.whlwget http://mirrors.163.com/pypi/packages/65/26/32b8464df2a97e6dd1b656ed26b2c194606c16fe163c695a992b36c11cdf/six-1.13.0-py2.py3-none-any.whlwget http://mirrors.163.com/pypi/packages/fa/37/45185cb5abbc30d7257104c434fe0b07e5a195a6847506c074527aa599ec/Click-7.0-py2.py3-none-any.whlwget http://mirrors.163.com/pypi/packages/ef/99/53bd1ac9349262f59c1c421d8fcc2559ae8a5eeffed9202684756b648d33/tzlocal-2.0.0-py2.py3-none-any.whlwget http://mirrors.163.com/pypi/packages/d4/70/d60450c3dd48ef87586924207ae8907090de0b306af2bce5d134d78615cb/python_dateutil-2.8.1-py2.py3-none-any.whlwget http://mirrors.163.com/pypi/packages/3f/d0/59bc5f1c6c4d4b498c41d8ce7052ee9e9d68be19e16038a55252018a6c4d/python_consul-1.1.0-py2.py3-none-any.whlwget http://mirrors.163.com/pypi/packages/e7/f9/f0b53f88060247251bf481fa6ea62cd0d25bf1b11a87888e53ce5b7c8ad2/pytz-2019.3-py2.py3-none-any.whlwget http://mirrors.163.com/pypi/packages/ec/d3/3aa0e7213ef72b8585747aa0e271a9523e713813b9a20177ebe1e939deb0/dnspython-1.16.0-py2.py3-none-any.whlpip install psycopg2_binary-2.8.4-cp27-cp27mu-manylinux1_x86_64.whlpip install urllib3-1.25.7-py2.py3-none-any.whlpip install six-1.13.0-py2.py3-none-any.whlpip install Click-7.0-py2.py3-none-any.whlpip install pytz-2019.3-py2.py3-none-any.whlpip install tzlocal-2.0.0-py2.py3-none-any.whlpip install python_dateutil-2.8.1-py2.py3-none-any.whlpip install dnspython-1.16.0-py2.py3-none-any.whlpip install urllib3-1.26.6-py2.py3-none-any.whlpip install idna-2.10-py2.py3-none-any.whlpip install chardet-4.0.0-py2.py3-none-any.whlpip install certifi-2021.5.30-py2.py3-none-any.whlpip install requests-2.26.0-py2.py3-none-any.whlpip install python_consul-1.1.0-py2.py3-none-any.whlpip install backports.functools_lru_cache-1.6.4-py2.py3-none-any.whlpip install wcwidth-0.2.5-py2.py3-none-any.whlpip install prettytable-1.0.1-py2.py3-none-any.whlwget http://mirrors.163.com/pypi/packages/32/80/8c0fbf433eb9e09eb025bb4f942881cfcf2e343db0750fd50873b4722b43/patroni-1.6.1.tar.gzwget http://mirrors.163.com/pypi/packages/73/93/4f8213fbe66fc20cb904f35e6e04e20b47b85bee39845cc66a0bcf5ccdcb/psutil-5.6.7.tar.gzwget http://mirrors.163.com/pypi/packages/69/6c/301876940e760a8b46c1caacf08c298f511f517c70eec32e43f38e9cc6f5/cdiff-1.0.tar.gzwget http://mirrors.163.com/pypi/packages/a1/da/616a4d073642da5dd432e5289b7c1cb0963cc5dde23d1ecb8d726821ab41/python-etcd-0.4.5.tar.gzpip install psutil-5.6.7.tar.gzpip install cdiff-1.0.tar.gzpip install python-etcd-0.4.5.tar.gzpip install patroni-1.6.1.tar.gz5. 检查python配置文件如果使用python2安装PG13以上版本检查下面文件是否含有wal_keep_segments 参数,如果有需要注释掉。vi /lib/python2.7/site-packages/patroni/postgresql/config.py

3. Patroni配置
# 编辑patroni配置文件pg.yml,写入etcd、PostgreSQL相关信息。# 3个数据库节点的patroni配置文件略有不同,需要注意修改。1. 主节点配置如下:cat > /pg/patroni/pg.yml <<EOFscope: PGClusternamespace: /pgsql/name: db1restapi:listen: 192.168.56.11:8008connect_address: 192.168.56.11:8008etcd:hosts: 192.168.56.11:2379,192.168.56.22:2379,192.168.56.33:2379bootstrap:# this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster# and all other cluster members will use it as a global configurationdcs:ttl: 30loop_wait: 10retry_timeout: 10maximum_lag_on_failover: 1048576master_start_timeout: 300synchronous_mode: truesynchronous_standby_names: "ANY 1 (pg01,pg02,pg03)"postgresql:use_pg_rewind: trueuse_slots: trueparameters:listen_addresses: "*"port: 5432wal_level: hot_standbyhot_standby: "on"max_connections: 2000wal_keep_size: 16GBmax_wal_senders: 50max_replication_slots: 10wal_log_hints: "on"archive_mode: "on"archive_timeout: 1800sarchive_command: scp %p 192.168.56.11:/data/pgwal/archive_wals/%frecovery_conf:restore_command: scp 192.168.56.11:/data/pgwal/archive_wals/%f %ppostgresql:callbacks:on_start: /pg/patroni/patroni_callback.shon_stop: /pg/patroni/patroni_callback.shon_role_change: /pg/patroni/patroni_callback.shlisten: 0.0.0.0:5432connect_address: 192.168.56.11:5432data_dir: /data/pgdata/pg13bin_dir: /pg/pghome/bin# config_dir: /etc/postgresql/9.6/mainauthentication:replication:username: replicatorpassword: Postgres%cloversuperuser:username: postgrespassword: Postgres%cloverwatchdog:mode: automatic # Allowed values: off, automatic, requireddevice: /dev/watchdogsafety_margin: 5tags:nofailover: falsenoloadbalance: falseclonefrom: falsenosync: false2. 从节点1配置如下:cat > /pg/patroni/pg.yml <<EOFscope: PGClusternamespace: /pgsql/name: db2restapi:listen: 192.168.56.22:8008connect_address: 192.168.56.22:8008etcd:hosts: 192.168.56.11:2379,192.168.56.22:2379,192.168.56.33:2379bootstrap:# this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster# and all other cluster members will use it as a global configurationdcs:ttl: 30loop_wait: 10retry_timeout: 10maximum_lag_on_failover: 1048576master_start_timeout: 300synchronous_mode: truesynchronous_standby_names: "ANY 1 (pg01,pg02,pg03)"postgresql:use_pg_rewind: trueuse_slots: trueparameters:listen_addresses: "*"port: 5432wal_level: hot_standbyhot_standby: "on"max_connections: 2000wal_keep_size: 16GBmax_wal_senders: 50max_replication_slots: 10wal_log_hints: "on"archive_mode: "on"archive_timeout: 1800sarchive_command: scp %p 192.168.56.22:/data/pgwal/archive_wals/%frecovery_conf:restore_command: scp 192.168.56.22:/data/pgwal/archive_wals/%f %ppostgresql:callbacks:on_start: /bin/sh /pg/patroni/patroni_callback.shon_stop: /bin/sh /pg/patroni/patroni_callback.shon_role_change: /bin/sh /pg/patroni/patroni_callback.shlisten: 0.0.0.0:5432connect_address: 192.168.56.22:5432data_dir: /data/pgdata/pg13bin_dir: /pg/pghome/bin# config_dir: /etc/postgresql/9.6/mainauthentication:replication:username: replicatorpassword: Postgres%cloversuperuser:username: postgrespassword: Postgres%cloverwatchdog:mode: automatic # Allowed values: off, automatic, requireddevice: /dev/watchdogsafety_margin: 5tags:nofailover: falsenoloadbalance: falseclonefrom: falsenosync: falseEOF3. 从节点2配置如下:cat > /pg/patroni/pg.yml <<EOFscope: PGClusternamespace: /pgsql/name: db3restapi:listen: 192.168.56.33:8008connect_address: 192.168.56.33:8008etcd:hosts: 192.168.56.11:2379,192.168.56.22:2379,192.168.56.33:2379bootstrap:# this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster# and all other cluster members will use it as a global configurationdcs:ttl: 30loop_wait: 10retry_timeout: 10maximum_lag_on_failover: 1048576master_start_timeout: 300synchronous_mode: truesynchronous_standby_names: "ANY 1 (pg01,pg02,pg03)"postgresql:use_pg_rewind: trueuse_slots: trueparameters:listen_addresses: "*"port: 5432wal_level: hot_standbyhot_standby: "on"max_connections: 2000wal_keep_size: 16GBmax_wal_senders: 50max_replication_slots: 10wal_log_hints: "on"archive_mode: "on"archive_timeout: 1800sarchive_command: scp %p 192.168.56.33:/data/pgwal/archive_wals/%frecovery_conf:restore_command: scp 192.168.56.33:/data/pgwal/archive_wals/%f %ppostgresql:callbacks:on_start: /pg/patroni/patroni_callback.shon_stop: /pg/patroni/patroni_callback.shon_role_change: /pg/patroni/patroni_callback.shlisten: 0.0.0.0:5432connect_address: 192.168.56.33:5432data_dir: /data/pgdata/pg13bin_dir: /pg/pghome/bin# config_dir: /etc/postgresql/9.6/mainauthentication:replication:username: replicatorpassword: Postgres%cloversuperuser:username: postgrespassword: Postgres%cloverwatchdog:mode: automatic # Allowed values: off, automatic, requireddevice: /dev/watchdogsafety_margin: 5tags:nofailover: falsenoloadbalance: falseclonefrom: falsenosync: falseEOF
4. vip配置
PostgreSQL高可用环境下,主节点可以是3个数据库节点中的任意一个。因此,如果应用通过物理IP访问主库,每当主备切换时,都需要人工调整应用访问数据库的连接串,这显然很不友好。因此,需要引入VIP,即主库虚拟IP,以此访问主库。1. 添加postgres用户为sudoer,以root用户执行:su - rootvisudopostgres ALL=(ALL) NOPASSWD:ALL2. 配置脚本cat > /pg/patroni/patroni_callback.sh <<EOF#!/bin/bashreadonly cb_name=$1readonly role=$2readonly scope=$3VIP=192.168.56.66VIPBRD=192.168.56.255VIPNETMASK=255.255.255.0VIPNETMASKBIT=22#VIPifconfigVIPDEV=em1VIPLABEL=1function usage() {echo "Usage: $0 <on_start|on_stop|on_role_change> <role> <scope>";exit 1;}function addvip(){echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` INFO: /sbin/ip addr add ${VIP}/${VIPNETMASKBIT} brd ${VIPBRD} dev ${VIPDEV} label ${VIPDEV}:${VIPLABEL}"sudo /sbin/ip addr add ${VIP}/${VIPNETMASKBIT} brd ${VIPBRD} dev ${VIPDEV} label ${VIPDEV}:${VIPLABEL}sudo /usr/sbin/arping -q -A -c 1 -I ${VIPDEV}:${VIPLABEL} ${VIP}#sudo /sbin/iptables -F}function delvip(){echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` INFO: sudo /sbin/ip addr del ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV} label ${VIPDEV}:${VIPLABEL}"sudo /sbin/ip addr del ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV} label ${VIPDEV}:${VIPLABEL}sudo /usr/sbin/arping -q -A -c 1 -I ${VIPDEV} ${VIP}#sudo /sbin/iptables -F}echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` WARNING: patroni callback $cb_name $role $scope"case $cb_name inon_stop)delvip;;on_start);;on_role_change)if [[ $role == 'master' ]]; thenaddvipelif [[ $role == 'slave' ]]||[[ $role == 'replica' ]]||[[ $role == 'logical' ]]; thendelvipfi;;*)usage;;esacEOFchmod +x /pg/patroni/patroni_callback.sh
5. 启动Patroni
# 启动前提:# PostgreSQL流复制正常,且etcd也正常运行的情况下1. 手工启动3个数据库节点的patroni程序su – postgresnohup patroni /pg/patroni/pg.yml & >> /pg/patroni/patroni.log 2>&1 &2. 配置开机启动su – postgrescrontab -e@reboot patroni /pg/patroni/pg.yml & >> /pg/patroni/patroni.log 2>&1
1. 查看集群状态su - postgrespatronictl -c /pg/patroni/pg.yml list2. 主备切换patronictl -c /pg/patroni/pg.yml switchover3. 故障切换模拟节点故障,关掉主节点,同步备库将自动升级为主库;原主节点恢复后,将自动降级为备库。重启其他节点同理。rebootpatronictl -c /pg/patroni/pg.yml list
本文内容就到这啦,阅读完本篇,相信你对PostgreSQL高可用架构patroni相关知识有了一定的认识了吧!我们下篇再见!





