Patroni

开源软件联盟PostgreSQL分会 2021-08-21

2134

作者：王志斌

前言

Patroni作为PG开箱即用的高可用解决方案，尤其在云端环境常作为首选，本文主要介绍如何部署Patroni集群，其特点归纳如下：

1.易用性：模板化部署

2.可用性：分布式一致性软件提供强一致性

3.健壮性：提供多个复制集群

4.便捷性：手工自动切换

5.自治性：自动故障转移

6.可靠性：watchdog机制

当然Patroni还可以和其他开源组件结合使用，如Pgbouncer（连接池）、Pgbackrest（远端备份）、Promethues+grafana+altermanager（监控告警）等等。

基本配置

图 1 Patroni集群架构示意图

如上图所示，主集群包括三台服务器，备用集群包括一台服务器，通过同步流复制，与主集群进行数据同步，下面将详细介绍环境部署及操作流程。

硬件环境要求

需要虚拟机四台，其中三台组件Leader cluster，另外一台建立standby cluster

集群名称	主机名	IP	安装软件	角色
MasterCluster	centos1	192.168.137.101	Etcd、patroni	Leader
MasterCluster	Centos2	192.168.137.103	Etcd、patroni	Follower
MasterCluster	Centos3	192.168.137.104	Etcd、patroni	Follower
Standbycluster	Centos4	192.168.137.105	Patroni	Leader

软件环境要求

需要安装如下软件：

操作系统版本：CentOS Linux release 7.7.1908 (Core)

python版本：2.7.5

系统依赖

yum -y install gcc etcd haproxy libyaml

yum -y install epel-release

yum -y install python-pip

yum -y install python-devel

软件依赖

pip install --upgrade pip

pip install psycopg2==2.5.4

pip install --upgrade setuptools

pip install -r requirements.txt

环境配置

配置环境变量

分别在三台集群上设置环境变量

export PATH=$PATH:/home/postgres/database/binexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/postgres/database/libexport ETCDCTL_API=3

配置Etcd服务

192.168.137.101

vim /etc/etcd/etcd.conf

ETCD_NAME=zwetcd_1ETCD_DATA_DIR="/var/lib/etcd/default.etcd"ETCD_LISTEN_PEER_URLS="http://192.168.137.101:2380"ETCD_LISTEN_CLIENT_URLS="http://192.168.137.101:2379,http://127.0.0.1:2379"#[cluster]ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.137.101:2380"ETCD_INITIAL_CLUSTER="zwetcd_1=http://192.168.137.101:2380,zwetcd_2=http://192.168.137.103:2380,zwetcd_3=http://192.168.137.104:2380"ETCD_INITIAL_CLUSTER_STATE="new"ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"ETCD_ADVERTISE_CLIENT_URLS="http://192.168.137.101:2379"

192.168.137.103

vim /etc/etcd/etcd.conf

ETCD_NAME=zwetcd_2ETCD_DATA_DIR="/var/lib/etcd/default.etcd"ETCD_LISTEN_PEER_URLS="http://192.168.137.103:2380"ETCD_LISTEN_CLIENT_URLS="http://192.168.137.103:2379,http://127.0.0.1:2379"#[cluster]ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.137.103:2380"ETCD_INITIAL_CLUSTER="zwetcd_1=http://192.168.137.101:2380,zwetcd_2=http://192.168.137.103:2380,zwetcd_3=http://192.168.137.104:2380"ETCD_INITIAL_CLUSTER_STATE="new"ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"ETCD_ADVERTISE_CLIENT_URLS="http://192.168.137.103:2379"

192.168.137.104

vim /etc/etcd/etcd.conf

ETCD_NAME=zwetcd_3ETCD_DATA_DIR="/var/lib/etcd/default.etcd"ETCD_LISTEN_PEER_URLS="http://192.168.137.104:2380"ETCD_LISTEN_CLIENT_URLS="http://192.168.137.104:2379,http://127.0.0.1:2379"#[cluster]ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.137.104:2380"ETCD_INITIAL_CLUSTER="zwetcd_1=http://192.168.137.101:2380,zwetcd_2=http://192.168.137.103:2380,zwetcd_3=http://192.168.137.104:2380"ETCD_INITIAL_CLUSTER_STATE="new"ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"ETCD_ADVERTISE_CLIENT_URLS="http://192.168.137.104:2379"

配置HAPROXY

vi /etc/haproxy/haproxy.cfg

#proxy web监控，查看统计信息listen status    bind 0.0.0.0:1080    mode http    log global    stats enable    # stats是haproxy的一个统计页面的套接字，该参数设置统计页面的刷新间隔为30s    stats refresh 30s    stats uri /haproxy-stats    # 设置统计页面认证时的提示内容    stats realm Private lands    # 设置统计页面认证的用户和密码，如果要设置多个，另起一行写入即可    stats auth admin:passw0rd    # 隐藏统计页面上的haproxy版本信息#    stats hide-version#---------------------------------------------------------------------listen master    bind *:5000        mode tcp        option tcplog        balance roundrobin    option httpchk OPTIONS /master    http-check expect status 200    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions        server node1 192.168.137.101:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2        server node2 192.168.137.103:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2        server node3 192.168.137.104:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2listen replicas    bind *:5001        mode tcp        option tcplog        balance roundrobin    option httpchk OPTIONS /replica    http-check expect status 200    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions        server node1 192.168.137.101:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2        server node2 192.168.137.103:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2        server node3 192.168.137.104:5432 maxconn 1000 check port 8008 inter 5000 rise 2 fall 2并设置selinux权限setsebool -P haproxy_connect_any=1

配置文件

postgres0.yml

scope: pgclustername: postgresql0restapi:  listen: 192.168.137.101:8008  connect_address: 192.168.137.101:8008etcd:  host: 192.168.137.101:2379bootstrap:  dcs:    ttl: 30    loop_wait: 10    retry_timeout: 10    maximum_lag_on_failover: 1048576    postgresql:      use_pg_rewind: true      parameters:  initdb:   - encoding: UTF8  - data-checksums  pg_hba:    - host replication replicator 192.168.137.0/24 md5  - host all all 0.0.0.0/0 md5  users:    admin:      password: admin      options:        - createrole        - createdbpostgresql:  listen: 192.168.137.101:5432  connect_address: 192.168.137.101:5432  data_dir: /home/postgres/database/data  pgpass: /tmp/pgpass  authentication:    replication:      username: replicator      password: rep-pass    superuser:      username: postgres      password: zalando    rewind:  # Has no effect on postgres 10 and lower      username: rewind_user      password: rewind_password  parameters:    unix_socket_directories: '.'    wal_level: hot_standbytags:    nofailover: false    noloadbalance: false    clonefrom: false    nosync: false

postgres1.yml

scope: pgclustername: postgresql1restapi:  listen: 192.168.137.103:8008  connect_address: 192.168.137.103:8008etcd:   host: 192.168.137.103:2379bootstrap:  dcs:    ttl: 30    loop_wait: 10    retry_timeout: 10    maximum_lag_on_failover: 1048576    postgresql:      use_pg_rewind: true      parameters:  initdb:   - encoding: UTF8  - data-checksums  pg_hba:    - host replication replicator 192.168.137.0/24 md5  - host all all 0.0.0.0/0 md5  users:    admin:      password: admin      options:        - createrole        - createdbpostgresql:  listen: 192.168.137.103:5432  connect_address: 192.168.137.103:5432  data_dir: /home/postgres/database/data  pgpass: /tmp/pgpass  authentication:    replication:      username: replicator      password: rep-pass    superuser:      username: postgres      password: zalando    rewind:  # Has no effect on postgres 10 and lower      username: rewind_user      password: rewind_password  parameters:    unix_socket_directories: '.'    wal_level: hot_standby  basebackup:      - verbose      - max-rate: 100Mtags:    nofailover: false    noloadbalance: false    clonefrom: false

postgres2.yml

scope: pgclustername: postgresql2restapi:  listen: 192.168.137.104:8008  connect_address: 192.168.137.104:8008etcd:  host: 192.168.137.104:2379bootstrap:  dcs:    ttl: 30    loop_wait: 10    retry_timeout: 10    maximum_lag_on_failover: 1048576    postgresql:      use_pg_rewind: true      parameters:  initdb:    - encoding: UTF8  - data-checksums  pg_hba:    - host replication replicator 192.168.137.0/24 md5  - host all all 0.0.0.0/0 md5  users:    admin:      password: admin      options:        - createrole        - createdbpostgresql:  listen: 192.168.137.104:5432  connect_address: 192.168.137.104:5432  data_dir: /home/postgres/database/data  pgpass: /tmp/pgpass  authentication:    replication:      username: replicator      password: rep-pass    superuser:      username: postgres      password: zalando    rewind:  # Has no effect on postgres 10 and lower      username: rewind_user      password: rewind_password  parameters:    unix_socket_directories: '.'    wal_level: hot_standby  basebackup:      - verbose      - max-rate: 100Mtags:    nofailover: false    noloadbalance: false    clonefrom: false

standby.yml

scope: standbynamespace: /service/name: pgsql10_node1restapi:    listen: 192.168.137.105:8008    connect_address: 192.168.137.105:8008etcd:    host: 192.168.137.101:2379bootstrap:    dcs:        standby_cluster:            host: 192.168.137.101            port: 5432            create_replica_methods:            - basebackup    initdb:    - encoding: UTF8    - data-checksums    pg_hba:    - host replication replicator 192.168.137.0/24 md5    - host all all 0.0.0.0/0 md5    users:        admin:            password: admin            options:                - createrole                - createdbpostgresql:    listen: 192.168.137.105:5432    connect_address: 192.168.137.105:5432    bin_dir: /home/postgres/database/bin    data_dir: /home/postgres/database/data    pgpass: /home/postgres/.pgpass    authentication:        replication:            username: replicator             password: rep-pass        superuser:            username: postgres            password: zalando    parameters:        wal_level: hot_standby        hot_standby: on       create_replica_methods:        - basebackup    basebackup:        max-rate: '100M'tags:    nofailover: false    noloadbalance: false    clonefrom: falsenosync: false

启动服务

关闭防火墙

Systemctl stop firewalldSystemctl disable firewalld

启动patroni

192.168.137.101

python patroni.py postgres0.yml

192.168.137.103

python patroni.py postgres1.yml

192.168.137.104

python patroni.py postgres2.yml

启动etcd服务

分别在三台集群，启动etcd服务。

systemctl start etcd

查看etcd状态

etcdctl --write-out=table --endpoints=192.168.137.101:2379,192.168.137.103:2379,192.168.137.104:2379 endpoint status

也可以执行etcdctl --endpoints=http://192.168.137.101:2379 member list -w table

查看patroni状态

在任意一台机器，执行如下命令：

例如在192.168.137.101上执行，查看主机群状态，其配置文件为postgres0.yml

python patronictl.py -c postgres0.yml list

在备用主机上查看备用集群状态

执行命令如下：python patronictl.py -c standby.yml list

启动Haproxy

systemctl start haproxy

访问Haproxy

访问地址http://192.168.137.101:1080/haproxy-stats

用户名：admin

密码：passw0rd

验证操作步骤

查看级联集群

[postgres@localhost patroni]$ python patronictl.py -d etcd://192.168.137.101:2379 list standby+ Cluster: standby (6903439046292649568) ------+---------+----+-----------+| Member        | Host            | Role           | State   | TL | Lag in MB |+---------------+-----------------+----------------+---------+----+-----------+| pgsql10_node1 | 192.168.137.105 | Standby Leader | running |  1 |           |+---------------+-----------------+----------------+---------+----+-----------+

数据读写

通过任意一台主机( centos1,centos2,centos3)访问数据库

尝试向主库(Primary)写数据，并读取数据：

psql -U postgres -d postgres -p 5432 -h centos1 postgres=# create table test (id int, name varchar(100));CREATE TABLEpostgres=# insert into test values ( 1,'1');INSERT 0 1postgres=# select * from test; id | name----+------  1 | 1(1 row)

尝试向一个从库(Standby)写数据

psql -U postgres -d postgres -p 5432 -h  centos2postgres=# insert into test values ( 1,'1');ERROR：cannot execute INSERT in a read-only transaction

尝试从一个从库(Standby)读数据

psql -U postgres -d postgres -p 5432 -h  centos3postgres=# select * from test; id | name ----+------  1 | 1(1 row)

自动切换(failover)

Kill主库(Primary)上的postmater进程，Kill之前的状态信息：

主库是postgresql0/centos1

[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Leader  | running |  1 |           || postgresql1 | 192.168.137.103 | Replica | running |  1 |       0.0 || postgresql2 | 192.168.137.104 | Replica | running |  1 |       0.0 |+-------------+-----------------+---------+---------+----+-----------+

在centos1上执行Kill

[root@centos1 patroni]$ ps -ef|grep postgrespostgres  2793  2329  0 11:44 pts/0    00:00:02 python patroni.py postgres0.ymlpostgres  2847     1  0 11:44 ?        00:00:00 postgres -D /home/postgres/database/data --config-file=/home/postgres/database/data/postgresql.conf --listen_addresses=192.168.137.101 --max_worker_processes=8 --max_prepared_transactions=0 --wal_level=hot_standby --track_commit_timestamp=off --max_locks_per_transaction=64 --port=5432 --max_replication_slots=10 --max_connections=100 --hot_standby=on --cluster_name=pgcluster --wal_log_hints=on --max_wal_senders=10postgres  2850  2847  0 11:44 ?        00:00:00 postgres: pgcluster: checkpointer processpostgres  2851  2847  0 11:44 ?        00:00:00 postgres: pgcluster: writer processpostgres  2852  2847  0 11:44 ?        00:00:00 postgres: pgcluster: wal writer processpostgres  2853  2847  0 11:44 ?        00:00:00 postgres: pgcluster: autovacuum launcher processpostgres  2854  2847  0 11:44 ?        00:00:00 postgres: pgcluster: stats collector processpostgres  2859  2847  0 11:44 ?        00:00:00 postgres: pgcluster: postgres postgres 192.168.137.101(49866) idlepostgres  2873  2847  0 11:44 ?        00:00:00 postgres: pgcluster: wal sender process replicator 192.168.137.104(45866) streaming 0/4015AE8postgres  2874  2847  0 11:44 ?        00:00:00 postgres: pgcluster: wal sender process replicator 192.168.137.103(58798) streaming 0/4015AE8[root@centos1 patroni]# kill -9 2847[root@centos1 patroni]#在centos1上，Patroni再次启动postgresql0，postgresql0依然是主库，没有切换[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Leader  | running |  2 |           || postgresql1 | 192.168.137.103 | Replica | running |  2 |       0.0 || postgresql2 | 192.168.137.104 | Replica | running |  2 |       0.0 |+-------------+-----------------+---------+---------+----+-----------+

查看主库上的PID, 所有进程都是新的PID：

[root@centos1 patroni]# ps -ef|grep postgrespostgres  2793  2329  0 11:44 pts/0    00:00:03 python patroni.py postgres0.ymlpostgres  3057     1  0 11:59 ?        00:00:00 postgres -D /home/postgres/database/data --config-file=/home/postgres/database/data/postgresql.conf --listen_addresses=192.168.137.101 --max_worker_processes=8 --max_prepared_transactions=0 --wal_level=hot_standby --track_commit_timestamp=off --max_locks_per_transaction=64 --port=5432 --max_replication_slots=10 --max_connections=100 --hot_standby=on --cluster_name=pgcluster --wal_log_hints=on --max_wal_senders=10postgres  3060  3057  0 11:59 ?        00:00:00 postgres: pgcluster: checkpointer processpostgres  3061  3057  0 11:59 ?        00:00:00 postgres: pgcluster: writer processpostgres  3062  3057  0 11:59 ?        00:00:00 postgres: pgcluster: stats collector processpostgres  3067  3057  0 11:59 ?        00:00:00 postgres: pgcluster: postgres postgres 192.168.137.101(50820) idlepostgres  3072  3057  0 11:59 ?        00:00:00 postgres: pgcluster: wal sender process replicator 192.168.137.103(59732) streaming 0/4015D80postgres  3073  3057  0 11:59 ?        00:00:00 postgres: pgcluster: wal sender process replicator 192.168.137.104(46798) streaming 0/4015D80postgres  3074  3057  0 11:59 ?        00:00:00 postgres: pgcluster: wal writer processpostgres  3075  3057  0 11:59 ?        00:00:00 postgres: pgcluster: autovacuum launcher processroot      3151  2410  0 12:01 pts/1    00:00:00 grep --color=auto postgres

查看集群信息， postgresql0是主库，正常工作

[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Leader  | running |  2 |           || postgresql1 | 192.168.137.103 | Replica | running |  2 |       0.0 || postgresql2 | 192.168.137.104 | Replica | running |  2 |       0.0 |+-------------+-----------------+---------+---------+----+-----------+

手工切换(switchover)

切换之前的状态信息

[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Leader  | running |  2 |           || postgresql1 | 192.168.137.103 | Replica | running |  2 |       0.0 || postgresql2 | 192.168.137.104 | Replica | running |  2 |       0.0 |+-------------+-----------------+---------+---------+----+-----------+

执行手工切换(switchover)

当前的主(Primary)是: postgresql0/centos1

选择新的主(Primary): postgresql2/centos3

[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 switchover pgclusterMaster [postgresql0]:Candidate ['postgresql1', 'postgresql2'] []: postgresql2When should the switchover take place (e.g. 2020-12-09T13:05 )  [now]:Current cluster topology+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Leader  | running |  2 |           || postgresql1 | 192.168.137.103 | Replica | running |  2 |       0.0 || postgresql2 | 192.168.137.104 | Replica | running |  2 |       0.0 |+-------------+-----------------+---------+---------+----+-----------+Are you sure you want to switchover cluster pgcluster, demoting current master postgresql0? [y/N]: y2020-12-09 12:05:38.44586 Successfully switched over to "postgresql2"+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Replica | stopped |    |   unknown || postgresql1 | 192.168.137.103 | Replica | running |  2 |       0.0 || postgresql2 | 192.168.137.104 | Leader  | running |  2 |           |+-------------+-----------------+---------+---------+----+-----------+

持续查看集群状态信息：

新的主库(Primary)是：postgresql2/centos3

Patroni重新启动了postgresql0/centos1

最后, postgresql0/centos1 作为从库(Standby)重新加入集群，正常工作

[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Replica | running |  3 |       0.0 || postgresql1 | 192.168.137.103 | Replica | running |  3 |       0.0 || postgresql2 | 192.168.137.104 | Leader  | running |  3 |           |+-------------+-----------------+---------+---------+----+-----------+

重新启动主库(Primay)

重新启动之前的集群信息

[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Replica | running |  3 |       0.0 || postgresql1 | 192.168.137.103 | Replica | running |  3 |       0.0 || postgresql2 | 192.168.137.104 | Leader  | running |  3 |           |+-------------+-----------------+---------+---------+----+-----------+

重启动centos3(Primary database)

[root@centos3 ~]# rebootConnection to centos3 closed by remote host.Connection to centos3 closed.

查看集群状态信息

Postgresql2/centos3停止了

Postgresql0/centos1成为了主库(Primary)

[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Leader  | running |  4 |           || postgresql1 | 192.168.137.103 | Replica | running |    |   unknown || postgresql2 | 192.168.137.104 | Replica | stopped |    |   unknown |+-------------+-----------------+---------+---------+----+-----------+[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Leader  | running |  4 |           || postgresql1 | 192.168.137.103 | Replica | running |  4 |       0.0 |+-------------+-----------------+---------+---------+----+-----------+

当centos3启动后，手工启动etcd和postgresql2

[root@centos3 ~]# systemctl start etcd[root@centos3 ~]# etcdctl --write-out=table --endpoints=192.168.137.101:2379,192.168.137.103:2379,192.168.137.104:2379 endpoint status+----------------------+------------------+---------+---------+-----------+-----------+------------+|       ENDPOINT       |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |+----------------------+------------------+---------+---------+-----------+-----------+------------+| 192.168.137.101:2379 | 38d2ef078eac1ceb |  3.3.11 |   20 kB |     false |         6 |       4474 || 192.168.137.103:2379 | 1ba6a8d4bb29c08f |  3.3.11 |   20 kB |      true |         6 |       4474 || 192.168.137.104:2379 | 16f89451865a3f1f |  3.3.11 |   20 kB |     false |         6 |       4474 |+----------------------+------------------+---------+---------+-----------+-----------+------------+[postgres@centos3 patroni]$ python patroni.py postgres2.yml

当etcd/postgresql2启动后，查看集群状态信息

Postgresql2/centos3成为从库(Standby)，正常工作

[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Leader  | running |  4 |           || postgresql1 | 192.168.137.103 | Replica | running |  4 |       0.0 || postgresql2 | 192.168.137.104 | Replica | running |  4 |       0.0 |+-------------+-----------------+---------+---------+----+-----------+

重新启动从库(Standby)

重启之前的集群信息

从库：postgresql1，postgresql2

主库：postgresql0

[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+| Member      | Host            | Role    | State   | TL | Lag in MB |+-------------+-----------------+---------+---------+----+-----------+| postgresql0 | 192.168.137.101 | Leader  | running |  4 |           |
| postgresql1 | 192.168.137.103 | Replica | running |  4 |       0.0 |
| postgresql2 | 192.168.137.104 | Replica | running |  4 |       0.0 |
+-------------+-----------------+---------+---------+----+-----------+

重启动centos2(Standby)

[root@centos2 ~]# reboot
Connection to centos2 closed by remote host.
Connection to centos2 closed.

查看集群信息

postgresql1已经停止

[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster
+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+
| Member      | Host            | Role    | State   | TL | Lag in MB |
+-------------+-----------------+---------+---------+----+-----------+
| postgresql0 | 192.168.137.101 | Leader  | running |  4 |           |
| postgresql2 | 192.168.137.104 | Replica | running |  4 |       0.0 |
+-------------+-----------------+---------+---------+----+-----------+

当centos2启动完毕后, 按照先后顺序手动启动etcd和postgresql1

[root@centos2 ~]# systemctl start etcd
[postgres@centos2 patroni]$ python patroni.py postgres1.yml

当etcd和postgresql1启动完毕后，查看集群状态

postgresql1仍然是从库，正常工作

[root@centos1 patroni]# python patronictl.py -d etcd://centos1:2379 list pgcluster
+ Cluster: pgcluster (6904097497199745789) --+---------+----+-----------+
| Member      | Host            | Role    | State   | TL | Lag in MB |
+-------------+-----------------+---------+---------+----+-----------+
| postgresql0 | 192.168.137.101 | Leader  | running |  4 |           |
| postgresql1 | 192.168.137.103 | Replica | running |  4 |       0.0 |
| postgresql2 | 192.168.137.104 | Replica | running |  4 |       0.0 |
+-------------+-----------------+---------+---------+----+-----------+