暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PG 高可用之patroni

zhou 2024-04-22
1203

pg 架构之Patroni

元数据记录etcd,真正的DCS系统,避免脑裂

使用 etcd+  Patroni+haproxy+keepalived 搭建PG集群

环境准备
192.168.0.211 node01
192.168.0.212 node02
192.168.0.213 node03
PG主从

etcd安装配置

在node01,node02,node03上执行

yum install -y gcc python-devel epel-release
yum clean all
yum makecache
yum -y install etcd

etcd配置

修改对应的ETCD_NAME, 对应的IP

cp /etc/etcd/etcd.conf /etc/etcd/etcd.conf.bak202404
cat > /etc/etcd/etcd.conf<<here
ETCD_NAME="node01"
ETCD_DATA_DIR="/var/lib/etcd/node01.etcd"
ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379,http://192.168.0.211:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://127.0.0.1:2379,http://192.168.0.211:2379"
ETCD_LISTEN_PEER_URLS="http://192.168.0.211:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.0.211:2380"
ETCD_INITIAL_CLUSTER="node01=http://192.168.0.211:2380,node02=http://192.168.0.212:2380,node03=http://192.168.0.213:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
here

etcd自动启动

systemctl enable etcd
systemctl start etcd

查看etcd状态

etcdctl member list
Partoni安装配置

python安装

#安装python3.8
yum clean all
yum makecache
yum install -y libffi-devel libuuid libuuid-devel
wget https://www.python.org/ftp/python/3.8.19/Python-3.8.19.tgz
tar -xzvf Python-3.8.19.tgz
cd Python-3.8.19
./configure
make -j 7
make -j 7 install
rm -rf /usr/bin/python
ln -s /usr/local/bin/python3.8 /usr/bin/python 
rm -rf /usr/bin/python
ln -s /usr/local/bin/python3 /usr/bin/pyhon

#修正yum指向python2
vi /usr/bin/yum
修改#!/usr/bin/python 为#!/usr/bin/python2
vi /usr/libexec/urlgrabber-ext-down  
修改#!/usr/bin/python 为#!/usr/bin/python2

Partoni安装

yum install -y libffi-devel 
yum install -y gcc epel-release
yum install -y python-pip python-psycopg2 python-devel
pip3 install --upgrade pip
pip3 install --upgrade setuptools
pip3 install psycopg2-binary
pip3 install patroni[etcd]
pip3 install urllib3==1.26.15
pip3 list

Partoni配置

在node01,node02,node03上执行,替换为对应的ip,name

mkdir /etc/patroni
#用户密码延用repmgr 
cat > /etc/patroni/patroni.yml <<here
scope: etcd_patroni_pg
namespace: /pgcluster1/
name: node01

restapi:
  listen: 192.168.0.211:8008
  connect_address: 192.168.0.211:8008

etcd:
  host: 192.168.0.211:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        listen_addresses: "0.0.0.0"
        port: 5432
        wal_level: logical
        hot_standby: "on"
        wal_keep_segments: 1000
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
#        archive_mode: "on"
#        archive_timeout: 1800s
#        archive_command: gzip < %p > /data/backup/pgwalarchive/%f.gz
#      recovery_conf:
#        restore_command: gunzip < /data/backup/pgwalarchive/%f.gz > %p

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.0.211:5432
  data_dir: /data/pgdata
  bin_dir: /usr/local/pgsql-15.6/bin
  authentication:
    replication:
      username: repmgr
      password: repmgr
    superuser:
      username: postgres
      password: postgres

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
here
chown -R postgres.postgres -R /etc/patroni/ 

patroni.service服务设置

vi /etc/systemd/system/patroni.service
[Unit]
Description=patroni - a high-availability PostgreSQL
Documentation=https://patroni.readthedocs.io/en/latest/index.html
After=syslog.target network.target etcd.target
Wants=network-online.target
[Service]
Type=simple
User=postgres
Group=postgres
PermissionsStartOnly=true
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
KillMode=process
KillSignal=SIGINT
Restart=on-abnormal
RestartSec=30s
TimeoutSec=0
[Install]
WantedBy=multi-user.target

开机自动启动

systemctl enable patroni.service
systemctl start patroni.service
检查集群状态

查看集群

patronictl -c /etc/patroni/patroni.yml list

查看主从

select client_addr,sync_state from pg_stat_replication;
  client_addr  | sync_state
---------------+------------
 192.168.0.213 | async
 192.168.0.212 | async

SELECT * FROM pg_stat_replication;
集群切换
手工在线切换
patronictl -c /etc/patroni/patroni.yml switchover


20240422194739image.png

20240422194759image.png
自动启动原主库,跟随时间线启动为从库


20240422195323image.png

手动停库
 pg_ctl stop -D /data/pgdata

patroni自动拉起PG,并保持leader角色(crashed -> running)

Apr 22 21:21:08 node01 patroni: 2024-04-22 21:21:08,699 INFO: no action. I am (node01), the leader with the lock
Apr 22 21:21:12 node01 patroni: 2024-04-22 21:21:12,653 WARNING: Postgresql is not running.
……

Apr 22 21:21:12 node01 patroni: 2024-04-22 21:21:12,791 INFO: starting primary after failure
Apr 22 21:21:14 node01 patroni: 2024-04-22 21:21:14,049 INFO: postmaster pid=1752
Apr 22 21:21:14 node01 patroni: 2024-04-22 21:21:14.164 CST [1752] LOG:  redirecting log output to logging collector process
Apr 22 21:21:14 node01 patroni: 2024-04-22 21:21:14.164 CST [1752] HINT:  Future log output will appear in directory "log".
Apr 22 21:21:15 node01 patroni: localhost:5432 - accepting connections
Apr 22 21:21:17 node01 patroni: localhost:5432 - accepting connections
Apr 22 21:21:17 node01 patroni: 2024-04-22 21:21:17,673 INFO: Lock owner: node01; I am node01
Apr 22 21:21:17 node01 patroni: 2024-04-22 21:21:17,673 INFO: establishing a new patroni heartbeat connection to postgres
Apr 22 21:21:19 node01 patroni: 2024-04-22 21:21:19,760 INFO: no action. I am (node01), the leader with the lock

crashed -> running


20240422212504image.png

20240422212554image.png

reboot操作

reboot 重启机器,自动切主

#reboot master


20240422212655image.png
主机重启后,集群发现replica stopped-> 切换时间线TL 选新主


20240422212743image.png
tail -f /var/log/mesasges

Apr 22 21:26:49 node02 patroni: conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Apr 22 21:26:49 node02 patroni: psycopg2.OperationalError: connection to server at "192.168.0.211", port 5432 failed: FATAL:  the database system is shutting down
Apr 22 21:26:49 node02 patroni: 2024-04-22 21:26:49,760 INFO: no action. I am (node02), a secondary, and following a leader (node01)
Apr 22 21:27:28 node02 patroni: 2024-04-22 21:27:26,924 INFO: Lock owner: node03; I am node02
Apr 22 21:27:28 node02 patroni: 2024-04-22 21:27:28,644 INFO: Local timeline=6 lsn=0/180001F0
Apr 22 21:27:28 node02 patroni: 2024-04-22 21:27:28,696 ERROR: Exception when working with leader
Apr 22 21:27:28 node02 patroni: Traceback (most recent call last):
Apr 22 21:27:28 node02 patroni: psycopg2.OperationalError: connection to server at "192.168.0.213", port 5432 failed: FATAL:  no pg_hba.conf entry for host "192.168.0.212", user "postgres", database "postgres", no encryption
Apr 22 21:27:28 node02 patroni: 2024-04-22 21:27:28,751 INFO: no action. I am (node02), a secondary, and following a leader (node03)
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论