暂无图片
暂无图片
1
暂无图片
暂无图片
暂无图片

PostgreSQL高可用测试系列之Patroni + etcd + HAProxy + Keepalived 离线部署(八)

原创 张玉龙 2021-11-03
2508


说明:本系列文章仅用于共享我的学习成果,未经过生产系统考验,对于知识点和一些组件的使用会不定时更新,仅供参考,如有错误的地方,欢迎留言共同学习。

本高可用系列测试不说理论知识,如有需要自行百度,因生产环境大多数是内网环境,无法连接互联网,为模拟生产环境安装,PostgreSQL高可用测试均采用离线部署

所需软件包均以打包上传百度网盘,如有需要自行下载:https://pan.baidu.com/s/1Tb7GPMvj4kfKEIh8iyvdbA 提取码:n9w2 文件名:PostgreSQL_HA.tar.gz

第一章: 介绍测试环境
第二章: PostgreSQL + replication 部署
第三章: Etcd 部署和管理
第四章: Patroni 部署和管理
第五章: HAProxy + Keepalived 部署和管理
第六章: 高可用模拟故障测试用例
第七章: Prometheus + Grafana 监控部署
第八章: 高可用管理
架构图.jpg

第八章: 高可用管理

[root@pgtest1 ~]# etcdctl endpoint status --cluster -w table +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | http://192.168.58.11:2379 | 3f414532c235ce16 | 3.5.1 | 20 kB | false | false | 6 | 489460 | 489460 | | | http://192.168.58.12:2379 | 41cf8a739c2e9b50 | 3.5.1 | 20 kB | true | false | 6 | 489460 | 489460 | | | http://192.168.58.10:2379 | caef4208a95efee8 | 3.5.1 | 25 kB | false | false | 6 | 489460 | 489460 | | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ [root@pgtest1 ~]# patronictl list +---------+---------------+---------+---------+----+-----------+ | Member | Host | Role | State | TL | Lag in MB | + Cluster: pg_cluster (7025023477017500881) --+----+-----------+ | pgtest1 | 192.168.58.10 | Leader | running | 18 | | | pgtest2 | 192.168.58.11 | Replica | running | 18 | 0 | | pgtest3 | 192.168.58.12 | Replica | running | 18 | 0 | +---------+---------------+---------+---------+----+-----------+

1. 主动维护重启服务器时软件的关闭和启动顺序

1. 计划内只重启PostgreSQL主节点(pgtest1)

需要考虑是否将数据库switchover到备节点上,在这个架构上,PostgreSQL主节点同时也是 haproxy 和 keepalived 的主节点,VIP在此节点上运行

# 1. 数据库从 pgtest1 switchover 到 pgtest2 [root@pgtest1 ~]# curl -s http://192.168.58.10:8008/switchover -XPOST -d '{"leader":"pgtest1","candidate":"pgtest2"}' Successfully switched over to "pgtest2" # 也可以使用命令 patronictl switchover 进行数据库切换 [root@pgtest1 ~]# patronictl list +---------+---------------+---------+---------+----+-----------+ | Member | Host | Role | State | TL | Lag in MB | + Cluster: pg_cluster (7025023477017500881) --+----+-----------+ | pgtest1 | 192.168.58.10 | Replica | running | 19 | 0 | | pgtest2 | 192.168.58.11 | Leader | running | 19 | | | pgtest3 | 192.168.58.12 | Replica | running | 19 | 0 | +---------+---------------+---------+---------+----+-----------+ # 2. pgtest1关闭 keepalived,自动将 VIP 切换到 keepalived 备节点(pgtest2),检查VIP测试业务连接正常 [root@pgtest1 ~]# systemctl stop keepalived # 3. pgtest1关闭 haproxy [root@pgtest1 ~]# systemctl stop haproxy # 4. pgtest1关闭 patroni 和 etcd,patroni 会把 PostgreSQL 自动关闭 [root@pgtest1 ~]# systemctl stop patroni [root@pgtest1 ~]# systemctl stop etcd # 5. pgtest1重启服务器 [root@pgtest1 ~]# reboot # 6. 服务器重启后自动启动 etcd patroni haproxy keepalived exporter(all),启动 patroni 后发现数据库没有启动,会自动拉起数据库,启动 keepalived 后 VIP 会自动漂移回来,检查VIP测试业务连接正常 [root@pgtest1 ~]# etcdctl endpoint status --cluster -w table +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | http://192.168.58.11:2379 | 3f414532c235ce16 | 3.5.1 | 20 kB | false | false | 6 | 473973 | 473973 | | | http://192.168.58.12:2379 | 41cf8a739c2e9b50 | 3.5.1 | 20 kB | true | false | 6 | 473973 | 473973 | | | http://192.168.58.10:2379 | caef4208a95efee8 | 3.5.1 | 25 kB | false | false | 6 | 473973 | 473973 | | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ [root@pgtest1 ~]# patronictl list +---------+---------------+---------+---------+----+-----------+ | Member | Host | Role | State | TL | Lag in MB | + Cluster: pg_cluster (7025023477017500881) --+----+-----------+ | pgtest1 | 192.168.58.10 | Replica | running | 19 | 0 | | pgtest2 | 192.168.58.11 | Leader | running | 19 | | | pgtest3 | 192.168.58.12 | Replica | running | 19 | 0 | +---------+---------------+---------+---------+----+-----------+ [root@pgtest1 ~]# ip -4 a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever 2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 inet 192.168.58.10/24 brd 192.168.58.255 scope global ens33 valid_lft forever preferred_lft forever inet 192.168.58.20/24 scope global secondary ens33:1 valid_lft forever preferred_lft forever # 7. 考虑是否将 PostgreSQL 主库切回 pgtest1 [root@pgtest1 ~]# curl -s http://192.168.58.10:8008/switchover -XPOST -d '{"leader":"pgtest2","candidate":"pgtest1"}'

2. 计划内只关闭PostgreSQL备节点(pgtest2)

在这个架构上,PostgreSQL备节点(pgtest2)同时也是 haproxy 和 keepalived 的备节点

# 1. pgtest2关闭keepalived和haproxy [root@pgtest2 ~]# systemctl stop keepalived [root@pgtest2 ~]# systemctl stop haproxy # 2. pgtest2关闭 patroni 和 etcd,patroni 会把PostgreSQL自动关闭 [root@pgtest2 ~]# systemctl stop patroni [root@pgtest2 ~]# systemctl stop etcd # 3. pgtest2重启服务器 [root@pgtest2 ~]# reboot # 4. 服务器重启后自动启动 etcd patroni haproxy keepalived exporter(all),启动 patroni 后发现数据库没有启动,会自动拉起数据库,检查数据库 [root@pgtest2 ~]# etcdctl endpoint status --cluster -w table +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | http://192.168.58.11:2379 | 3f414532c235ce16 | 3.5.1 | 20 kB | false | false | 6 | 489460 | 489460 | | | http://192.168.58.12:2379 | 41cf8a739c2e9b50 | 3.5.1 | 20 kB | true | false | 6 | 489460 | 489460 | | | http://192.168.58.10:2379 | caef4208a95efee8 | 3.5.1 | 25 kB | false | false | 6 | 489460 | 489460 | | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ [root@pgtest2 ~]# patronictl list +---------+---------------+---------+---------+----+-----------+ | Member | Host | Role | State | TL | Lag in MB | + Cluster: pg_cluster (7025023477017500881) --+----+-----------+ | pgtest1 | 192.168.58.10 | Leader | running | 20 | | | pgtest2 | 192.168.58.11 | Replica | running | 20 | 0 | | pgtest3 | 192.168.58.12 | Replica | running | 20 | 0 | +---------+---------------+---------+---------+----+-----------+

3. 计划内只关闭PostgreSQL备节点(pgtest3)

# 1. pgtest3关闭 patroni 和 etcd,patroni 会把PostgreSQL自动关闭 [root@pgtest3 ~]# systemctl stop patroni [root@pgtest3 ~]# systemctl stop etcd # 2. pgtest3重启服务器 [root@pgtest3 ~]# reboot # 3. 服务器重启后自动启动 etcd patroni prometheus grafana exporter(all),启动 patroni 后发现数据库没有启动,会自动拉起数据库,检查数据库 [root@pgtest3 ~]# etcdctl endpoint status --cluster -w table +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | http://192.168.58.11:2379 | 3f414532c235ce16 | 3.5.1 | 20 kB | false | false | 7 | 489858 | 489858 | | | http://192.168.58.12:2379 | 41cf8a739c2e9b50 | 3.5.1 | 20 kB | false | false | 7 | 489858 | 489858 | | | http://192.168.58.10:2379 | caef4208a95efee8 | 3.5.1 | 25 kB | true | false | 7 | 489858 | 489858 | | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ [root@pgtest3 ~]# patronictl list +---------+---------------+---------+---------+----+-----------+ | Member | Host | Role | State | TL | Lag in MB | + Cluster: pg_cluster (7025023477017500881) --+----+-----------+ | pgtest1 | 192.168.58.10 | Leader | running | 20 | | | pgtest2 | 192.168.58.11 | Replica | running | 20 | 0 | | pgtest3 | 192.168.58.12 | Replica | running | 20 | 0 | +---------+---------------+---------+---------+----+-----------+

4. 计划内重启其中2个服务器

因为 etcd 在3个节点的情况下,最多允许一个服务器故障,当停止2个服务器的etcd时,数据库将以只读模式运行。

5. 计划内重启所有服务器

# 1. 先关闭 Keepalived VIP,断开所有业务连接 Keepalived 备机:systemctl stop keepalived Keepalived 主机:systemctl stop keepalived # 2. 所有服务器 systemctl stop haproxy # 3. patroni 设置 Disable auto failover [root@pgtest1 ~]# patronictl pause # 4. 所有服务器 systemctl stop patroni [root@pgtest2 ~]# patronictl list +--------+------+------+-------+----+-----------+ | Member | Host | Role | State | TL | Lag in MB | + Cluster: pg_cluster (7025023477017500881) ----+ +--------+------+------+-------+----+-----------+ Maintenance mode: on # 5. 所有服务器 systemctl stop etcd # 6. 关闭 PostgreSQL 主库: [root@pgtest1 ~]# su - postgres Last login: Sun Oct 31 14:17:52 CST 2021 on pts/4 [postgres@pgtest1 ~]$ pg_ctl stop -m fast -w -D $PGDATA waiting for server to shut down.... done server stopped [postgres@pgtest1 ~]$ # 7. 关闭 PostgreSQL 备库:pg_ctl stop -m fast -w -D $PGDATA # 8. 关闭所有服务器 init 0

启动所有服务器

# 1. 先启动主库服务器,自动启动 etcd patroni haproxy keepalived(VIP) exporter(all),数据库没有被拉起 # 2. 再启动两个备库服务器,节点2自动启动 etcd patroni haproxy keepalived exporter(all),节点3自动启动 etcd patroni prometheus grafana exporter(all),数据库没有被拉起 # 3. 如果不想数据库起来后直接对外服务,可以考虑关闭所有keepalived(VIP),等需要时再启用 # 4. Resume auto failover,会自动拉起所有数据库,但是 Leader 不一定起到哪个节点上了,好像是有节点权重能控制,不知道是啥 [root@pgtest1 ~]# patronictl resume Success: cluster management is resumed [root@pgtest1 ~]# patronictl list +---------+---------------+---------+---------+----+-----------+ | Member | Host | Role | State | TL | Lag in MB | + Cluster: pg_cluster (7025023477017500881) --+----+-----------+ | pgtest1 | 192.168.58.10 | Replica | running | 23 | 0 | | pgtest2 | 192.168.58.11 | Leader | running | 23 | | | pgtest3 | 192.168.58.12 | Replica | running | 23 | 0 | +---------+---------------+---------+---------+----+-----------+

--------------------------未完待更----------------------------

最后修改时间:2021-11-04 09:21:05
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论