暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PGPool II 的HA方案 watch dog

原创 大表哥 2022-06-30
2039

image.png

大家好,这次大表哥带来技术分享是 PGPOOL II 这个连接池自身的HA方案: Watch dog.

本文参考官网的 configuration Examples:

https://tatsuo-ishii.github.io/pgpool-II/current/example-cluster.html

Watch dog 是PGPool 的内置的监控进程不需要独立安装,打开相关的参数开关即可。

具体的配置步骤如下:

1)官方建议配置至少3台(奇数个)几点配置Watch dog 进程。

image.png

IP pg pool role pg instance role
10.67.38.50 LEADER primary database
10.67.39.149 STANDBY standby database
10.67.39.49 STANDBY standy database

2)手动创建 pgpool_node_id 文件, 在路径下:${PGPOOL_HOME}/etc
这一步是必须的,否则启动会报错:If watchdog is enable, pgpool_node_id file is required

INFRA [root@wqdcsrv3352 etc]# /opt/postgreSQL/pgpool4.3/bin/pgpool 2022-06-30 14:53:25.867: main pid 69061: 2022-06-30 14:53:25FATAL: Pgpool node id file /opt/postgreSQL/pgpool4.3/etc/pgpool_node_id does not exist 2022-06-30 14:53:25.867: main pid 69061: 2022-06-30 14:53:25DETAIL: If watchdog is enable, pgpool_node_id file is required
INFRA [postgres@wqdcsrv3352 pgpool4.3]# vi pgpool_node_id INFRA [postgres@wqdcsrv3352 pgpool4.3]# cat pgpool_node_id 0 INFRA [postgres@wqdcsrv3354 pgpool]# vi pgpool_node_id INFRA [postgres@wqdcsrv3354 pgpool]# cat pgpool_node_id 1 INFRA [postgres@wqdcsrv3353 pgpool]# vi pgpool_node_id INFRA [postgres@wqdcsrv3353 pgpool]# cat pgpool_node_id 2

3)关于watch dog 核心参数的配置: 配置文件 ${PGPOOL_HOME}/etc/pgpool.conf

#------------------------------------------------------------------------------ # WATCHDOG #------------------------------------------------------------------------------ ## watch dog的开关 use_watchdog = on ## watch 的节点 IP,PORT 信息 hostname0 = '10.67.38.50' wd_port0 = 9000 pgpool_port0 = 9999 hostname1 = '10.67.39.149' wd_port1 = 9000 pgpool_port1 = 9999 hostname2 = '10.67.39.49' wd_port2 = 9000 pgpool_port2 = 9999 ### 优先级的设置 这里我们设置成一致的 为1 , ### 没有设置加密方式 ### 进程文件设置在了 /tmp下面 wd_priority = 1 wd_authkey = '' wd_ipc_socket_dir = '/tmp' ###VIP相关的设置 需要根据实际的 网卡设备名称 eth0 ### 这个我们PGPOOL 是通过 PG POOL 启动的,所以 不需要 sudo 可以直接执行, 如果是 postgres等普通权限的用户,需要添加 /sbin/sudo delegate_IP = '10.67.39.200' if_up_cmd = '/sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0' if_down_cmd = '/sbin/ip addr del $_IP_$/24 dev eth0' arping_cmd = '/usr/sbin/arping -U $_IP_$ -w 1 -I eth0' wd_escalation_command = '/opt/postgreSQL/pgpool4.3/etc/escalation.sh' ### 这个脚本的执行发生在挂上VIP之前,需要把所有的节点VIP 卸载一遍,以防止脑裂的发生 ###watch dog 的心跳方式配置: ###通过网卡端口监控, 10秒为超时 wd_lifecheck_method = 'heartbeat' wd_interval = 10 heartbeat_hostname0 = '10.67.38.50' heartbeat_port0 = 9694 heartbeat_device0 = 'eth0' heartbeat_hostname1 = '10.67.39.149' heartbeat_port1 = 9694 heartbeat_device1 = 'eth0' heartbeat_hostname2 = '10.67.39.49' heartbeat_port2 = 9694 heartbeat_device2 = 'eth0' wd_heartbeat_keepalive = 2 wd_heartbeat_deadtime = 30
  1. 需要修改一下 escalation.sh 这个shell 文件

修改成实际节点的IP和VIP

#!/bin/bash # This script is run by wd_escalation_command to bring down the virtual IP on other pgpool nodes # before bringing up the virtual IP on the new active pgpool node. set -o xtrace PGPOOLS=(10.67.38.50 10.67.39.149 10.67.39.49) HOSTNAME=`hostname -i` VIP=10.67.39.200 DEVICE=eth0 for pgpool in "${PGPOOLS[@]}"; do [ "$HOSTNAME" = "$pgpool" ] && continue ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$pgpool -i ~/.ssh/id_rsa_pgpool " /usr/bin/sudo /sbin/ip addr del $VIP/24 dev $DEVICE " done exit 0

5)依次启动3个节点的 watch dog (这里我们用大权限的账户 root ), 观察 watch dog的 节点状态

INFRA [root@wqdcsrv3352 etc]# /opt/postgreSQL/pgpool4.3/bin/pgpool INFRA [root@wqdcsrv3353 etc]# /opt/postgreSQL/pgpool4.3/bin/pgpool INFRA [root@wqdcsrv3354 etc]# /opt/postgreSQL/pgpool4.3/bin/pgpool INFRA [postgres@wqdcsrv3352 etc]# /opt/postgreSQL/pgpool4.3/bin/pcp_watchdog_info -h localhost -p 9898 -U postgres Password: 3 3 YES 10.67.38.50:9999 Linux wqdcsrv3352.cn.infra 10.67.38.50 10.67.38.50:9999 Linux wqdcsrv3352.cn.infra 10.67.38.50 9999 9000 4 LEADER 0 MEMBER 10.67.39.149:9999 Linux wqdcsrv3354.cn.infra 10.67.39.149 9999 9000 7 STANDBY 0 MEMBER 10.67.39.49:9999 Linux wqdcsrv3353.cn.infra 10.67.39.49 9999 9000 7 STANDBY 0 MEMBER

6)手动测试VIP 漂移
我们尝试关闭 LEADER 节点上的PGPOOL : 10.67.38.50

INFRA [root@wqdcsrv3352 ~]# /opt/postgreSQL/pgpool4.3/bin/pgpool -m fast stop 2022-06-29 17:34:33.552: main pid 76158: 2022-06-29 17:34:33LOG: stop request sent to pgpool (pid: 73720). waiting for termination... .done.

我们可以从pgpool 的日志中,看到VIP已经被移除了

2022-06-29 17:34:33.552: main pid 73720: 2022-06-29 17:34:33LOG: terminating all child processes 2022-06-29 17:34:33.556: watchdog pid 73723: 2022-06-29 17:34:33LOG: Watchdog is shutting down 2022-06-29 17:34:33.556: watchdog_utility pid 76159: 2022-06-29 17:34:33LOG: watchdog: de-escalation started 2022-06-29 17:34:33.558: watchdog_utility pid 76159: 2022-06-29 17:34:33LOG: successfully released the delegate IP:"10.67.39.200" 2022-06-29 17:34:33.558: watchdog_utility pid 76159: 2022-06-29 17:34:33DETAIL: 'if_down_cmd' returned with success 2022-06-29 17:34:33.559: main pid 73720: 2022-06-29 17:34:33LOG: Pgpool-II system is shutdown
INFRA [root@wqdcsrv3352 ~]# ip addr | grep 10.67.39.200

我们这个时候发现VIP 10.67.39.200 已经漂移到了节点 149:

INFRA [root@wqdcsrv3354 ~]# ip addr |grep 10.67.39.200 inet 10.67.39.200/24 scope global eth0:0

此时我们再次观察PGPOOL 集群的状态:

INFRA [root@wqdcsrv3354 ~]# /opt/postgreSQL/pgpool4.3/bin/pcp_watchdog_info -h localhost -p 9898 -U postgres Password: 3 3 YES 10.67.39.149:9999 Linux wqdcsrv3354.cn.infra 10.67.39.149 10.67.39.149:9999 Linux wqdcsrv3354.cn.infra 10.67.39.149 9999 9000 4 LEADER 0 MEMBER 10.67.38.50:9999 Linux wqdcsrv3352.cn.infra 10.67.38.50 9999 9000 10 SHUTDOWN 0 MEMBER 10.67.39.49:9999 Linux wqdcsrv3353.cn.infra 10.67.39.49 9999 9000 7 STANDBY 0 MEMBER

我们手动起来 10.67.38.50 节点上的 pgpool:

INFRA [root@wqdcsrv3352 ~]# /opt/postgreSQL/pgpool4.3/bin/pgpool -f /opt/postgreSQL/pgpool4.3/etc/pgpool.conf -F /opt/postgreSQL/pgpool4.3/etc/pcp.conf > /tmp/pgpool.log

这个时候10.67.38.50 的节点 恢复成了 standby 的状态:

3 3 YES 10.67.39.149:9999 Linux wqdcsrv3354.cn.infra 10.67.39.149 10.67.39.149:9999 Linux wqdcsrv3354.cn.infra 10.67.39.149 9999 9000 4 LEADER 0 MEMBER 10.67.38.50:9999 Linux wqdcsrv3352.cn.infra 10.67.38.50 9999 9000 7 STANDBY 0 MEMBER 10.67.39.49:9999 Linux wqdcsrv3353.cn.infra 10.67.39.49 9999 9000 7 STANDBY 0 MEMBER

下面我们来用springboot的小程序测试一下 PGPOOL 的HA :

我们的程序的数据库配置是 这次是连接到 VIP 10.67.39.200 上面
连接池的配置是 最大允许 200 个连接,最小空闲的连接数是10

//数据库的地址以及端口号 spring.datasource.url=jdbc:postgresql://10.67.39.200:9999/postgres spring.datasource.username=app_ha_user spring.datasource.password=app_ha_user spring.datasource.driverClassName=org.postgresql.Driver spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.PostgreSQLDialect spring.jpa.properties.hibernate.hbm2ddl.auto=update spring.datasource.maximum-pool-size=200 spring.datasource.min-idle=10

测试生成很简单,就是网页输出一下数据库的IP地址:

String sql = "select inet_server_addr()"; // 通过jdbcTemplate查询数据库 String hostname = (String)jdbcTemplate.queryForObject( sql, String.class); return "Hello ,you are connecting to " + hostname;

启动测试程序后,观察初始化的连接池:

postgres=> show pool_processes; pool_pid | start_time | client_connection_count | database | username | backend_connection_time | pool_counter | status ----------+------------------------------------------------------+-------------------------+----------+-------------+-------------------------+--------------+--------------------- 80665 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80666 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:50 | 1 | Idle 64166 | 2022-06-30 11:11:51 | 2 | | | | | Wait for connection 80668 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:51 | 1 | Idle 80669 | 2022-06-29 17:30:21 | 1 | | | | | Wait for connection 65854 | 2022-06-30 11:27:54 | 0 | | | | | Wait for connection 80671 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:50 | 1 | Idle 80672 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:50 | 1 | Idle 80673 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80674 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:51 | 1 | Idle 80675 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80676 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80677 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:51 | 1 | Idle 80678 | 2022-06-29 17:30:21 | 1 | postgres | app_ha_user | 2022-06-30 11:29:50 | 1 | Idle 80679 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:50 | 1 | Idle 80680 | 2022-06-29 17:30:21 (4:15 before process restarting) | 1 | | | | | Wait for connection 80681 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80682 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:51 | 1 | Idle 80683 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80684 | 2022-06-29 17:30:21 | 2 | | | | | Wait for connection 80685 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80686 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80687 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80688 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80689 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80690 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80691 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80692 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80693 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:29:51 | 1 | Idle 80694 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection 80695 | 2022-06-29 17:30:21 | 0 | postgres | app_ha_user | 2022-06-30 11:30:30 | 1 | Execute command 80696 | 2022-06-29 17:30:21 | 0 | | | | | Wait for connection

尝试访问web界面: http://127.0.0.1:8066/ 我们可以看到 50这个节点是主库

Image.png

关闭VIP所在节点的PGPOOL :

INFRA [root@wqdcsrv3354 ~]# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 00:50:56:ae:f7:e7 brd ff:ff:ff:ff:ff:ff inet 10.67.39.149/22 brd 10.67.39.255 scope global eth0 valid_lft forever preferred_lft forever inet 10.67.39.200/24 scope global eth0:0 valid_lft forever preferred_lft forever INFRA [root@wqdcsrv3354 ~]# /opt/postgreSQL/pgpool4.3/bin/pgpool -m fast stop 2022-06-30 11:34:08.821: main pid 66716: 2022-06-30 11:34:08LOG: stop request sent to pgpool (pid: 80655). waiting for termination... .done.

这个时候我们看到VIP 漂移到了 49这个节点:

INFRA [postgres@wqdcsrv3353 ~]# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 00:50:56:ae:99:07 brd ff:ff:ff:ff:ff:ff inet 10.67.39.49/22 brd 10.67.39.255 scope global eth0 valid_lft forever preferred_lft forever inet 10.67.39.200/24 scope global eth0:0 valid_lft forever preferred_lft forever

再次访问网页: http://127.0.0.1:8066/ 我们可以看到通过VIP 依然可以连接到 主库 50 这个节点.

Image.png

我们可以观察到 测试程序的连接池进行了重连

2022-06-30 11:39:01,161 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@51849c83 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,164 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@46432d85 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,168 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@33d6c14a (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,169 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@64116a2c (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,169 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@6641e508 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,170 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@3931c340 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,170 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@3ce889bd (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,171 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@5347c406 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,172 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@3a4712c3 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2022-06-30 11:39:01,172 WARN (PoolBase.java:184)- HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@ddbed08 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论