Kingbase KingbaseES V8R3 Cluster: Managing the DB VIP and the Cluster VIP

lucky 2023-12-25

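Case description:
A KingbaseES V8R3 cluster integrates a DB VIP (used for application connections) and a Cluster VIP (used for cluster management). This case describes how the two VIPs are configured in the cluster and how each VIP moves to another node when part of the cluster fails.

Applicable version:
KingbaseES V8R3

Cluster architecture:
[Figure: cluster architecture diagram (0.png)]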

I. Cluster VIP configuration

1) Configure the DB VIP and the Cluster VIP in HAmodule.conf

[kingbase@node101 bin]$ cat ../etc/HAmodule.conf |grep -i vip
#vip is bound to the specified network card.example:DEV="ens33"
#db use vip/the subnet mask.example:KB_VIP="192.168.28.220/24"
KB_VIP="192.168.1.204/24"     # DB VIP setting
#pool use vip/the subnet mask.example:KB_POOL_VIP="192.168.28.220/24"
KB_POOL_VIP="192.168.1.205"   # Cluster VIP setting

---When the cluster script kingbase_monitor.sh runs, it reads the settings in HAmodule.conf.
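To script against these settings, the two values can be pulled out of the file with a small helper. The following is a minimal sketch, not part of the KingbaseES tooling: `conf_value` is a hypothetical function name, and it assumes the `KEY="value"` layout shown in the output above.

```shell
# conf_value FILE KEY: print the value of the last uncommented KEY=... line.
# Hypothetical helper; assumes the KEY="value" layout shown above.
conf_value() {
    awk -F= -v key="$2" '
        $1 == key {                               # commented lines start with "#", so $1 differs
            val = substr($0, length(key) + 2)     # text after KEY=
            sub(/[[:space:]]*#.*/, "", val)       # drop a trailing comment
            sub(/[[:space:]]+$/, "", val)         # drop trailing whitespace
            gsub(/"/, "", val)                    # drop double quotes
            out = val
        }
        END { if (out != "") print out }' "$1"
}

# Usage on a cluster node (path as used in this case):
#   conf_value ../etc/HAmodule.conf KB_VIP
#   conf_value ../etc/HAmodule.conf KB_POOL_VIP
```

Matching on the text before the first `=` keeps the commented example lines out of the result, even though they also contain `KB_VIP=`.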

2) Cluster VIP configuration in kingbasecluster.conf

[kingbase@node101 etc]$ cat kingbasecluster.conf|grep -i 'ip add'|grep -v '#'
if_up_cmd='ip addr add 192.168.1.205/24 dev enp0s3 label enp0s3:0'
if_down_cmd='ip addr del 192.168.1.205/24 dev enp0s3'

---When kingbasecluster starts or stops the cluster service, it reads kingbasecluster.conf and adds or removes the Cluster VIP accordingly.
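Because the Cluster VIP appears in two files, it may be worth checking that they name the same address. The sketch below is an assumption-laden illustration: `if_up_address` and `same_address` are hypothetical helpers, and the source does not state that the two files must agree, only that in this case both use 192.168.1.205.

```shell
# if_up_address FILE: print the address/prefix that the uncommented
# if_up_cmd line adds (the token following "add").
if_up_address() {
    grep -E '^[[:space:]]*if_up_cmd[[:space:]]*=' "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "add") { print $(i + 1); exit } }'
}

# same_address A B: succeed when A and B are the same IP, ignoring any /prefix.
same_address() {
    [ "${1%/*}" = "${2%/*}" ]
}

# Usage on a cluster node, with the KB_POOL_VIP value from this case:
#   same_address "$(if_up_address kingbasecluster.conf)" "192.168.1.205" \
#       || echo "Cluster VIP differs between the two files"
```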

II. Loading the cluster VIPs

1) Loading the DB VIP

[kingbase@node101 bin]$ ./kingbase_monitor.sh start
-----------------------------------------------------------------------
2023-02-14 19:00:25 KingbaseES automation beging...
......................
ADD VIP NOW AT 2023-02-14 19:00:33 ON enp0s3
execute: [/sbin/ip addr add 192.168.1.204/24 dev enp0s3 label enp0s3:2]
execute: /home/kingbase/cluster/HAR3/db/bin//arping -U 192.168.1.204 -I enp0s3 -w 1
.....
all started..

---As shown above, running kingbase_monitor.sh start adds the DB VIP on the primary node of the cluster's database service.

2) Loading the Cluster VIP (cluster.log)

2023-02-14 19:01:00: pid 31342: LOG: kingbasecluster successfully started. version 3.6.7 (release)
.......
2023-02-14 19:01:02: pid 31449: LOG: successfully acquired the delegate IP:"192.168.1.205"
2023-02-14 19:01:02: pid 31449: DETAIL: 'if_up_cmd' returned with success

---As cluster.log shows, when the kingbasecluster service starts it reads kingbasecluster.conf and adds the Cluster VIP on the cluster's master node.

3) Check the IP information on the primary node

[kingbase@node101 bin]$ ip add sh
......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.1.204/24 scope global secondary enp0s3:2
       valid_lft forever preferred_lft forever
    inet 192.168.1.205/24 scope global secondary enp0s3:0
       valid_lft forever preferred_lft forever

---As shown above, both the DB VIP and the Cluster VIP are now bound on the primary node.
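The same check can be scripted instead of read by eye. This is a minimal sketch: `has_address` is a hypothetical helper that reads the one-line-per-address output of `ip -o -4 addr show` on standard input, where the fourth field is the address with its prefix length.

```shell
# has_address ADDR: read `ip -o -4 addr show` output on stdin and succeed
# when ADDR (given without a prefix length) is bound to some interface.
has_address() {
    awk -v want="$1" '
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Usage on a cluster node, with the addresses from this case:
#   ip -o -4 addr show dev enp0s3 | has_address 192.168.1.204 && echo "DB VIP is bound here"
#   ip -o -4 addr show dev enp0s3 | has_address 192.168.1.205 && echo "Cluster VIP is bound here"
```

Running both checks on each node before and after a failure gives a quick view of where each VIP currently lives.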

III. VIP failover tests

1. Cluster VIP failover

1) Simulate a kingbasecluster service failure

# Check the kingbasecluster processes
[kingbase@node101 bin]$ ps -ef |grep kingbase
.......
root 31342 1 0 19:00 ? 00:00:00 ./kingbasecluster -n
root 31383 31342 0 19:00 ? 00:00:00 kingbasecluster: watchdog
root 31450 31342 0 19:01 ? 00:00:00 kingbasecluster: lifecheck
root 31452 31450 0 19:01 ? 00:00:00 kingbasecluster: heartbeat receiver
root 31453 31450 0 19:01 ? 00:00:00 kingbasecluster: heartbeat sender
root 31456 31342 0 19:01 ? 00:00:00 kingbasecluster: wait for connection request
root 31457 31342 0 19:01 ? 00:00:00 kingbasecluster: wait for connection request
.......
# Kill the kingbasecluster parent process (kill -2 sends SIGINT)
[root@node101 ~]# kill -2 31342

2) Check the VIP information on the cluster nodes

# Original master node
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.1.204/24 scope global secondary enp0s3:2
       valid_lft forever preferred_lft forever

# Original standby node
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:73:47:f6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.102/24 brd 192.168.1.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.1.205/24 scope global secondary enp0s3:0
       valid_lft forever preferred_lft forever

---As shown above, the Cluster VIP (192.168.1.205) has been removed from the original master node and added on the original standby node. The DB VIP did not move, so application access to the database service is unaffected.

3) Check cluster.log

Tips:

As shown below, because the kingbasecluster service on the master node was stopped, the kingbasecluster service on the standby took over as the new master, and the Cluster VIP moved to the new kingbasecluster master node.

2023-02-14 19:02:51: pid 7497: LOG: We have lost the cluster master node "192.168.1.101:9999 Linux node101"
2023-02-14 19:02:51: pid 7497: LOG: watchdog node state changed from [STANDBY] to [JOINING]
......
2023-02-14 19:02:56: pid 7497: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2023-02-14 19:02:56: pid 7497: LOG: I am announcing my self as master/coordinator watchdog node
2023-02-14 19:02:59: pid 7500: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2023-02-14 19:02:59: pid 7500: DETAIL: the last heartbeat from "192.168.1.101:9999" received 8 seconds ago
2023-02-14 19:03:00: pid 7497: LOG: I am the cluster leader node
2023-02-14 19:03:00: pid 7497: DETAIL: our declare coordinator message is accepted by all nodes
........
2023-02-14 19:03:02: pid 8176: LOG: selecting backend connection
2023-02-14 19:03:02: pid 8176: DETAIL: failback event detected, discarding existing connections
2023-02-14 19:03:02: pid 7500: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2023-02-14 19:03:02: pid 7500: DETAIL: the last heartbeat from "192.168.1.101:9999" received 11 seconds ago
2023-02-14 19:03:02: pid 9330: LOG: successfully acquired the delegate IP:"192.168.1.205"
2023-02-14 19:03:02: pid 9330: DETAIL: 'if_up_cmd' returned with success
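When the log is long, the two facts of interest here (the watchdog state changes and the moment the delegate IP was acquired) can be filtered out. The following is a rough sketch under the assumption that log lines keep the format of the excerpt above; `last_vip_acquired` and `watchdog_transitions` are hypothetical helper names, and the location of cluster.log depends on the installation.

```shell
# Hypothetical helpers for reading a kingbasecluster cluster.log.
# They assume the line format shown in the excerpt above.

# last_vip_acquired LOGFILE: timestamp of the most recent delegate-IP acquisition.
last_vip_acquired() {
    grep 'successfully acquired the delegate IP' "$1" | tail -n 1 | cut -c1-19
}

# watchdog_transitions LOGFILE: one "FROM -> TO" line per watchdog state change.
watchdog_transitions() {
    sed -n 's/.*watchdog node state changed from \[\([A-Z]*\)] to \[\([A-Z]*\)].*/\1 -> \2/p' "$1"
}

# Usage (replace the argument with the actual path to cluster.log):
#   watchdog_transitions cluster.log
#   last_vip_acquired cluster.log
```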

4) Restart the kingbasecluster service on the original master node

# Start the kingbasecluster service
[root@node101 ~]# cd /home/kingbase/cluster/HAR3/kingbasecluster/bin
[root@node101 bin]# ./restartcluster.sh

Tips:

As shown below, once its kingbasecluster service is started again, the original master node joins the cluster as a standby node.

#cluster.log:
2023-02-14 19:03:05: pid 1023: LOG: watchdog node state changed from [DEAD] to [LOADING]
2023-02-14 19:03:05: pid 1023: LOG: new outbound connection to 192.168.1.102:9000
2023-02-14 19:03:05: pid 1023: LOG: setting the remote node "192.168.1.102:9999 Linux node102" as watchdog cluster master
2023-02-14 19:03:05: pid 1023: LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
2023-02-14 19:03:05: pid 1023: LOG: new watchdog node connection is received from "192.168.1.102:47600"
2023-02-14 19:03:05: pid 1023: LOG: new node joined the cluster hostname:"192.168.1.102" port:9000 kingbasecluster_port:9999
2023-02-14 19:03:06: pid 1023: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2023-02-14 19:03:06: pid 1023: LOG: successfully joined the watchdog cluster as standby node

2. DB VIP failover

1) Simulate a database service failure on the primary

[kingbase@node101 bin]$ ./sys_ctl stop -D /home/kingbase/cluster/HAR3/db/data

2) Check failover.log

-----------------2023-02-14 19:23:52 failover beging---------------------------------------
----failover-stats is %H = hostname of the new master node [192.168.1.102], %P = old primary node id [0], %d = node id[0], %h = host name [192.168.1.101], %O = old primary host[192.168.1.101] %m = new master node id [1], %M = old master node id [0], %D = database cluster path [/home/kingbase/cluster/HAR3/db/data].
----ping trust ip
ping trust ip 192.168.1.1 success ping times :[3], success times:[3]
----determine whether the faulty db is master or standby
master down, let 192.168.1.102 become new primary.....
2023-02-14 19:23:54 del old primary VIP on 192.168.1.101
es_client connect host:192.168.1.101 success, will stop old primary db and del the vip
stop the old primary db
sys_ctl: PID file "/home/kingbase/cluster/HAR3/db/data/kingbase.pid" does not exist
Is server running?
DEL VIP NOW AT 2023-02-14 19:23:56 ON enp0s3
execute: [/sbin/ip addr del 192.168.1.204/24 dev enp0s3]
Oprate del ip cmd end.
2023-02-14 19:23:54 add VIP on 192.168.1.102
ADD VIP NOW AT 2023-02-14 19:23:55 ON enp0s3
execute: [/sbin/ip addr add 192.168.1.204/24 dev enp0s3 label enp0s3:2]
execute: /home/kingbase/cluster/HAR3/db/bin//arping -U 192.168.1.204 -I enp0s3 -w 1
Success to send 1 packets
2023-02-14 19:23:55 promote begin...let 192.168.1.102 become master
.......
-----------------2023-02-14 19:23:55 failover end---------------------------------------

---As shown above, during failover the DB VIP is removed from the original primary and added on the new primary.
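To see only the VIP movements in a failover.log, the `ADD VIP NOW AT` and `DEL VIP NOW AT` lines can be reduced to one short record each. This is a sketch under the assumption that those lines start with `ADD` or `DEL` and keep the field order shown above; `vip_events` is a hypothetical helper name.

```shell
# vip_events LOGFILE: print "ACTION DATE TIME INTERFACE" for every
# "ADD VIP NOW AT ..." or "DEL VIP NOW AT ..." line in a failover.log.
# Assumes the field order shown in the excerpt above.
vip_events() {
    awk '($1 == "ADD" || $1 == "DEL") && $2 == "VIP" && $3 == "NOW" {
             print $1, $5, $6, $8
         }' "$1"
}

# Usage (replace the argument with the actual path to failover.log):
#   vip_events failover.log
```

Each node writes its own timestamp, so the printed order follows the log file and not necessarily wall-clock order, as the DEL (19:23:56) and ADD (19:23:55) entries above illustrate.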

3) Check the IP information on the new primary

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:73:47:f6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.102/24 brd 192.168.1.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.1.205/24 scope global secondary enp0s3:0
       valid_lft forever preferred_lft forever
    inet 192.168.1.204/24 scope global secondary enp0s3:2

---As shown above, after the cluster triggers a failover, the DB VIP moves to the new primary node.

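IV. Summary

A KingbaseES V8R3 cluster uses VIP addresses to give applications highly available access to the database and to support cluster management.
1) The DB VIP is used for application connections. It is added on the primary node of the database service when the cluster starts. When the database service on the primary goes down and a failover is triggered, the DB VIP moves to the new primary node of the database service.
2) The Cluster VIP is used to reach the kingbasecluster service. It is added on the kingbasecluster master node when the cluster starts. When the kingbasecluster service on the master node goes down, the Cluster VIP moves to the new master node.
3) If port 9999 (the kingbasecluster service port) becomes unreachable in production, you can try restarting the kingbasecluster service. By default this does not affect client connections to the database service, but in production it is still best done during off-peak hours for the application.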
