
MogHA Test Cases on a One-Primary, One-Synchronous-Standby Cluster

Original article by 何放 (He Fang), 2021-12-25

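This article tests MogHA on a one-primary, one-synchronous-standby cluster, with MogHA running in lite mode. The common MogHA test scenarios are:

- primary host crash
- standby host crash
- abnormal termination of the primary database
- abnormal termination of the standby database
- split-brain (dual-primary) detection

For how the HA setup was built, see the previous article (Deploying MogHA on a MogDB one-primary, one-synchronous-standby cluster).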

Primary and standby node information

Primary IP: 192.168.134.132   Primary heartbeat IP: 192.168.134.130   Hostname: node1
Standby IP: 192.168.134.134   Standby heartbeat IP: 192.168.134.131   Hostname: node2
VIP: 192.168.134.133

Test case 1: Primary host down

Shut down the primary host.

# Simulation: as root on the primary host, power it off
shutdown -h now

The standby's mogha_heartbeat.log shows the standby running a failover and being promoted to primary. The VIP then moves to the new primary.

2021-12-24 15:56:00,420 - heartbeat.standby - INFO [standby_heartbeat.py:85]: primary lost check...
2021-12-24 15:56:02,422 - heartbeat.standby - ERROR [standby_heartbeat.py:114]: primary lost check :2s
2021-12-24 15:56:05,424 - heartbeat.standby - ERROR [standby_heartbeat.py:114]: primary lost check :5s
2021-12-24 15:56:08,426 - heartbeat.standby - ERROR [standby_heartbeat.py:114]: primary lost check :8s
2021-12-24 15:56:11,428 - heartbeat.standby - ERROR [standby_heartbeat.py:114]: primary lost check :11s
2021-12-24 15:56:12,433 - heartbeat.standby - INFO [standby_heartbeat.py:228]: Start failover...
2021-12-24 15:56:12,542 - heartbeat.standby - INFO [standby_heartbeat.py:232]: Alter system set most_available_sync on
2021-12-24 15:56:12,671 - heartbeat.standby - INFO [standby_heartbeat.py:235]: Start gs_ctl failover... now lsn:0/53BDA58
2021-12-24 15:56:15,610 - heartbeat.standby - INFO [standby_heartbeat.py:238]: Failover result:
out: [2021-12-24 15:56:12.710][27193][][gs_ctl]: gs_ctl failover ,datadir is /dbdata/data 
[2021-12-24 15:56:12.710][27193][][gs_ctl]: failover term (1)
[2021-12-24 15:56:12.715][27193][][gs_ctl]:  waiting for server to failover...
.[2021-12-24 15:56:15.088][27193][][gs_ctl]:  done
[2021-12-24 15:56:15.093][27193][][gs_ctl]:  failover completed (/dbdata/data)

err:
2021-12-24 15:56:16,387 - heartbeat.standby - INFO [standby_heartbeat.py:247]: Already become primary
2021-12-24 15:56:16,387 - heartbeat.standby - INFO [standby_heartbeat.py:248]: Start change VIP...
2021-12-24 15:56:20,236 - heartbeat.standby - INFO [standby_heartbeat.py:250]: End change VIP...
2021-12-24 15:56:20,240 - heartbeat.standby - INFO [standby_heartbeat.py:251]: Failover success
2021-12-24 15:56:20,243 - heartbeat.standby - INFO [standby_heartbeat.py:165]: Writing primary info to /dbdata/app/mogha/primary_info ...
2021-12-24 15:56:20,255 - heartbeat.standby - INFO [standby_heartbeat.py:173]: Write meta file success
2021-12-24 15:56:20,262 - heartbeat.loop - INFO [loop.py:175]: [20001] normal break: Failover success. restart heartbeat.
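You can also measure the failover window from the log instead of reading it off by eye. The sketch below is a hypothetical helper: log_elapsed is not part of MogHA. It assumes the heartbeat log line format shown above, where the first two fields are a date and an HH:MM:SS,mmm time. It also assumes both events happen on the same day.

```shell
# Hypothetical helper (not part of MogHA): print the seconds elapsed between
# the first log line matching START_RE and the last line matching END_RE.
# Assumes lines start with "YYYY-MM-DD HH:MM:SS,mmm" and that both events
# fall on the same day (midnight rollover is not handled).
log_elapsed() {
  local log="$1" start_re="$2" end_re="$3"
  awk -v s_re="$start_re" -v e_re="$end_re" '
    # Convert "HH:MM:SS,mmm" into milliseconds since midnight
    function to_ms(t,   hms, sec) {
      split(t, hms, ":")
      split(hms[3], sec, ",")
      return ((hms[1] * 60 + hms[2]) * 60 + sec[1]) * 1000 + sec[2]
    }
    # Skip embedded gs_ctl output lines, whose 2nd field is not a time
    $2 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9]+$/ { next }
    !have_start && $0 ~ s_re { start = to_ms($2); have_start = 1 }
    have_start && $0 ~ e_re  { end = to_ms($2); have_end = 1 }
    END {
      if (!have_end) exit 1
      printf "%.3f\n", (end - start) / 1000
    }
  ' "$log"
}
```

On the log above, `log_elapsed mogha_heartbeat.log 'primary lost check' 'Failover success *$'` prints 19.820. That is about 20s from the first missed heartbeat to failover success. The actual log path depends on your MogHA installation. The same helper can time test cases 3 and 4 with patterns such as 'local instance is down' and 'Instance start:'.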

Check the new primary's status and confirm that it now holds the VIP.

[omm@node2 ~]$ gs_ctl -D /dbdata/data query
[2021-12-24 15:56:50.154][35826][][gs_ctl]: gs_ctl query ,datadir is /dbdata/data 
 HA state:           
        local_role                     : Primary
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
No information 
 Receiver info:      
No information 
[omm@node2 ~]$ ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:c7:0a:dc brd ff:ff:ff:ff:ff:ff
    inet 192.168.134.134/24 brd 192.168.134.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet 192.168.134.133/24 brd 192.168.134.255 scope global secondary ens33:1
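The VIP check can be scripted instead of read from `ip a` output by hand. This is a minimal sketch: has_ipv4 is a hypothetical helper name, and it only parses the `inet ADDR/PREFIX` lines that `ip a` prints.

```shell
# Hypothetical helper: succeed if the given IPv4 address appears as an
# "inet" entry in `ip a` output read from stdin.
has_ipv4() {
  local want="$1"
  awk -v want="$want" '
    # `ip a` prints addresses as: inet 192.168.134.133/24 brd ...
    $1 == "inet" { split($2, addr, "/"); if (addr[1] == want) found = 1 }
    END { exit !found }
  '
}

# Usage on either node:
#   ip a | has_ipv4 192.168.134.133 && echo "VIP is on $(hostname)"
```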

After the old primary host is back up, run the following on it.

# On the old primary, run build so that it rejoins the cluster as the new standby
gs_ctl -D /dbdata/data build

# Check the primary/standby cluster status
[omm@node2 ~]$ gs_ctl -D /dbdata/data query
[2021-12-24 15:58:54.598][36400][][gs_ctl]: gs_ctl query ,datadir is /dbdata/data 
 HA state:           
        local_role                     : Primary
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
        sender_pid                     : 36399
        local_role                     : Primary
        peer_role                      : Standby
        peer_state                     : Normal
        state                          : Streaming
        sender_sent_location           : 0/6388DB8
        sender_write_location          : 0/6388DB8
        sender_flush_location          : 0/6388DB8
        sender_replay_location         : 0/6388DB8
        receiver_received_location     : 0/6388DB8
        receiver_write_location        : 0/6388DB8
        receiver_flush_location        : 0/6388DB8
        receiver_replay_location       : 0/63888E8
        sync_percent                   : 100%
        sync_state                     : Sync
        sync_priority                  : 1
        sync_most_available            : On
        channel                        : 192.168.134.134:51001-->192.168.134.132:21322

 Receiver info:      
No information
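After a build, the sender section of `gs_ctl query` on the primary shows whether the standby is back. The sketch below uses hypothetical helpers that are not part of MogDB or MogHA. They check the three fields used in this article (peer_state, sync_state and sync_percent) and poll until they look healthy. The 5-second poll interval and the 300-second default timeout are arbitrary assumptions.

```shell
# Hypothetical helper: read `gs_ctl query` output on stdin and succeed only if
# the sender section reports a Normal peer, Sync state and 100% sync.
replication_in_sync() {
  awk -F':' '
    {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
    }
    key == "peer_state"   && val == "Normal" { peer_ok = 1 }
    key == "sync_state"   && val == "Sync"   { state_ok = 1 }
    key == "sync_percent" && val == "100%"   { pct_ok = 1 }
    END { exit !(peer_ok && state_ok && pct_ok) }
  '
}

# Hypothetical helper: run on the primary after the standby's build; poll
# every 5s until replication is in sync, or give up after TIMEOUT seconds.
wait_for_sync() {
  local datadir="$1" timeout="${2:-300}" waited=0
  until gs_ctl -D "$datadir" query | replication_in_sync; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 5
    waited=$((waited + 5))
  done
}

# Usage on the primary:
#   wait_for_sync /dbdata/data 300 && echo "standby is back in sync"
```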

Test case 2: Standby host down

Check the MogHA configuration file node.conf.

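# [Added in v2.1] Whether HA should start the standby instance when its process is not running
handle_down_standby=True
# Change this on both primary and standby, then restart MogHA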
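The same node.conf change has to be made on both nodes, so it can help to script it. This is a minimal sketch, and it rests on several assumptions:

- node.conf uses plain key=value lines, as shown above.
- set_conf_key is a hypothetical helper.
- The path /dbdata/app/mogha/node.conf is inferred from the /dbdata/app/mogha/primary_info path in the heartbeat log.
- The restart command assumes MogHA runs as a systemd service named mogha.

```shell
# Hypothetical helper: set KEY=VALUE on an existing line of node.conf,
# keeping a .bak copy. It refuses to add a missing key rather than guess
# which section the key belongs in.
set_conf_key() {
  local file="$1" key="$2" value="$3"
  if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
    echo "set_conf_key: ${key} not found in ${file}" >&2
    return 1
  fi
  # Commented-out lines ("# key=...") are left untouched
  sed -i.bak -E "s|^([[:space:]]*${key}[[:space:]]*=[[:space:]]*).*|\1${value}|" "$file"
}

# Usage on both nodes (path and service name are assumptions):
#   set_conf_key /dbdata/app/mogha/node.conf handle_down_standby True
#   sudo systemctl restart mogha
```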

Shut down the standby host.

# Simulation: as root on the standby host, power it off
shutdown -h now

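The primary's mogha_heartbeat.log shows that it can no longer reach the standby. The standby's IPs fail the ping check, and no synchronous standby is found.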

2021-12-24 16:28:48,502 - heartbeat.loop - INFO [toolkit.py:180]: Ping result:{'192.168.134.2': True, '192.168.134.132': False, '192.168.134.130': False}
2021-12-24 16:28:48,532 - heartbeat.loop - INFO [loop.py:148]: Detect that local instance is active primary
2021-12-24 16:28:48,532 - heartbeat.primary - INFO [primary_heartbeat.py:37]: primary lonely check...
2021-12-24 16:28:50,534 - heartbeat.primary - INFO [primary_heartbeat.py:263]: double primary check...
2021-12-24 16:28:56,568 - instance - INFO [opengauss.py:211]: VIP: 192.168.134.133 already set in local host: ['192.168.134.134', '192.168.134.133', '192.168.134.131', '192.168.122.1']
2021-12-24 16:28:56,568 - heartbeat.primary - INFO [primary_heartbeat.py:353]: sync primary info...
2021-12-24 16:29:02,573 - heartbeat.primary - INFO [primary_heartbeat.py:400]: standby check...
2021-12-24 16:29:02,597 - heartbeat.loop - ERROR [loop.py:178]: Heartbeat failed:
2021-12-24 16:29:02,597 - heartbeat.loop - ERROR [loop.py:179]: not found any sync backup instance

Check the primary's status and the VIP.

[omm@node2 ~]$ gs_ctl -D /dbdata/data query
[2021-12-25 13:55:23.443][36393][][gs_ctl]: gs_ctl query ,datadir is /dbdata/data 
 HA state:           
        local_role                     : Primary
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
No information 
 Receiver info:      
No information 
[omm@node2 ~]$ ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:c7:0a:dc brd ff:ff:ff:ff:ff:ff
    inet 192.168.134.134/24 brd 192.168.134.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet 192.168.134.133/24 brd 192.168.134.255 scope global secondary ens33:1

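When the standby host boots, mogha_heartbeat.log shows MogHA detecting the stopped standby instance and starting it, because handle_down_standby=True.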

2021-12-24 17:54:42,745 - root - INFO [config.py:153]: config load successfully
2021-12-24 17:54:42,745 - root - INFO [main.py:54]: version: 2.2.2
2021-12-24 17:54:42,985 - heartbeat.loop - INFO [loop.py:50]: Detect that local instance is down, try to handle it
2021-12-24 17:54:42,986 - heartbeat.primary - ERROR [primary_heartbeat.py:224]: local primary:192.168.134.134 is not local host:192.168.134.132.
2021-12-24 17:54:42,986 - heartbeat.loop - INFO [loop.py:85]: Local instance is stopped standby
2021-12-24 17:54:42,986 - heartbeat.loop - INFO [opengauss.py:249]: VIP:192.168.134.133 already offline, local ip list: ['192.168.134.132', '192.168.134.130', '192.168.122.1']
2021-12-24 17:55:01,019 - heartbeat.loop - INFO [opengauss.py:279]: Instance start:
out:
[2021-12-24 17:54:43.049][1776][][gs_ctl]: gs_ctl started,datadir is /dbdata/data
[2021-12-24 17:54:44.385][1776][][gs_ctl]: waiting for server to start...

Once the standby instance is running, the following steps are still required.

# On the standby, run build to rejoin the cluster
gs_ctl -D /dbdata/data build

# Check the primary/standby cluster status
[omm@node2 ~]$  gs_ctl -D /dbdata/data query
[2021-12-24 17:55:34.664][41007][][gs_ctl]: gs_ctl query ,datadir is /dbdata/data 
 HA state:           
        local_role                     : Primary
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
        sender_pid                     : 36399
        local_role                     : Primary
        peer_role                      : Standby
        peer_state                     : Normal
        state                          : Streaming
        sender_sent_location           : 0/6389AD8
        sender_write_location          : 0/6389AD8
        sender_flush_location          : 0/6389AD8
        sender_replay_location         : 0/6389AD8
        receiver_received_location     : 0/6389AD8
        receiver_write_location        : 0/6389AD8
        receiver_flush_location        : 0/6389AD8
        receiver_replay_location       : 0/6389AD8
        sync_percent                   : 100%
        sync_state                     : Sync
        sync_priority                  : 1
        sync_most_available            : On
        channel                        : 192.168.134.134:51001-->192.168.134.132:21322

 Receiver info:      
No information 

Test case 3: Primary database process failure

Check the MogHA configuration file node.conf.

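# [Added in v2.1] Whether HA should restart the primary instance or fail over when its process is not running
# Used together with primary_down_handle_method
handle_down_primary=True
primary_down_handle_method=restart
# Change these on both primary and standby, then restart MogHA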
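Before running the test, you can confirm that primary and standby carry the same settings by printing the relevant keys and comparing them. This sketch makes several assumptions:

- ha_settings is a hypothetical helper, and it assumes node.conf uses key=value lines as shown above.
- /dbdata/app/mogha/node.conf is an assumed location.
- The comparison assumes scp access from the primary to the standby as omm.

```shell
# Hypothetical helper: print the MogHA keys used in these test cases as
# sorted "key=value" lines, so two nodes' node.conf files can be diffed.
ha_settings() {
  awk '
    {
      eq = index($0, "=")
      key = (eq ? substr($0, 1, eq - 1) : $0)
      val = (eq ? substr($0, eq + 1) : "")
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
    }
    key == "handle_down_primary" || key == "primary_down_handle_method" || key == "handle_down_standby" {
      print key "=" val
    }
  ' "$1" | sort
}

# Usage on the primary (paths and host are assumptions):
#   scp omm@192.168.134.134:/dbdata/app/mogha/node.conf /tmp/node2.conf
#   diff <(ha_settings /dbdata/app/mogha/node.conf) <(ha_settings /tmp/node2.conf) && echo "HA settings match"
```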

Kill the primary database process from the command line.

# Simulation: find the primary database process, then run kill -9 <PID>
[omm@node1 ~]$ ps -ef|grep mogdb
omm        3143      1  4 11:54 pts/0    00:00:16 /dbdata/app/mogdb/bin/mogdb -D /dbdata/data -M primary
omm        5936   5748  0 12:01 pts/0    00:00:00 grep --color=auto mogdb

[omm@node1 ~]$ kill -9 3143
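`ps -ef | grep mogdb` also lists the grep process itself. The sketch below selects the server process by its command-line fields instead. mogdb_pid is a hypothetical helper, and it assumes the standard Linux `ps -ef` column layout shown above.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID and
# role (the -M value) of the mogdb server running on DATADIR. Matching the
# command fields avoids picking up the "grep mogdb" line.
mogdb_pid() {
  local datadir="$1"
  awk -v dir="$datadir" '
    # ps -ef columns: UID PID PPID C STIME TTY TIME CMD...
    $8 ~ /\/mogdb$/ && $9 == "-D" && $10 == dir {
      print $2, ($11 == "-M" ? $12 : "unknown")
    }
  '
}

# Usage, equivalent to the manual steps above:
#   ps -ef | mogdb_pid /dbdata/data          # e.g. "3143 primary"
#   kill -9 "$(ps -ef | mogdb_pid /dbdata/data | cut -d' ' -f1)"
```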

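The primary host's mogha_heartbeat.log shows MogHA restarting the failed primary instance. The whole recovery took about 5s. In the log itself, about 3.5s elapse between detecting the failure and the server being started.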

2021-12-25 12:23:17,726 - heartbeat.loop - INFO [loop.py:50]: Detect that local instance is down, try to handle it
2021-12-25 12:23:17,731 - heartbeat.loop - INFO [loop.py:55]: Local instance is stopped primary
2021-12-25 12:23:17,731 - heartbeat.loop - INFO [loop.py:69]: disk check success, try to restart
2021-12-25 12:23:17,732 - heartbeat.primary - INFO [primary_heartbeat.py:518]: try to restart local instance, count: 1
2021-12-25 12:23:21,266 - instance - INFO [opengauss.py:279]: Instance start:
out:
[2021-12-25 12:23:17.758][14623][][gs_ctl]: gs_ctl started,datadir is /dbdata/data 
[2021-12-25 12:23:17.896][14623][][gs_ctl]: waiting for server to start...
[2021-12-25 12:23:21.265][14623][][gs_ctl]:  done
[2021-12-25 12:23:21.265][14623][][gs_ctl]: server started (/dbdata/data)

Check the primary process and status. The cluster is fully back to normal.

[omm@node1 ~]$ ps -ef|grep mogdb
omm       14626      1 15 12:23 ?        00:00:03 /dbdata/app/mogdb/bin/mogdb -D /dbdata/data -M primary
omm       14806   5748  0 12:23 pts/0    00:00:00 grep --color=auto mogdb
[omm@node1 ~]$ gs_ctl -D /dbdata/data/ query
[2021-12-25 12:29:51.611][17255][][gs_ctl]: gs_ctl query ,datadir is /dbdata/data 
 HA state:           
        local_role                     : Primary
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
        sender_pid                     : 14686
        local_role                     : Primary
        peer_role                      : Standby
        peer_state                     : Normal
        state                          : Streaming
        sender_sent_location           : 0/5F9AD80
        sender_write_location          : 0/5F9AD80
        sender_flush_location          : 0/5F9AD80
        sender_replay_location         : 0/5F9AD80
        receiver_received_location     : 0/5F9AD80
        receiver_write_location        : 0/5F9AD80
        receiver_flush_location        : 0/5F9AD80
        receiver_replay_location       : 0/5F9AD80
        sync_percent                   : 100%
        sync_state                     : Sync
        sync_priority                  : 1
        sync_most_available            : On
        channel                        : 192.168.134.132:51001-->192.168.134.134:63740

 Receiver info:      
No information 

Test case 4: Standby database process failure

Check the MogHA configuration file node.conf.

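# [Added in v2.1] Whether HA should start the standby instance when its process is not running
handle_down_standby=True
# Change this on both primary and standby, then restart MogHA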

Kill the standby database process from the command line.

# Simulation: find the standby database process, then run kill -9 <PID>
[omm@node2 ~]$ ps -ef|grep mogdb
omm        2657      1  3 11:55 ?        00:01:19 /dbdata/app/mogdb/bin/mogdb -D /dbdata/data -M standby
omm       17066  17025  0 12:37 pts/0    00:00:00 grep --color=auto mogdb

[omm@node2 ~]$ kill -9 2657

The standby host's mogha_heartbeat.log shows MogHA restarting the failed standby instance. The whole recovery took about 8s.

2021-12-25 12:41:48,124 - heartbeat.loop - INFO [loop.py:50]: Detect that local instance is down, try to handle it
2021-12-25 12:41:48,132 - heartbeat.primary - ERROR [primary_heartbeat.py:224]: local primary:192.168.134.132 is not local host:192.168.134.134.
2021-12-25 12:41:48,132 - heartbeat.loop - INFO [loop.py:85]: Local instance is stopped standby
2021-12-25 12:41:48,134 - heartbeat.loop - INFO [opengauss.py:249]: VIP:192.168.134.133 already offline, local ip list: ['192.168.134.134', '192.168.134.131', '192.168.122.1']
2021-12-25 12:41:56,659 - heartbeat.loop - INFO [opengauss.py:279]: Instance start:
out:
[2021-12-25 12:41:48.163][18517][][gs_ctl]: gs_ctl started,datadir is /dbdata/data 
[2021-12-25 12:41:48.449][18517][][gs_ctl]: waiting for server to start...
[2021-12-25 12:41:56.478][18517][][gs_ctl]:  done
[2021-12-25 12:41:56.478][18517][][gs_ctl]: server started (/dbdata/data)

Check the standby process.

[omm@node1 ~]$ ps -ef|grep mogdb
omm        2199      1  3 12:41 pts/1    00:00:47 /dbdata/app/mogdb/bin/mogdb -D /dbdata/data -M standby
omm        9087   2124  0 12:42 pts/1    00:00:00 grep --color=auto mogdb

Then run the following on the standby.

# On the standby, run build to rejoin the cluster
gs_ctl -D /dbdata/data build
# Check the cluster status
[omm@node2 ~]$  gs_ctl -D /dbdata/data query
[2021-12-24 13:07:19.664][41007][][gs_ctl]: gs_ctl query ,datadir is /dbdata/data 
 HA state:           
        local_role                     : Primary
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
        sender_pid                     : 36399
        local_role                     : Primary
        peer_role                      : Standby
        peer_state                     : Normal
        state                          : Streaming
        sender_sent_location           : 0/6389AD8
        sender_write_location          : 0/6389AD8
        sender_flush_location          : 0/6389AD8
        sender_replay_location         : 0/6389AD8
        receiver_received_location     : 0/6389AD8
        receiver_write_location        : 0/6389AD8
        receiver_flush_location        : 0/6389AD8
        receiver_replay_location       : 0/6389AD8
        sync_percent                   : 100%
        sync_state                     : Sync
        sync_priority                  : 1
        sync_most_available            : On
        channel                        : 192.168.134.134:51001-->192.168.134.132:21322

 Receiver info:      
No information 

Test case 5: Dual-primary (split-brain) detection

Modify the MogHA configuration file node.conf.

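# [Added in v2.1] Whether HA should start the standby instance when its process is not running
handle_down_standby=False
# Change this on both primary and standby, then restart MogHA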

Stop the standby database.

[omm@node2 ~]$ gs_ctl stop -D /dbdata/data
[2021-12-25 13:06:38.748][26971][][gs_ctl]: gs_ctl stopped ,datadir is /dbdata/data 
waiting for server to shut down.... done
server stopped

The standby's mogha_heartbeat.log shows that MogHA does not restart the standby, because handle_down_standby is set to False.

2021-12-25 13:07:13,859 - heartbeat.loop - INFO [toolkit.py:180]: Ping result:{'192.168.134.132': True, '192.168.134.130': True, '192.168.134.2': True}
2021-12-25 13:07:13,879 - heartbeat.loop - INFO [loop.py:50]: Detect that local instance is down, try to handle it
2021-12-25 13:07:13,879 - heartbeat.primary - ERROR [primary_heartbeat.py:224]: local primary:192.168.134.132 is not local host:192.168.134.134.
2021-12-25 13:07:13,880 - heartbeat.loop - INFO [loop.py:85]: Local instance is stopped standby
2021-12-25 13:07:13,881 - heartbeat.loop - INFO [opengauss.py:249]: VIP:192.168.134.133 already offline, local ip list: ['192.168.134.134', '192.168.134.131', '192.168.122.1']
2021-12-25 13:07:13,881 - heartbeat.loop - ERROR [loop.py:178]: Heartbeat failed:
2021-12-25 13:07:13,881 - heartbeat.loop - ERROR [loop.py:179]: Error: local instance is down,Please check for more information, mogha do nothing because `handle_down_standby` config is set False.
Traceback (most recent call last):
  File "ha_heartbeat/loop.py", line 171, in heartbeat_loop
  File "ha_heartbeat/loop.py", line 100, in dispatch
Exception: Error: local instance is down,Please check for more information, mogha do nothing because `handle_down_standby` config is set False.
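As in the logs above, MogHA writes the reason for a failed heartbeat on the line after `Heartbeat failed:`. Below is a hypothetical helper that lists those reasons from mogha_heartbeat.log. It assumes the `... [file.py:NN]: message` layout shown here.

```shell
# Hypothetical helper: print the reason logged on the line following each
# "Heartbeat failed:" line, i.e. the text after the first "]: ".
heartbeat_failures() {
  awk '
    want {
      i = index($0, "]: ")
      print (i ? substr($0, i + 3) : $0)
      want = 0
    }
    /Heartbeat failed: *$/ { want = 1 }
  ' "$1"
}

# Usage (log path depends on your MogHA installation):
#   heartbeat_failures mogha_heartbeat.log | tail -n 5
```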

Start the standby instance in primary mode, so that two primaries exist at the same time.

gs_ctl start -D /dbdata/data -M primary
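At this point both nodes report local_role Primary. The sketch below spots that from one place. local_role parses `gs_ctl query` output. count_primaries is hypothetical and not a MogHA feature. It assumes password-less ssh as omm to both nodes, with gs_ctl on the remote PATH.

```shell
# Hypothetical helper: print the first local_role value (the "HA state"
# section) from `gs_ctl query` output read on stdin.
local_role() {
  awk -F':' '
    {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
    }
    key == "local_role" { print val; exit }
  '
}

# Hypothetical check: count how many of the given hosts report Primary.
# Assumes password-less ssh as omm and gs_ctl on the remote PATH.
count_primaries() {
  local n=0 host role
  for host in "$@"; do
    role=$(ssh -n "omm@${host}" "gs_ctl -D /dbdata/data query" 2>/dev/null | local_role)
    [ "$role" = "Primary" ] && n=$((n + 1))
  done
  echo "$n"
}

# Usage: a result greater than 1 means a dual-primary situation
#   count_primaries 192.168.134.132 192.168.134.134
```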

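The standby's mogha_heartbeat.log shows MogHA detecting the dual primary and selecting 192.168.134.132 as the real primary. It then shuts down the false primary.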

2021-12-25 13:08:58,917 - heartbeat.loop - INFO [loop.py:148]: Detect that local instance is active primary
2021-12-25 13:08:58,926 - heartbeat.primary - INFO [primary_heartbeat.py:37]: primary lonely check...
2021-12-25 13:08:59,710 - heartbeat.primary - INFO [primary_heartbeat.py:263]: double primary check...
2021-12-25 13:09:01,467 - heartbeat.primary - ERROR [primary_heartbeat.py:307]: other primarys found:[{'ip': '192.168.134.132', 'heartbeat_ips': ['192.168.134.130'], 'host_key': 'host1'}]
2021-12-25 13:09:10,152 - heartbeat.primary - INFO [primary_heartbeat.py:319]: double primary confirmed, start to select real primary
2021-12-25 13:09:10,170 - heartbeat.primary - INFO [primary_heartbeat.py:335]: real primary:192.168.134.132
2021-12-25 13:09:10,181 - instance - INFO [opengauss.py:249]: VIP:192.168.134.133 already offline, local ip list: ['192.168.134.134', '192.168.134.131', '192.168.122.1']
2021-12-25 13:09:20,228 - instance - INFO [opengauss.py:271]: Instance shutdown:
out:
[2021-12-25 13:09:10.215][27387][][gs_ctl]: gs_ctl stopped ,datadir is /dbdata/data 
waiting for server to shut down............. done
server stopped
, err:

2021-12-25 13:09:20,230 - heartbeat.loop - ERROR [loop.py:178]: Heartbeat failed:
2021-12-25 13:09:20,231 - heartbeat.loop - ERROR [loop.py:179]: local ip 192.168.134.134 is not real primary 192.168.134.132. shutdown instance and restart heartbeat.

The false primary (the former standby) has been shut down. To rejoin the cluster, run build on the standby.

gs_ctl -D /dbdata/data build

The primary/standby relationship is re-established.

[omm@node1 ~]$ gs_ctl -D /dbdata/data/ query
[2021-12-25 13:16:09.035][34244][][gs_ctl]: gs_ctl query ,datadir is /dbdata/data 
 HA state:           
        local_role                     : Primary
        static_connections             : 1
        db_state                       : Normal
        detail_information             : Normal

 Senders info:       
        sender_pid                     : 33693
        local_role                     : Primary
        peer_role                      : Standby
        peer_state                     : Normal
        state                          : Streaming
        sender_sent_location           : 0/5FA3588
        sender_write_location          : 0/5FA3588
        sender_flush_location          : 0/5FA3588
        sender_replay_location         : 0/5FA3588
        receiver_received_location     : 0/5FA3588
        receiver_write_location        : 0/5FA3588
        receiver_flush_location        : 0/5FA3588
        receiver_replay_location       : 0/5FA3588
        sync_percent                   : 100%
        sync_state                     : Sync
        sync_priority                  : 1
        sync_most_available            : On
        channel                        : 192.168.134.132:51001-->192.168.134.134:13243

 Receiver info:      
No information 
Last modified: 2022-01-11 20:27:10
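[Copyright notice] This article is original content by a 墨天轮 (modb.pro) user. When reposting, you must credit the source (墨天轮), the article link, the author and other basic information; otherwise the author and 墨天轮 reserve the right to pursue liability. If you find content on 墨天轮 suspected of plagiarism or infringement, please email contact@modb.pro with supporting evidence; once verified, 墨天轮 will remove the content immediately.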
