Verify cluster recovery after all 3 nodes are shut down
1. Stop Group Replication on node 1:
stop group_replication;
Check the cluster status on node 2:
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 47b5b3dc-c5ec-11ec-aaae-005056acf73f | puma_db2    |        3307 | ONLINE       | SECONDARY   | 8.0.28         | XCom                       |
| group_replication_applier | 498531b3-c5ec-11ec-a327-005056acab8b | puma_db3    |        3307 | ONLINE       | PRIMARY     | 8.0.28         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
2 rows in set (0.00 sec)
Node 3 is now the PRIMARY.
Insert data on node 3:
use test;
INSERT INTO t1 VALUES (4, 'wzf0072');
select * from t1;
Stop Group Replication on node 3:
stop group_replication;
Group Replication status on node 3:
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 498531b3-c5ec-11ec-a327-005056acab8b | puma_db3    |        3307 | OFFLINE      |             |                | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)
mysql>
Insert data on node 2 (at this point node 2 is the only member left in the group):
use test;
INSERT INTO t1 VALUES (5, 'wzf');
select * from t1;
Stop Group Replication on node 2:
stop group_replication;
At this point every node in the cluster is offline.
Check the VIP status: mysql> system ip add
VIP 172.16.130.183 is now offline:
mysql> system ping 172.16.130.183
PING 172.16.130.183 (172.16.130.183) 56(84) bytes of data.
From 172.16.130.85 icmp_seq=10 Destination Host Unreachable
Find the node that was shut down last (compare Executed_Gtid_Set):
show master status;
node1:e6670473-4a7c-4030-b99c-83e57ffeaeb6:1-13:1000006:2000006
node2:e6670473-4a7c-4030-b99c-83e57ffeaeb6:1-14:1000006-1000007:2000006
node3:e6670473-4a7c-4030-b99c-83e57ffeaeb6:1-14:1000006:2000006
Conclusion: node 2 was the last node to be shut down.
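Instead of comparing the sets by eye, the built-in GTID_SUBSET() function can do the check; a minimal sketch using the node 2 and node 3 values captured above (it returns 1, confirming node 3's set is contained in node 2's, i.e. node 2 is at least as advanced):
mysql> SELECT GTID_SUBSET('e6670473-4a7c-4030-b99c-83e57ffeaeb6:1-14:1000006:2000006',
    ->                    'e6670473-4a7c-4030-b99c-83e57ffeaeb6:1-14:1000006-1000007:2000006') AS node3_subset_of_node2;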
To recap the test sequence:
Stopped Group Replication on node 1;
Primary was node 3; inserted on the primary: INSERT INTO t1 VALUES (4, 'wzf0072');
Stopped Group Replication on node 3;
Primary was node 2; inserted on the primary: INSERT INTO t1 VALUES (5, 'wzf');
Stopped Group Replication on node 2;
Checked the Executed_Gtid_Set values with show master status;:
node1:e6670473-4a7c-4030-b99c-83e57ffeaeb6:1-13:1000006:2000006
node2:e6670473-4a7c-4030-b99c-83e57ffeaeb6:1-14:1000006-1000007:2000006
node3:e6670473-4a7c-4030-b99c-83e57ffeaeb6:1-14:1000006:2000006
Fault recovery after all 3 nodes are offline
Node 2
Starting Group Replication directly returns an error:
mysql> start group_replication;
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log.
The correct startup sequence is: SET GLOBAL group_replication_bootstrap_group=ON; START GROUP_REPLICATION; SET GLOBAL group_replication_bootstrap_group=OFF;
mysql> SET GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.00 sec)
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected, 1 warning (1.16 sec)
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 47b5b3dc-c5ec-11ec-aaae-005056acf73f | puma_db2    |        3307 | ONLINE       | PRIMARY     | 8.0.28         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)
mysql> system ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:ac:f7:3f brd ff:ff:ff:ff:ff:ff
inet 172.16.130.84/24 brd 172.16.130.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet 172.16.130.183/24 brd 172.16.130.255 scope global secondary ens160:3307
valid_lft forever preferred_lft forever
inet6 fe80::aa1e:48d6:f43f:5c52/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::8486:2df9:aa50:c675/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::bef0:a745:9a02:df2d/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
mysql>
Next, start Group Replication on nodes 1 and 3 (it is recommended to start the node with the larger GTID set first, and only start the node with the smaller GTID set after data synchronization completes, to prevent network jitter from causing a cluster fault):
mysql> start group_replication;
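Before starting the last node, it can be verified that the node just started has finished distributed recovery, i.e. its MEMBER_STATE has moved from RECOVERING to ONLINE; a quick check:
mysql> SELECT MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members;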
Online primary switchover:
SELECT group_replication_set_as_primary('member_uuid');
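Here member_uuid is the MEMBER_ID of the target node as shown in replication_group_members; a minimal sketch, reusing the node 3 UUID captured earlier:
mysql> SELECT MEMBER_ID, MEMBER_HOST FROM performance_schema.replication_group_members;
mysql> SELECT group_replication_set_as_primary('498531b3-c5ec-11ec-a327-005056acab8b');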
# Stop the mysql service on the primary node to simulate a service failure, and verify that after the node is expelled from the cluster the VIP fails over automatically and the application self-heals
3.2 Primary node system failure
# Execute a reboot to simulate a system failure, and verify whether the VIP fails over automatically and whether the node rejoins the cluster normally after the service starts
3.3 Secondary node failure
As above, simulate a secondary node system failure and mysql service failure, and verify whether the node can rejoin the cluster normally.
4. Escape plan
4.1 Background
The goal is that, when a network anomaly makes the cluster unavailable or a major failure leaves it in an abnormal state, the DBA can quickly restore service availability.
4.2 Attempt to re-form the MGR
Start the mgr instances and check the GTID sets of the 3 nodes.
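Besides show master status;, the same GTID set can be read directly from the gtid_executed system variable on each node:
mysql> SELECT @@GLOBAL.gtid_executed;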
<1> On the node with the largest GTID set, execute:
set global group_replication_bootstrap_group=on; start group_replication;
set global group_replication_bootstrap_group=off;
select * from performance_schema.replication_group_members;
<2> On the other two nodes, execute:
start group_replication;
select * from performance_schema.replication_group_members;
4.3 Run in single-instance mode
If re-forming the mgr fails, pick the node with the largest GTID set, disable read-only mode, manually attach the VIP, and then serve traffic.
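After stop group_replication; the node is left in super-read-only mode by default, so both read-only switches have to be turned off before it can accept writes; a minimal sketch:
mysql> SET GLOBAL super_read_only = OFF;
mysql> SET GLOBAL read_only = OFF;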
# manually attach the vip
ip addr add 172.16.130.183/24 dev ens160
# manually remove the vip
ip addr del 172.16.130.183/24 dev ens160
4.4 Use the clone plugin for node recovery
This is covered as a dedicated experiment.
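As a rough outline of that experiment (assuming MySQL 8.0.17+ with the clone plugin available; the donor address reuses puma_db3:3307 from this setup, and 'repl_user' and the password are illustrative placeholders):
-- on both donor and recipient
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
-- on the recipient: whitelist the donor, then pull a full physical copy
SET GLOBAL clone_valid_donor_list = 'puma_db3:3307';
CLONE INSTANCE FROM 'repl_user'@'puma_db3':3307 IDENTIFIED BY '<password>';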
5. Common errors
1. Permission problems
2. /etc/hosts misconfiguration
3. Changing the hostname, causing communication failures