MHA Master Online Switchover Test

Original, by OpenDBA, 2024-06-05

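【Environment】
MHA data node 1: 192.168.1.201
MHA data node 2: 192.168.1.202
MHA Manager node: 192.168.1.203 (also runs a MySQL slave, as the check output in Step 1 shows)
MHA cluster VIP: 192.168.1.204
MySQL version: 5.7.28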


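【Test Item】

Maintenance such as a MySQL version upgrade or operating system patching requires switching the master node of the MHA cluster online, while it is still alive.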


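【Test Conclusions】

1. Before the switch, the MHA Manager monitoring service must be stopped manually.

2. Once the switch completes, a new master is in place, the other nodes (including the old master) replicate from it, and the cluster VIP has moved to the new master. None of this needs manual work; the only interaction during the switch is answering the two confirmation prompts of masterha_master_switch.

3. After the switch, the MHA Manager monitoring service must be started again manually.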
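The three conclusions amount to a fixed order of operations, which can be sketched as a small wrapper. This is only a sketch under assumptions: `run_step`, `start_manager`, `online_switch` and the `DRY_RUN` switch are hypothetical names, and the `masterha_manager` start line is an assumption, since the article shows the manager log after startup but not the command that was used. The configuration path, target host and switch options are the ones from this test.

```shell
#!/usr/bin/env bash
# Sketch of the stop -> switch -> restart order described in the conclusions.
# Defaults are the values used in this test; override them via the environment.
CONF="${CONF:-/etc/mha/mha.cnf}"
NEW_MASTER="${NEW_MASTER:-192.168.1.201}"
MANAGER_LOG="${MANAGER_LOG:-/usr/local/mha/manager.log}"

# Print the command when DRY_RUN=1, otherwise execute it.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumption: the manager is started in the background with masterha_manager.
start_manager() {
    nohup masterha_manager --conf="$CONF" >> "$MANAGER_LOG" 2>&1 &
}

online_switch() {
    # 1. Stop monitoring first, otherwise the switch is refused.
    run_step masterha_stop --conf="$CONF" || return 1

    # 2. Interactive online switch; the old master becomes a slave.
    run_step masterha_master_switch --conf="$CONF" --master_state=alive \
        --new_master_host="$NEW_MASTER" --new_master_port=3306 \
        --orig_master_is_new_slave --running_updates_limit=10000 || return 1

    # 3. Monitoring does not resume by itself, so start the manager again.
    run_step start_manager
}
```

Running `DRY_RUN=1 online_switch` prints the three steps without touching the cluster, which is a cheap way to review the options before a real run on the Manager node.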


【Test Steps】

Step 1: Check the MHA cluster status

The current master of the MHA cluster is 192.168.1.202.

# masterha_check_status --conf=/etc/mha/mha.cnf 
mha (pid:27040) is running(0:PING_OK), master:192.168.1.202
[root@node3 local]# masterha_check_repl --conf=/etc/mha/mha.cnf 
Wed Jun  5 17:28:22 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun  5 17:28:22 2024 - [info] Reading application default configuration from /etc/mha/mha.cnf..
Wed Jun  5 17:28:22 2024 - [info] Reading server configuration from /etc/mha/mha.cnf..
Wed Jun  5 17:28:22 2024 - [info] MHA::MasterMonitor version 0.58.
Wed Jun  5 17:28:24 2024 - [info] GTID failover mode = 1
Wed Jun  5 17:28:24 2024 - [info] Dead Servers:
Wed Jun  5 17:28:24 2024 - [info] Alive Servers:
Wed Jun  5 17:28:24 2024 - [info]   192.168.1.201(192.168.1.201:3306)
Wed Jun  5 17:28:24 2024 - [info]   192.168.1.202(192.168.1.202:3306)
Wed Jun  5 17:28:24 2024 - [info]   192.168.1.203(192.168.1.203:3306)
Wed Jun  5 17:28:24 2024 - [info] Alive Slaves:
Wed Jun  5 17:28:24 2024 - [info]   192.168.1.201(192.168.1.201:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Jun  5 17:28:24 2024 - [info]     GTID ON
Wed Jun  5 17:28:24 2024 - [info]     Replicating from 192.168.1.202(192.168.1.202:3306)
Wed Jun  5 17:28:24 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Jun  5 17:28:24 2024 - [info]   192.168.1.203(192.168.1.203:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Jun  5 17:28:24 2024 - [info]     GTID ON
Wed Jun  5 17:28:24 2024 - [info]     Replicating from 192.168.1.202(192.168.1.202:3306)
Wed Jun  5 17:28:24 2024 - [info] Current Alive Master: 192.168.1.202(192.168.1.202:3306)
Wed Jun  5 17:28:24 2024 - [info] Checking slave configurations..
Wed Jun  5 17:28:24 2024 - [info]  read_only=1 is not set on slave 192.168.1.201(192.168.1.201:3306).
Wed Jun  5 17:28:24 2024 - [info]  read_only=1 is not set on slave 192.168.1.203(192.168.1.203:3306).
Wed Jun  5 17:28:24 2024 - [info] Checking replication filtering settings..
Wed Jun  5 17:28:24 2024 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Jun  5 17:28:24 2024 - [info]  Replication filtering check ok.
Wed Jun  5 17:28:24 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Wed Jun  5 17:28:24 2024 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jun  5 17:28:24 2024 - [info] HealthCheck: SSH to 192.168.1.202 is reachable.
Wed Jun  5 17:28:24 2024 - [info] 
192.168.1.202(192.168.1.202:3306) (current master)
 +--192.168.1.201(192.168.1.201:3306)
 +--192.168.1.203(192.168.1.203:3306)

Wed Jun  5 17:28:24 2024 - [info] Checking replication health on 192.168.1.201..
Wed Jun  5 17:28:24 2024 - [info]  ok.
Wed Jun  5 17:28:24 2024 - [info] Checking replication health on 192.168.1.203..
Wed Jun  5 17:28:24 2024 - [info]  ok.
Wed Jun  5 17:28:24 2024 - [info] Checking master_ip_failover_script status:
Wed Jun  5 17:28:24 2024 - [info]   /usr/local/mha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.202 --orig_master_ip=192.168.1.202 --orig_master_port=3306 
Wed Jun  5 17:28:24 2024 - [info]  OK.
Wed Jun  5 17:28:24 2024 - [warning] shutdown_script is not defined.
Wed Jun  5 17:28:24 2024 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
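When the check is scripted, the current master can be read from the status line rather than by eye. A minimal sketch, assuming the line format shown above; `current_master` is a hypothetical helper name.

```shell
# Extract the master address from one line of masterha_check_status output.
# Prints nothing and returns 1 when the line has no "master:" field,
# for example when the manager is stopped.
current_master() {
    case "$1" in
        *"master:"*) printf '%s\n' "${1##*master:}" ;;
        *) return 1 ;;
    esac
}
```

Typical use on the Manager node would be `current_master "$(masterha_check_status --conf=/etc/mha/mha.cnf)"`, recording the result before the switch and comparing it afterwards.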


Step 2: Perform the online switch

Before an online switch, the MHA Manager monitoring service must be stopped. Otherwise the switch fails with the error: MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.

# masterha_stop --conf=/etc/mha/mha.cnf 
Stopped mha successfully.
# masterha_check_status --conf=/etc/mha/mha.cnf 
mha is stopped(2:NOT_RUNNING).
# Node 192.168.1.203
# tailf /usr/local/mha/manager.log
Wed Jun 5 17:36:32 2024 - [info] Got terminate signal. Exit.
# masterha_master_switch --conf=/etc/mha/mha.cnf --master_state=alive --new_master_host=192.168.1.201 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Wed Jun 5 17:37:45 2024 - [info] MHA::MasterRotate version 0.58.
Wed Jun 5 17:37:45 2024 - [info] Starting online master switch..
Wed Jun 5 17:37:45 2024 - [info]
Wed Jun 5 17:37:45 2024 - [info] * Phase 1: Configuration Check Phase..
Wed Jun 5 17:37:45 2024 - [info]
Wed Jun 5 17:37:45 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 5 17:37:45 2024 - [info] Reading application default configuration from /etc/mha/mha.cnf..
Wed Jun 5 17:37:45 2024 - [info] Reading server configuration from /etc/mha/mha.cnf..
Wed Jun 5 17:37:46 2024 - [info] GTID failover mode = 1
Wed Jun 5 17:37:46 2024 - [info] Current Alive Master: 192.168.1.202(192.168.1.202:3306)
Wed Jun 5 17:37:46 2024 - [info] Alive Slaves:
Wed Jun 5 17:37:46 2024 - [info]   192.168.1.201(192.168.1.201:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Jun 5 17:37:46 2024 - [info]     GTID ON
Wed Jun 5 17:37:46 2024 - [info]     Replicating from 192.168.1.202(192.168.1.202:3306)
Wed Jun 5 17:37:46 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Jun 5 17:37:46 2024 - [info]   192.168.1.203(192.168.1.203:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Jun 5 17:37:46 2024 - [info]     GTID ON
Wed Jun 5 17:37:46 2024 - [info]     Replicating from 192.168.1.202(192.168.1.202:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.202(192.168.1.202:3306)? (YES/no): YES
Wed Jun 5 17:37:49 2024 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Wed Jun 5 17:37:49 2024 - [info] ok.
Wed Jun 5 17:37:49 2024 - [info] Checking MHA is not monitoring or doing failover..
Wed Jun 5 17:37:49 2024 - [info] Checking replication health on 192.168.1.201..
Wed Jun 5 17:37:49 2024 - [info] ok.
Wed Jun 5 17:37:49 2024 - [info] Checking replication health on 192.168.1.203..
Wed Jun 5 17:37:49 2024 - [info] ok.
Wed Jun 5 17:37:49 2024 - [info] 192.168.1.201 can be new master.
Wed Jun 5 17:37:49 2024 - [info]
From:
192.168.1.202(192.168.1.202:3306) (current master)
 +--192.168.1.201(192.168.1.201:3306)
 +--192.168.1.203(192.168.1.203:3306)

To:
192.168.1.201(192.168.1.201:3306) (new master)
 +--192.168.1.203(192.168.1.203:3306)
 +--192.168.1.202(192.168.1.202:3306)

Starting master switch from 192.168.1.202(192.168.1.202:3306) to 192.168.1.201(192.168.1.201:3306)? (yes/NO): yes
Wed Jun 5 17:38:00 2024 - [info] Checking whether 192.168.1.201(192.168.1.201:3306) is ok for the new master..
Wed Jun 5 17:38:00 2024 - [info] ok.
Wed Jun 5 17:38:00 2024 - [info] 192.168.1.202(192.168.1.202:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Wed Jun 5 17:38:00 2024 - [info] 192.168.1.202(192.168.1.202:3306): Resetting slave pointing to the dummy host.
Wed Jun 5 17:38:00 2024 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Jun 5 17:38:00 2024 - [info]
Wed Jun 5 17:38:00 2024 - [info] * Phase 2: Rejecting updates Phase..
Wed Jun 5 17:38:00 2024 - [info]
Wed Jun 5 17:38:00 2024 - [info] Executing master ip online change script to disable write on the current master:
Wed Jun 5 17:38:00 2024 - [info] /usr/local/mha/scripts/master_ip_online_change --command=stop --orig_master_host=192.168.1.202 --orig_master_ip=192.168.1.202 --orig_master_port=3306 --orig_master_user='mysqladmin' --new_master_host=192.168.1.201 --new_master_ip=192.168.1.201 --new_master_port=3306 --new_master_user='mysqladmin' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Wed Jun 5 17:38:01 2024 062917 Set read_only on the new master.. ok.
Wed Jun 5 17:38:01 2024 073393 Waiting all running 2 threads are disconnected.. (max 1500 milliseconds)
{'Time' => '3883','db' => undef,'Id' => '20','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.1.203:42778'}
{'Time' => '2175','db' => undef,'Id' => '23','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.1.201:48226'}
Wed Jun 5 17:38:01 2024 576572 Waiting all running 2 threads are disconnected.. (max 1000 milliseconds)
{'Time' => '3884','db' => undef,'Id' => '20','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.1.203:42778'}
{'Time' => '2176','db' => undef,'Id' => '23','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.1.201:48226'}
Wed Jun 5 17:38:02 2024 081461 Waiting all running 2 threads are disconnected.. (max 500 milliseconds)
{'Time' => '3884','db' => undef,'Id' => '20','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.1.203:42778'}
{'Time' => '2176','db' => undef,'Id' => '23','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.1.201:48226'}
Wed Jun 5 17:38:02 2024 586154 Set read_only=1 on the orig master.. ok.
Wed Jun 5 17:38:02 2024 591010 Waiting all running 2 queries are disconnected.. (max 500 milliseconds)
{'Time' => '3885','db' => undef,'Id' => '20','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.1.203:42778'}
{'Time' => '2177','db' => undef,'Id' => '23','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.1.201:48226'}
Wed Jun 5 17:38:03 2024 089331 Killing all application threads..
Wed Jun 5 17:38:03 2024 103242 done.
Disabling the VIP an old master: 192.168.1.202
Wed Jun 5 17:38:03 2024 - [info] ok.
Wed Jun 5 17:38:03 2024 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Wed Jun 5 17:38:03 2024 - [info] Executing FLUSH TABLES WITH READ LOCK..
Wed Jun 5 17:38:03 2024 - [info] ok.
Wed Jun 5 17:38:03 2024 - [info] Orig master binlog:pos is mysql-bin.000011:234.
Wed Jun 5 17:38:03 2024 - [info] Waiting to execute all relay logs on 192.168.1.201(192.168.1.201:3306)..
Wed Jun 5 17:38:03 2024 - [info] master_pos_wait(mysql-bin.000011:234) completed on 192.168.1.201(192.168.1.201:3306). Executed 0 events.
Wed Jun 5 17:38:03 2024 - [info] done.
Wed Jun 5 17:38:03 2024 - [info] Getting new master's binlog name and position..
Wed Jun 5 17:38:03 2024 - [info] mysql-bin.000021:234
Wed Jun 5 17:38:03 2024 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.201', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Wed Jun 5 17:38:03 2024 - [info] Executing master ip online change script to allow write on the new master:
Wed Jun 5 17:38:03 2024 - [info] /usr/local/mha/scripts/master_ip_online_change --command=start --orig_master_host=192.168.1.202 --orig_master_ip=192.168.1.202 --orig_master_port=3306 --orig_master_user='mysqladmin' --new_master_host=192.168.1.201 --new_master_ip=192.168.1.201 --new_master_port=3306 --new_master_user='mysqladmin' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Wed Jun 5 17:38:03 2024 727282 Set read_only=0 on the new master.
Enabling the VIP 192.168.1.204/24 on the new master: 192.168.1.201
arping: 192.168.1.204/24: Name or service not known
Wed Jun 5 17:38:04 2024 - [info] ok.
Wed Jun 5 17:38:04 2024 - [info]
Wed Jun 5 17:38:04 2024 - [info] * Switching slaves in parallel..
Wed Jun 5 17:38:04 2024 - [info]
Wed Jun 5 17:38:04 2024 - [info] -- Slave switch on host 192.168.1.203(192.168.1.203:3306) started, pid: 28886
Wed Jun 5 17:38:04 2024 - [info]
Wed Jun 5 17:38:05 2024 - [info] Log messages from 192.168.1.203 ...
Wed Jun 5 17:38:05 2024 - [info]
Wed Jun 5 17:38:04 2024 - [info] Waiting to execute all relay logs on 192.168.1.203(192.168.1.203:3306)..
Wed Jun 5 17:38:04 2024 - [info] master_pos_wait(mysql-bin.000011:234) completed on 192.168.1.203(192.168.1.203:3306). Executed 0 events.
Wed Jun 5 17:38:04 2024 - [info] done.
Wed Jun 5 17:38:04 2024 - [info] Resetting slave 192.168.1.203(192.168.1.203:3306) and starting replication from the new master 192.168.1.201(192.168.1.201:3306)..
Wed Jun 5 17:38:04 2024 - [info] Executed CHANGE MASTER.
Wed Jun 5 17:38:04 2024 - [info] Slave started.
Wed Jun 5 17:38:05 2024 - [info] End of log messages from 192.168.1.203 ...
Wed Jun 5 17:38:05 2024 - [info]
Wed Jun 5 17:38:05 2024 - [info] -- Slave switch on host 192.168.1.203(192.168.1.203:3306) succeeded.
Wed Jun 5 17:38:05 2024 - [info] Unlocking all tables on the orig master:
Wed Jun 5 17:38:05 2024 - [info] Executing UNLOCK TABLES..
Wed Jun 5 17:38:05 2024 - [info] ok.
Wed Jun 5 17:38:05 2024 - [info] Starting orig master as a new slave..
Wed Jun 5 17:38:05 2024 - [info] Resetting slave 192.168.1.202(192.168.1.202:3306) and starting replication from the new master 192.168.1.201(192.168.1.201:3306)..
Wed Jun 5 17:38:05 2024 - [info] Executed CHANGE MASTER.
Wed Jun 5 17:38:05 2024 - [info] Slave started.
Wed Jun 5 17:38:05 2024 - [info] All new slave servers switched successfully.
Wed Jun 5 17:38:05 2024 - [info]
Wed Jun 5 17:38:05 2024 - [info] * Phase 5: New master cleanup phase..
Wed Jun 5 17:38:05 2024 - [info]
Wed Jun 5 17:38:05 2024 - [info] 192.168.1.201: Resetting slave info succeeded.
Wed Jun 5 17:38:05 2024 - [info] Switching master to 192.168.1.201(192.168.1.201:3306) completed successfully.
# Node 192.168.1.201
# tailf /var/log/mysqld.log
2024-06-05T17:38:03.316122+08:00 4 [ERROR] Error reading packet from server for channel '': Lost connection to MySQL server during query (server_errno=2013)
2024-06-05T17:38:03.316411+08:00 4 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000011' at position 234 for channel ''
2024-06-05T17:38:03.316456+08:00 4 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2024-06-05T17:38:03.574315+08:00 5 [Note] Error reading relay log event for channel '': slave SQL thread was killed
2024-06-05T17:38:03.574397+08:00 5 [Note] Slave SQL thread for channel '' exiting, replication stopped in log 'mysql-bin.000011' at position 234
2024-06-05T17:38:04.529381+08:00 18 [Note] Start binlog_dump to master_thread_id(18) slave_server(1123), pos(, 4)
2024-06-05T17:38:04.529497+08:00 18 [Note] Start semi-sync binlog_dump to slave (server_id: 1123), pos(, 4)
2024-06-05T17:38:05.496160+08:00 4 [Note] Slave I/O thread killed while reading event for channel ''
2024-06-05T17:38:05.496381+08:00 4 [Note] Slave I/O thread exiting for channel '', read up to log 'mysql-bin.000011', position 234
2024-06-05T17:38:05.502844+08:00 19 [Note] Start binlog_dump to master_thread_id(19) slave_server(1122), pos(, 4)
2024-06-05T17:38:05.503004+08:00 19 [Note] Start semi-sync binlog_dump to slave (server_id: 1122), pos(, 4)
# Node 192.168.1.202
# tailf /var/log/mysqld.log
2024-06-05T17:38:01.231179+08:00 34 [Note] 'CHANGE MASTER TO FOR CHANNEL '' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='dummy_host', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2024-06-05T17:38:03.725883+08:00 20 [Note] Stop semi-sync binlog_dump to slave (server_id: 1123)
2024-06-05T17:38:03.727328+08:00 23 [Note] Stop semi-sync binlog_dump to slave (server_id: 1121)
2024-06-05T17:38:03.744066+08:00 37 [Note] Start binlog_dump to master_thread_id(37) slave_server(1121), pos(, 4)
2024-06-05T17:38:03.744149+08:00 37 [Note] Start semi-sync binlog_dump to slave (server_id: 1121), pos(, 4)
2024-06-05T17:38:03.813178+08:00 36 [Note] Start binlog_dump to master_thread_id(36) slave_server(1123), pos(, 4)
2024-06-05T17:38:03.813309+08:00 36 [Note] Start semi-sync binlog_dump to slave (server_id: 1123), pos(, 4)
2024-06-05T17:38:04.802374+08:00 0 [ERROR] /usr/sbin/mysqld: Got an error reading communication packets
2024-06-05T17:38:05.880650+08:00 38 [Note] 'CHANGE MASTER TO FOR CHANNEL '' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='192.168.1.201', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2024-06-05T17:38:05.898467+08:00 39 [Note] Slave I/O thread: Start semi-sync replication to master 'repl@192.168.1.201:3306' in log 'FIRST' at position 4
2024-06-05T17:38:05.898648+08:00 39 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2024-06-05T17:38:05.903712+08:00 40 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'FIRST' at position 0, relay log './relay-bin.000001' position: 4
2024-06-05T17:38:05.904090+08:00 39 [Note] Slave I/O thread for channel '': connected to master 'repl@192.168.1.201:3306',replication started in log 'FIRST' at position 4
2024-06-05T17:38:05.908351+08:00 0 [ERROR] /usr/sbin/mysqld: Got an error reading communication packets
# Node 192.168.1.203
# tailf /var/log/mysqld.log
2024-06-05T17:38:03.090557+08:00 36 [ERROR] Error reading packet from server for channel '': Lost connection to MySQL server during query (server_errno=2013)
2024-06-05T17:38:03.090638+08:00 36 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000011' at position 234 for channel ''
2024-06-05T17:38:03.090662+08:00 36 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2024-06-05T17:38:04.171873+08:00 37 [Note] Error reading relay log event for channel '': slave SQL thread was killed
2024-06-05T17:38:04.171930+08:00 37 [Note] Slave SQL thread for channel '' exiting, replication stopped in log 'mysql-bin.000011' at position 234
2024-06-05T17:38:04.177447+08:00 36 [Note] Slave I/O thread killed while reading event for channel ''
2024-06-05T17:38:04.177497+08:00 36 [Note] Slave I/O thread exiting for channel '', read up to log 'mysql-bin.000011', position 234
2024-06-05T17:38:04.256679+08:00 49 [Note] 'CHANGE MASTER TO FOR CHANNEL '' executed'. Previous state master_host='192.168.1.202', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='192.168.1.201', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2024-06-05T17:38:04.273096+08:00 50 [Note] Slave I/O thread: Start semi-sync replication to master 'repl@192.168.1.201:3306' in log 'FIRST' at position 4
2024-06-05T17:38:04.273205+08:00 50 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2024-06-05T17:38:04.275772+08:00 50 [Note] Slave I/O thread for channel '': connected to master 'repl@192.168.1.201:3306',replication started in log 'FIRST' at position 4
2024-06-05T17:38:04.277801+08:00 51 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'FIRST' at position 0, relay log './relay-bin.000001' position: 4
# tailf /usr/local/mha/manager.log
No log output, because the MHA Manager service has been stopped.
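The switch output contains the line `arping: 192.168.1.204/24: Name or service not known`: the VIP was handed to arping together with its /24 prefix length, which arping cannot resolve as a host. The VIP still moved, as Step 4 confirms, but the ARP announcement was not sent. The contents of master_ip_online_change are not shown in this article, so the following is only a sketch of one possible fix; `vip_without_prefix` is a hypothetical helper name.

```shell
# Strip a CIDR prefix length such as "/24" from an address, so that tools
# expecting a bare IP (arping, for example) accept it.
# An address without a prefix is returned unchanged.
vip_without_prefix() {
    printf '%s\n' "${1%%/*}"
}
```

Inside the VIP script the arping call would then receive `"$(vip_without_prefix 192.168.1.204/24)"`, that is 192.168.1.204, while the `ip addr add` step keeps the full 192.168.1.204/24 form.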

# The MHA Manager monitoring service was not stopped before the switch, so the switch failed
# masterha_master_switch --conf=/etc/mha/mha.cnf --master_state=alive --new_master_host=192.168.1.201 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Wed Jun 5 17:31:57 2024 - [info] MHA::MasterRotate version 0.58.
Wed Jun 5 17:31:57 2024 - [info] Starting online master switch..
Wed Jun 5 17:31:57 2024 - [info]
Wed Jun 5 17:31:57 2024 - [info] * Phase 1: Configuration Check Phase..
Wed Jun 5 17:31:57 2024 - [info]
Wed Jun 5 17:31:57 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 5 17:31:57 2024 - [info] Reading application default configuration from /etc/mha/mha.cnf..
Wed Jun 5 17:31:57 2024 - [info] Reading server configuration from /etc/mha/mha.cnf..
Wed Jun 5 17:31:58 2024 - [info] GTID failover mode = 1
Wed Jun 5 17:31:58 2024 - [info] Current Alive Master: 192.168.1.202(192.168.1.202:3306)
Wed Jun 5 17:31:58 2024 - [info] Alive Slaves:
Wed Jun 5 17:31:58 2024 - [info]   192.168.1.201(192.168.1.201:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Jun 5 17:31:58 2024 - [info]     GTID ON
Wed Jun 5 17:31:58 2024 - [info]     Replicating from 192.168.1.202(192.168.1.202:3306)
Wed Jun 5 17:31:58 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Jun 5 17:31:58 2024 - [info]   192.168.1.203(192.168.1.203:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Jun 5 17:31:58 2024 - [info]     GTID ON
Wed Jun 5 17:31:58 2024 - [info]     Replicating from 192.168.1.202(192.168.1.202:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.202(192.168.1.202:3306)? (YES/no): YES
Wed Jun 5 17:32:07 2024 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Wed Jun 5 17:32:07 2024 - [info] ok.
Wed Jun 5 17:32:07 2024 - [info] Checking MHA is not monitoring or doing failover..
Wed Jun 5 17:32:07 2024 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln143] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
Wed Jun 5 17:32:07 2024 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/local/bin/masterha_master_switch line 53.
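The failed attempt above can be avoided with a pre-flight check that refuses to continue unless the manager reports itself as stopped. A minimal sketch, assuming the status text shown earlier in this step (`mha is stopped(2:NOT_RUNNING).`); `manager_is_stopped` is a hypothetical helper name.

```shell
# Succeed only when the given masterha_check_status output reports that
# the manager is not running.
manager_is_stopped() {
    case "$1" in
        *"NOT_RUNNING"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A switch script could call `manager_is_stopped "$(masterha_check_status --conf=/etc/mha/mha.cnf)" || exit 1` just before masterha_master_switch, so the problem is caught before the FLUSH prompt rather than at the advisory lock check.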


Step 3: Start the MHA Manager service

# tailf /usr/local/mha/manager.log
Wed Jun 5 17:48:58 2024 - [info] MHA::MasterMonitor version 0.58.
Wed Jun 5 17:48:59 2024 - [info] GTID failover mode = 1
Wed Jun 5 17:48:59 2024 - [info] Dead Servers:
Wed Jun 5 17:48:59 2024 - [info] Alive Servers:
Wed Jun 5 17:48:59 2024 - [info]   192.168.1.201(192.168.1.201:3306)
Wed Jun 5 17:48:59 2024 - [info]   192.168.1.202(192.168.1.202:3306)
Wed Jun 5 17:48:59 2024 - [info]   192.168.1.203(192.168.1.203:3306)
Wed Jun 5 17:48:59 2024 - [info] Alive Slaves:
Wed Jun 5 17:48:59 2024 - [info]   192.168.1.202(192.168.1.202:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Jun 5 17:48:59 2024 - [info]     GTID ON
Wed Jun 5 17:48:59 2024 - [info]     Replicating from 192.168.1.201(192.168.1.201:3306)
Wed Jun 5 17:48:59 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Jun 5 17:48:59 2024 - [info]   192.168.1.203(192.168.1.203:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Jun 5 17:48:59 2024 - [info]     GTID ON
Wed Jun 5 17:48:59 2024 - [info]     Replicating from 192.168.1.201(192.168.1.201:3306)
Wed Jun 5 17:48:59 2024 - [info] Current Alive Master: 192.168.1.201(192.168.1.201:3306)
Wed Jun 5 17:48:59 2024 - [info] Checking slave configurations..
Wed Jun 5 17:48:59 2024 - [info]  read_only=1 is not set on slave 192.168.1.203(192.168.1.203:3306).
Wed Jun 5 17:48:59 2024 - [info] Checking replication filtering settings..
Wed Jun 5 17:48:59 2024 - [info]  binlog_do_db= , binlog_ignore_db=
Wed Jun 5 17:48:59 2024 - [info]  Replication filtering check ok.
Wed Jun 5 17:48:59 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Wed Jun 5 17:48:59 2024 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jun 5 17:48:59 2024 - [info] HealthCheck: SSH to 192.168.1.201 is reachable.
Wed Jun 5 17:48:59 2024 - [info]
192.168.1.201(192.168.1.201:3306) (current master)
 +--192.168.1.202(192.168.1.202:3306)
 +--192.168.1.203(192.168.1.203:3306)

Wed Jun 5 17:48:59 2024 - [info] Checking master_ip_failover_script status:
Wed Jun 5 17:48:59 2024 - [info]   /usr/local/mha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.201 --orig_master_ip=192.168.1.201 --orig_master_port=3306
Wed Jun 5 17:48:59 2024 - [info]  OK.
Wed Jun 5 17:48:59 2024 - [warning] shutdown_script is not defined.
Wed Jun 5 17:48:59 2024 - [info] Set master ping interval 1 seconds.
Wed Jun 5 17:48:59 2024 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s 192.168.1.202 -s 192.168.1.203 --user=root --master_host=192.168.1.201 --master_ip=192.168.1.201 --master_port=3306
Wed Jun 5 17:48:59 2024 - [info] Starting ping health check on 192.168.1.201(192.168.1.201:3306)..
Wed Jun 5 17:48:59 2024 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
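The last line of this log, `Ping(SELECT) succeeded`, is the sign that the manager has resumed health checks against the new master. A script that restarts the manager could wait for that line instead of assuming the start worked. A minimal sketch; `monitoring_started` is a hypothetical helper name, and it reads the log text from standard input.

```shell
# Return 0 if the manager log on stdin shows that ping health checks
# against the master are running.
monitoring_started() {
    grep -Fq 'Ping(SELECT) succeeded'
}
```

For example, `monitoring_started < /usr/local/mha/manager.log` could be polled for a few seconds after startup. Because the log is appended to across restarts, checking only the lines written since the latest start is the safer variant.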


Step 4: Check the MHA cluster status

# masterha_check_status --conf=/etc/mha/mha.cnf 
mha (pid:28902) is running(0:PING_OK), master:192.168.1.201

# masterha_check_repl --conf=/etc/mha/mha.cnf
Wed Jun 5 17:41:18 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 5 17:41:18 2024 - [info] Reading application default configuration from /etc/mha/mha.cnf..
Wed Jun 5 17:41:18 2024 - [info] Reading server configuration from /etc/mha/mha.cnf..
Wed Jun 5 17:41:18 2024 - [info] MHA::MasterMonitor version 0.58.
Wed Jun 5 17:41:19 2024 - [info] GTID failover mode = 1
Wed Jun 5 17:41:19 2024 - [info] Dead Servers:
Wed Jun 5 17:41:19 2024 - [info] Alive Servers:
Wed Jun 5 17:41:19 2024 - [info]   192.168.1.201(192.168.1.201:3306)
Wed Jun 5 17:41:19 2024 - [info]   192.168.1.202(192.168.1.202:3306)
Wed Jun 5 17:41:19 2024 - [info]   192.168.1.203(192.168.1.203:3306)
Wed Jun 5 17:41:19 2024 - [info] Alive Slaves:
Wed Jun 5 17:41:19 2024 - [info]   192.168.1.202(192.168.1.202:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Jun 5 17:41:19 2024 - [info]     GTID ON
Wed Jun 5 17:41:19 2024 - [info]     Replicating from 192.168.1.201(192.168.1.201:3306)
Wed Jun 5 17:41:19 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Jun 5 17:41:19 2024 - [info]   192.168.1.203(192.168.1.203:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Wed Jun 5 17:41:19 2024 - [info]     GTID ON
Wed Jun 5 17:41:19 2024 - [info]     Replicating from 192.168.1.201(192.168.1.201:3306)
Wed Jun 5 17:41:19 2024 - [info] Current Alive Master: 192.168.1.201(192.168.1.201:3306)
Wed Jun 5 17:41:19 2024 - [info] Checking slave configurations..
Wed Jun 5 17:41:19 2024 - [info]  read_only=1 is not set on slave 192.168.1.203(192.168.1.203:3306).
Wed Jun 5 17:41:19 2024 - [info] Checking replication filtering settings..
Wed Jun 5 17:41:19 2024 - [info]  binlog_do_db= , binlog_ignore_db=
Wed Jun 5 17:41:19 2024 - [info]  Replication filtering check ok.
Wed Jun 5 17:41:19 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Wed Jun 5 17:41:19 2024 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jun 5 17:41:20 2024 - [info] HealthCheck: SSH to 192.168.1.201 is reachable.
Wed Jun 5 17:41:20 2024 - [info]
192.168.1.201(192.168.1.201:3306) (current master)
 +--192.168.1.202(192.168.1.202:3306)
 +--192.168.1.203(192.168.1.203:3306)

Wed Jun 5 17:41:20 2024 - [info] Checking replication health on 192.168.1.202..
Wed Jun 5 17:41:20 2024 - [info]  ok.
Wed Jun 5 17:41:20 2024 - [info] Checking replication health on 192.168.1.203..
Wed Jun 5 17:41:20 2024 - [info]  ok.
Wed Jun 5 17:41:20 2024 - [info] Checking master_ip_failover_script status:
Wed Jun 5 17:41:20 2024 - [info]   /usr/local/mha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.201 --orig_master_ip=192.168.1.201 --orig_master_port=3306
Wed Jun 5 17:41:20 2024 - [info]  OK.
Wed Jun 5 17:41:20 2024 - [warning] shutdown_script is not defined.
Wed Jun 5 17:41:20 2024 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.


Check the VIP on the new master node (192.168.1.201)

# ip addr |grep enp0s3
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.1.201/24 brd 192.168.1.255 scope global noprefixroute enp0s3
    inet 192.168.1.204/24 brd 192.168.1.255 scope global secondary enp0s3:1
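The same check can be made scriptable by testing for the VIP in the `ip addr` output instead of reading it by eye. A minimal sketch; `has_vip` is a hypothetical helper name, and it reads the `ip addr` output from standard input.

```shell
# Return 0 if the given IPv4 address is configured on an interface,
# according to `ip addr` output read from stdin.
# The trailing "/" anchors the match, so 192.168.1.20 does not match
# 192.168.1.204/24.
has_vip() {
    grep -Fq "inet $1/"
}
```

On each data node, `ip addr show enp0s3 | has_vip 192.168.1.204 && echo "VIP is here"` should print the message on the new master only. If it prints on both the old and the new master, the VIP was not released correctly.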
Last modified: 2024-06-06 14:25:58