Whoami:5年+金融、政府、医疗领域工作经验的DBACertificate:OCP、PCPSkill:Oracle、Mysql、PostgreSQLPlatform:CSDN、墨天伦、公众号(呆呆的私房菜)
阅读本文可以了解到MySQL MHA高可用架构和详细的部署实施、故障切换和主备切换等内容。

MHA是目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton(现就职于Facebook公司)开发,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。
在MySQL故障切换过程中,MHA能做到在0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。
MHA软件由MHA Manager和MHA Node两部分组成。Manager节点单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。Node节点部署在MySQL服务器上。
整个故障转移过程对应用程序完全透明。
1. 从宕机崩溃的master保存binlog events;
2. 识别含有最新更新的slave;
3. 应用差异的relay log到其他的slave;
4. 应用从master保存的binlog events;
5. 提升一个slave为新的master;
6. 将其它的slave连接到新的master进行复制。

MHA软件由MHA Manager和Node节点组成,Manager可以单独部署在一台服务器上管理多个master-slave集群,也可以部署在某个master-slave集群中的slave节点上。
Manager工具包:
https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
| 组件名称 | 组件说明 |
| masterha_check_ssh | 检查MHA的ssh配置状况 |
| masterha_check_repl | 检查MySQL复制状况 |
| masterha_manager | 启动MHA |
| masteha_check_status | 检查当前MHA运行状态 |
| masterha_master_monitor | 检测master是否宕机 |
| masterha_master_switch | 控制故障转移(自动或手动) |
| masterha_conf_host | 添加或删除配置的server信息 |
Node工具包:
https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz
| 组件名称 | 组件说明 |
| save_binary_log | 保存和复制master的二进制日志 |
| apply_diff_relay_logs | 识别差异的中继日志事件并将其差异的事件应用于其他的slave |
| filter_mysqlbinlog | 去除不必要的rollback事件 |
| purge_relay_logs | 清理中继日志(不阻塞SQL线程) |
1. 环境说明
| 主机名 | IP地址 | 操作系统 | 角色 |
| host-01 | 10.28.12.21 | CentOS 7.6 | MHA Node、MySQL Maste |
| host-02 | 10.28.12.22 | CentOS 7.6 | MHA Node、MySQL Slave |
| host-03 | 10.28.12.23 | CentOS 7.6 | MHA Manager、MHA Node、MySQL Slave |
| 10.28.12.99 | VIP(业务IP) |
2. 数据库安装
## 创建数据库安装用户useradd mysql8022mkdir -p mysql/mysql8022chown -R mysql8022.mysql8022 mysql/mysql8022chmod 750 -R mysql/mysql8022## 安装包上传到/mysql/mysql8022下cd mysql/mysql8022/xz -d mysql-8.0.22-linux-glibc2.12-x86_64.tar.xztar xf mysql-8.0.22-linux-glibc2.12-x86_64.tarmv mysql-8.0.22-linux-glibc2.12-x86_64 mysql-8.0.22vi home/mysql8022/.bashrcexport PATH=$PATH:/mysql/mysql8022/mysql-8.0.22/binexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mysql/mysql8022/mysql-8.0.22/lib## 创建数据库相关目录su - mysql8022mkdir -p mysql/mysql8022/{binlog,relaylog,log,redo,data,temp,conf}## 编辑配置文件vi mysql/mysql8022/conf/my.cnf[client]port = 3306socket = mysql/mysql8022/temp/mysql.sockdefault-character-set = utf8mb4[mysqld]#baseuser = mysql8022port = 3306basedir = home/mysql8022/soft/mysql-8.0.22datadir = mysql/mysql8022/datatmpdir = mysql/mysql8022/tempsocket = mysql/mysql8022/temp/mysql.sockpid-file = mysql/mysql8022/temp/mysql.pid#Open files limitopen_files_limit = 65535#Connectionmax_connections = 3000max_user_connections = 1000max_connect_errors = 100wait_timeout = 600interactive_timeout = 600thread_cache_size = 2560back_log = 1024#Charactercharacter-set-server = utf8mb4#isolationtransaction-isolation = READ-COMMITTED#Timedefault_time_zone = '+8:00'log_timestamps = SYSTEM#Binlogserver-id = 21log-bin = mysql/mysql8022/binlog/mysql-binbinlog_cache_size = 128mmax_binlog_cache_size = 6Gmax_binlog_size = 1Gbinlog_error_action = ABORT_SERVER#sql modesql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION#gtidgtid_mode = onenforce_gtid_consistency = 1#InnoDBdefault-storage-engine = InnoDBinnodb_data_home_dir = mysql/mysql8022/datainnodb_data_file_path = ibdata1:500M:autoextendinnodb_temp_data_file_path = ibtmp1:200M:autoextendinnodb_file_per_table = 1innodb_log_group_home_dir = mysql/mysql8022/redoinnodb_log_files_in_group = 3innodb_buffer_pool_instances = 4innodb_flush_log_at_trx_commit = 1innodb_max_dirty_pages_pct = 75innodb_open_files = 10000#Relay logrelay-log-recovery = 1relay-log = mysql/mysql8022/relaylog/relay-binrelay-log-index = mysql/mysql8022/relaylog/relay-bin.indexrelay_log_purge = 0#Sync logsync_binlog = 1sync_relay_log = 1sync_relay_log_info = 1#Tablelower_case_table_names = 1#Table cachetable_open_cache = 2048table_definition_cache = 1024table_open_cache_instances = 32#Locklock_wait_timeout = 3600explicit_defaults_for_timestamp = 1#Cache bufferkey_buffer_size = 32Mkey_cache_block_size = 16384join_buffer_size = 8Msort_buffer_size = 8Mread_buffer_size = 8Mread_rnd_buffer_size = 4Mbulk_insert_buffer_size = 64Mmyisam_sort_buffer_size = 128Mmyisam_max_sort_file_size = 2Gtmp_table_size = 32Mmax_heap_table_size = 32Mmax_allowed_packet = 32M#Logslog_error = mysql/mysql8022/log/mysql-error.logslow_query_log = 1slow_query_log_file = mysql/mysql8022/log/slow_statement.loglong_query_time = 3log_queries_not_using_indexes = 1log_timestamps = SYSTEMlog_throttle_queries_not_using_indexes = 60general_log = 0general_log_file = mysql/mysql8022/log/general_statement.log[mysqld_safe]log-error = mysql/mysql8022/log/mysql-error.logpid-file = mysql/mysql8022/temp/mysql.pid[mysqldump]quick = onmax_allowed_packet = 32M## 注意:默认初始化的表空间是12M,innodb-data-file-path=ibdata1:12M:autoextend## 默认的表名存储方式是0 --lower-case-table-names=0## 初始化数据库mysqld --initialize-insecure --basedir=/home/mysql8022/soft/mysql-8.0.22 --datadir=/mysql/mysql8022/data --user=mysql8022 --lower-case-table-names=1 --innodb-data-file-path=ibdata1:500M:autoextend## 启动数据库mysqld_safe --defaults-file=/mysql/mysql8022/conf/my.cnf --user=mysql8022 &## 修改用户密码mysql -uroot -p -S/mysql/mysql8022/temp/mysql.sockalter user root@'localhost' identified by 'root';flush privileges;## 登录数据库mysql -uroot -proot -S/mysql/mysql8022/temp/mysql.sock
3. 主从同步搭建
# 创建同步账号(所有数据库)mysql -uroot -proot -S/mysql/mysql8022/temp/mysql.sockcreate user 'repl'@'%' identified with 'mysql_native_password' by 'repl';grant replication slave, replication_slave_admin on *.* to 'repl'@'%';flush privileges;# 创建mha监控用户monitor(所有数据库)create user 'monitor'@'%' identified with 'mysql_native_password' by 'monitor';grant all privileges on *.* to 'monitor'@'%';flush privileges;# 重置binlog(所有数据库)reset master;# 从库执行mysql -uroot -proot -S/mysql/mysql8022/temp/mysql.sockchange master tomaster_host="10.28.12.21",master_user="repl",master_password="repl",master_auto_position=1;start slave;show slave status\G;# 配置从库只读set global read_only=on
4. 安装MHA
# 在线安装依赖curl -o etc/yum.repos.d/epel-7.repo http://mirrors.aliyun.com/repo/epel-7.repocurl -o etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repoyum clean all && yum repolistyum -y install perl-Config-Tiny perl-Email-Date-Format perl-Log-Dispatch perl-Mail-Sender perl-Mail-Sendmail perl-MIME-Lite perl-MIME-Types perl-Parallel-ForkManager perl-Params-Validate perl-DBI perl-DBD-MySQL# 离线安装依赖# 上传依赖包到/opt/soft下perl-Config-Tiny-2.14-7.el7.src.rpmperl-Email-Date-Format-1.002-15.el7.noarch.rpmperl-Log-Dispatch-2.41-1.el7.1.noarch.rpmperl-Mail-Sender-0.8.23-1.el7.noarch.rpmperl-Mail-Sendmail-0.79-21.el7.noarch.rpmperl-MIME-Lite-3.030-1.el7.noarch.rpmperl-MIME-Types-1.38-2.el7.noarch.rpmperl-Parallel-ForkManager-1.18-2.el7.noarch.rpmperl-Params-Validate-1.08-4.el7.x86_64.rpm# 安装依赖cd opt/soft/yum -y install *.rpm# 上传mha软件到/opt/mha下mha4mysql-manager-0.58-0.el7.centos.noarch.rpmmha4mysql-node-0.58-0.el7.centos.noarch.rpm# 安装mha(安装manager,需要先装node,不然会缺少依赖)cd opt/mharpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpmrpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
5. 配置MHA
# 配置/etc/hosts文件(所有节点)vi etc/hosts10.28.12.21 host-0110.28.12.22 host-0210.28.12.23 host-03# 安装用户(所有节点)su - rootuseradd mhaecho "mha" | passwd --stdin mha# 这里是为了能够访问binlog文件usermod -a -G mysql8022 mhachmod -R 750 mysql/mysql8022# 配置环境变量vi home/mha/.bashrcexport PATH=$PATH:/mysql/mysql8022/mysql-8.0.22/binexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mysql/mysql8022/mysql-8.0.22/lib# 配置用户sudo权限(所有节点)visudomha ALL=(ALL) NOPASSWD:ALL# 配置互信(所有节点)su - mhassh-keygen -t rsassh-copy-id host-01ssh-copy-id host-02ssh-copy-id host-03# 验证互信ssh host-01 datessh host-02 datessh host-03 date# 创建相关目录(mha node节点)mkdir -p home/mha/masterha/{script,relaylogs_purge,work}# 配置Slave节点relaylog自动清理脚本(两个slave节点)vi home/mha/masterha/script/purge_relay_log.sh#!/bin/bashuser=monitorpasswd=monitorport=3306log_dir='/home/mha/masterha/relaylogs_purge'work_dir='/home/mha/masterha/work'if [ ! -d $log_dir ]thenmkdir -p $log_dirfi/usr/bin/purge_relay_logs --user=$user --password=$passwd --port=$port --workdir=$work_dir --disable_relay_log_purge >> $log_dir/purge_relay_logs.log 2>&1# 配置定时任务chmod +x home/mha/masterha/script/purge_relay_log.shcrontab -e30 23 * * * sh home/mha/masterha/script/purge_relay_log.sh# 配置mha配置文件(mater配置)mkdir -p home/mha/masterha/confmkdir -p home/mha/masterha/logvi home/mha/masterha/conf/mha.conf[server default]user=monitorpassword=monitorrepl_user=replrepl_password=replssh_user=mhaping_interval=1master_binlog_dir=/mysql/mysql8022/binlogmanager_workdir=/home/mha/masterha/workmanager_log=/home/mha/masterha/log/manager.logmaster_ip_failover_script=/home/mha/masterha/script/master_ip_failovermaster_ip_online_change_script=/home/mha/masterha/script/master_ip_online_changesecondary_check_script=/usr/bin/masterha_secondary_check -s host-02 -s host-03 --user=mha --master_host=host-01 --master_ip=10.28.12.21 --master_port=3306report_script=/home/mha/masterha/script/send_reportremote_workdir=/home/mha/masterha/work[server1]hostname=host-01port=3306[server2]hostname=host-02port=3306candidate_master=1check_repl_delay=0[server3]hostname=host-03port=3306# 配置vip(配置在mysql master节点上)/sbin/ifconfig eth0:1 10.28.12.99/24# 检查vip添加是否成功ip a | grep eth0# 配置master_ip_failover脚本vi home/mha/masterha/script/master_ip_failover#!/usr/bin/env perluse strict;use warnings FATAL => 'all';use Getopt::Long;my ($command, $ssh_user, $orig_master_host, $orig_master_ip,$orig_master_port, $new_master_host, $new_master_ip, $new_master_port);#############################添加内容部分#########################################my $vip = '10.28.12.99';my $brdc = '10.28.12.255';my $ifdev = 'eth0';my $key = '1';my $ssh_start_vip = "sudo usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;";my $ssh_stop_vip = "sudo usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";##################################################################################GetOptions('command=s' => \$command,'ssh_user=s' => \$ssh_user,'orig_master_host=s' => \$orig_master_host,'orig_master_ip=s' => \$orig_master_ip,'orig_master_port=i' => \$orig_master_port,'new_master_host=s' => \$new_master_host,'new_master_ip=s' => \$new_master_ip,'new_master_port=i' => \$new_master_port,);exit &main();sub main {print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";if ( $command eq "stop" || $command eq "stopssh" ) {my $exit_code = 1;eval {print "Disabling the VIP on old master: $orig_master_host \n";&stop_vip();$exit_code = 0;};if ($@) {warn "Got Error: $@\n";exit $exit_code;}exit $exit_code;}elsif ( $command eq "start" ) {my $exit_code = 10;eval {print "Enabling the VIP - $vip on the new master - $new_master_host \n";&start_vip();$exit_code = 0;};if ($@) {warn $@;exit $exit_code;}exit $exit_code;}elsif ( $command eq "status" ) {print "Checking the Status of the script.. OK \n";exit 0;}else {&usage();exit 1;}}sub start_vip() {`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;}# A simple system call that disable the VIP on the old_mastersub stop_vip() {`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;}sub usage {"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";}# 配置master_ip_online_change脚本vi /home/mha/masterha/script/master_ip_online_change#!/usr/bin/env perluse strict;use warnings FATAL => 'all';use Getopt::Long;use MHA::DBHelper;use MHA::NodeUtil;use Time::HiRes qw( sleep gettimeofday tv_interval );use Data::Dumper;my $_tstart;my $_running_interval = 0.1;my ($command,$orig_master_is_new_slave, $orig_master_host, $orig_master_ip, $orig_master_port, $orig_master_user, $orig_master_password, $orig_master_ssh_user,$new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password, $new_master_ssh_user,);my $vip = '10.28.12.99';my $brdc = '10.28.12.255';my $ifdev = 'eth0';my $key = '1';my $ssh_start_vip = "sudo /usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;";my $ssh_stop_vip = "sudo /usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";GetOptions('command=s' => \$command,'orig_master_is_new_slave' => \$orig_master_is_new_slave,'orig_master_host=s' => \$orig_master_host,'orig_master_ip=s' => \$orig_master_ip,'orig_master_port=i' => \$orig_master_port,'orig_master_user=s' => \$orig_master_user,'orig_master_password=s' => \$orig_master_password,'orig_master_ssh_user=s' => \$orig_master_ssh_user,'new_master_host=s' => \$new_master_host,'new_master_ip=s' => \$new_master_ip,'new_master_port=i' => \$new_master_port,'new_master_user=s' => \$new_master_user,'new_master_password=s' => \$new_master_password,'new_master_ssh_user=s' => \$new_master_ssh_user,);exit &main();sub current_time_us {my ( $sec, $microsec ) = gettimeofday();my $curdate = localtime($sec);return $curdate . " " . sprintf( "%06d", $microsec );}sub sleep_until {my $elapsed = tv_interval($_tstart);if ( $_running_interval > $elapsed ) {sleep( $_running_interval - $elapsed );}}sub get_threads_util {my $dbh = shift;my $my_connection_id = shift;my $running_time_threshold = shift;my $type = shift;$running_time_threshold = 0 unless ($running_time_threshold);$type = 0 unless ($type);my @threads;my $sth = $dbh->prepare("SHOW PROCESSLIST");$sth->execute();while ( my $ref = $sth->fetchrow_hashref() ) {my $id = $ref->{Id};my $user = $ref->{User};my $host = $ref->{Host};my $command = $ref->{Command};my $state = $ref->{State};my $query_time = $ref->{Time};my $info = $ref->{Info};$info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);next if ( $my_connection_id == $id );next if ( defined($query_time) && $query_time < $running_time_threshold );next if ( defined($command) && $command eq "Binlog Dump" );next if ( defined($user) && $user eq "system user" );nextif ( defined($command)&& $command eq "Sleep"&& defined($query_time)&& $query_time >= 1 );if ( $type >= 1 ) {next if ( defined($command) && $command eq "Sleep" );next if ( defined($command) && $command eq "Connect" );}if ( $type >= 2 ) {next if ( defined($info) && $info =~ m/^select/i );next if ( defined($info) && $info =~ m/^show/i );}push @threads, $ref;}return @threads;}sub main {if ( $command eq "stop" ) {## Gracefully killing connections on the current master# 1. Set read_only= 1 on the new master# 2. DROP USER so that no app user can establish new connections# 3. Set read_only= 1 on the current master# 4. Kill current queries# * Any database access failure will result in script die.my $exit_code = 1;eval {## Setting read_only=1 on the new master (to avoid accident)my $new_master_handler = new MHA::DBHelper();# args: hostname, port, user, password, raise_error(die_on_error)_or_not$new_master_handler->connect( $new_master_ip, $new_master_port,$new_master_user, $new_master_password, 1 );print current_time_us() . " Set read_only on the new master.. ";$new_master_handler->enable_read_only();if ( $new_master_handler->is_read_only() ) {print "ok.\n";}else {die "Failed!\n";}$new_master_handler->disconnect();# Connecting to the orig master, die if any database error happensmy $orig_master_handler = new MHA::DBHelper();$orig_master_handler->connect( $orig_master_ip, $orig_master_port,$orig_master_user, $orig_master_password, 1 );## Drop application user so that nobody can connect. Disabling per-session binlog beforehand#$orig_master_handler->disable_log_bin_local();#print current_time_us() . " Drpping app user on the orig master..\n";#FIXME_xxx_drop_app_user($orig_master_handler);## Waiting for N * 100 milliseconds so that current connections can exitmy $time_until_read_only = 15;$_tstart = [gettimeofday];my @threads = get_threads_util( $orig_master_handler->{dbh},$orig_master_handler->{connection_id} );while ( $time_until_read_only > 0 && $#threads >= 0 ) {if ( $time_until_read_only % 5 == 0 ) {printf"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",current_time_us(), $#threads + 1, $time_until_read_only * 100;if ( $#threads < 5 ) {print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"foreach (@threads);}}sleep_until();$_tstart = [gettimeofday];$time_until_read_only--;@threads = get_threads_util( $orig_master_handler->{dbh},$orig_master_handler->{connection_id} );}## Setting read_only=1 on the current master so that nobody(except SUPER) can writeprint current_time_us() . " Set read_only=1 on the orig master.. ";$orig_master_handler->enable_read_only();if ( $orig_master_handler->is_read_only() ) {print "ok.\n";}else {die "Failed!\n";}## Waiting for M * 100 milliseconds so that current update queries can completemy $time_until_kill_threads = 5;@threads = get_threads_util( $orig_master_handler->{dbh},$orig_master_handler->{connection_id} );while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {if ( $time_until_kill_threads % 5 == 0 ) {printf"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",current_time_us(), $#threads + 1, $time_until_kill_threads * 100;if ( $#threads < 5 ) {print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"foreach (@threads);}}sleep_until();$_tstart = [gettimeofday];$time_until_kill_threads--;@threads = get_threads_util( $orig_master_handler->{dbh},$orig_master_handler->{connection_id} );}print "Disabling the VIP on old master: $orig_master_host \n";&stop_vip();## Terminating all threadsprint current_time_us() . " Killing all application threads..\n";$orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );print current_time_us() . " done.\n";#$orig_master_handler->enable_log_bin_local();$orig_master_handler->disconnect();## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK$exit_code = 0;};if ($@) {warn "Got Error: $@\n";exit $exit_code;}exit $exit_code;}elsif ( $command eq "start" ) {## Activating master ip on the new master# 1. Create app user with write privileges# 2. Moving backup script if needed# 3. Register new master's ip to the catalog database# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.# If exit code is 0 or 10, MHA does not abortmy $exit_code = 10;eval {my $new_master_handler = new MHA::DBHelper();# args: hostname, port, user, password, raise_error_or_not$new_master_handler->connect( $new_master_ip, $new_master_port,$new_master_user, $new_master_password, 1 );## Set read_only=0 on the new master#$new_master_handler->disable_log_bin_local();print current_time_us() . " Set read_only=0 on the new master.\n";$new_master_handler->disable_read_only();## Creating an app user on the new master#print current_time_us() . " Creating app user on the new master..\n";#FIXME_xxx_create_app_user($new_master_handler);#$new_master_handler->enable_log_bin_local();$new_master_handler->disconnect();## Update master ip on the catalog database, etcprint "Enabling the VIP - $vip on the new master - $new_master_host \n";&start_vip();$exit_code = 0;};if ($@) {warn "Got Error: $@\n";exit $exit_code;}exit $exit_code;}elsif ( $command eq "status" ) {# do nothingexit 0;}else {&usage();exit 1;}}# A simple system call that enable the VIP on the new mastersub start_vip() {`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;}# A simple system call that disable the VIP on the old_mastersub stop_vip() {`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;}sub usage {"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --orig_master_user=user --orig_master_password=password --orig_master_ssh_user=sshuser --new_master_host=host --new_master_ip=ip --new_master_port=port --new_master_user=user --new_master_password=password --new_master_ssh_user=sshuser \n";die;}# 编辑发送报告模板(发送信息到系统用户mail下)vi /home/mha/masterha/script/send_report#!/usr/bin/perluse strict;use warnings FATAL => 'all';use Getopt::Long;#new_master_host and new_slave_hosts are set only when recovering master succeededmy ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );GetOptions('orig_master_host=s' => \$dead_master_host,'new_master_host=s' => \$new_master_host,'new_slave_hosts=s' => \$new_slave_hosts,'subject=s' => \$subject,'body=s' => \$body,);my $log_file = '/var/spool/mail/mha';my $current_date = `date '+%Y-%m-%d %H:%M:%S'`;chomp $current_date;my $log_message = "$current_date - $subject: $body";open my $fh, '>>', $log_file or die "Cannot open $log_file: $!";print $fh $log_message . "\n";close $fh;exit 0;# 授权脚本权限chmod 775 /home/mha/masterha/script/master_ip_failover /home/mha/masterha/script/master_ip_online_change /home/mha/masterha/script/send_report# 检查ssh权限(主节点)masterha_check_ssh --conf=/home/mha/masterha/conf/mha.conf# 检查复制集群状态(主节点)masterha_check_repl --conf=/home/mha/masterha/conf/mha.conf# 配置mha启动脚本vi /home/mha/masterha/script/mha_start.shnohup masterha_manager --conf=/home/mha/masterha/conf/mha.conf --remove_dead_master_conf --ignore_last_failover < /dev/null > /home/mha/masterha/log/manager.log 2>&1 &chmod +x /home/mha/masterha/script/mha_start.sh# 配置查看MHA manager状态脚本vi /home/mha/masterha/script/mha_status.shmasterha_check_status --conf=/home/mha/masterha/conf/mha.confchmod +x /home/mha/masterha/script/mha_status.sh# 配置mha关闭脚本vi /home/mha/masterha/script/mha_stop.shmasterha_stop --conf=/home/mha/masterha/conf/mha.confchmod +x /home/mha/masterha/script/mha_stop.sh
1. 启动mha服务sh /home/mha/masterha/script/mha_start.sh# 启动日志如下Sun Jul 28 22:21:08 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Sun Jul 28 22:21:08 2024 - [info] Reading application default configuration from /home/mha/masterha/conf/mha.conf..Sun Jul 28 22:21:08 2024 - [info] Reading server configuration from /home/mha/masterha/conf/mha.conf..Sun Jul 28 22:21:08 2024 - [info] MHA::MasterMonitor version 0.58.Sun Jul 28 22:21:09 2024 - [info] GTID failover mode = 0Sun Jul 28 22:21:09 2024 - [info] Dead Servers:Sun Jul 28 22:21:09 2024 - [info] Alive Servers:Sun Jul 28 22:21:09 2024 - [info] host-01(10.28.12.21:3306)Sun Jul 28 22:21:09 2024 - [info] host-02(10.28.12.22:3306)Sun Jul 28 22:21:09 2024 - [info] host-03(10.28.12.23:3306)Sun Jul 28 22:21:09 2024 - [info] Alive Slaves:Sun Jul 28 22:21:09 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 22:21:09 2024 - [info] GTID ONSun Jul 28 22:21:09 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)Sun Jul 28 22:21:09 2024 - [info] Primary candidate for the new Master (candidate_master is set)Sun Jul 28 22:21:09 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 22:21:09 2024 - [info] GTID ONSun Jul 28 22:21:09 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)Sun Jul 28 22:21:09 2024 - [info] Current Alive Master: host-01(10.28.12.21:3306)Sun Jul 28 22:21:09 2024 - [info] Checking slave configurations..Sun Jul 28 22:21:09 2024 - [info] read_only=1 is not set on slave host-02(10.28.12.22:3306).Sun Jul 28 22:21:09 2024 - [info] read_only=1 is not set on slave host-03(10.28.12.23:3306).Sun Jul 28 22:21:09 2024 - [info] Checking replication filtering settings..Sun Jul 28 22:21:09 2024 - [info] binlog_do_db= , binlog_ignore_db=Sun Jul 28 22:21:09 2024 - [info] Replication filtering check ok.Sun Jul 28 22:21:09 2024 - [info] GTID (with auto-pos) is not supportedSun Jul 28 22:21:09 2024 - [info] Starting SSH connection tests..Sun Jul 28 22:21:12 2024 - [info] All SSH connection tests passed successfully.Sun Jul 28 22:21:12 2024 - [info] Checking MHA Node version..Sun Jul 28 22:21:12 2024 - [info] Version check ok.Sun Jul 28 22:21:12 2024 - [info] Checking SSH publickey authentication settings on the current master..Sun Jul 28 22:21:13 2024 - [info] HealthCheck: SSH to host-01 is reachable.Sun Jul 28 22:21:13 2024 - [info] Checking recovery script configurations on host-01(10.28.12.21:3306)..Sun Jul 28 22:21:13 2024 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/mysql8022/binlog --output_file=/home/mha/masterha/work/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000001Sun Jul 28 22:21:13 2024 - [info] Connecting to mha@10.28.12.21(host-01:22)..Creating /home/mha/masterha/work if not exists.. ok.Checking output directory is accessible or not..ok.Binlog found at /mysql/mysql8022/binlog, up to mysql-bin.000001Sun Jul 28 22:21:13 2024 - [info] Binlog setting check done.Sun Jul 28 22:21:13 2024 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..Sun Jul 28 22:21:13 2024 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=host-02 --slave_ip=10.28.12.22 --slave_port=3306 --workdir=/home/mha/masterha/work --target_version=8.0.22 --manager_version=0.58 --relay_dir=/mysql/mysql8022/relaylog --current_relay_log=relay-bin.000002 --slave_pass=xxxSun Jul 28 22:21:13 2024 - [info] Connecting to mha@10.28.12.22(host-02:22)..Checking slave recovery environment settings..Relay log found at /mysql/mysql8022/relaylog, up to relay-bin.000002Temporary relay log file is /mysql/mysql8022/relaylog/relay-bin.000002Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.done.Testing mysqlbinlog output.. done.Cleaning up test file(s).. done.Sun Jul 28 22:21:13 2024 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=host-03 --slave_ip=10.28.12.23 --slave_port=3306 --workdir=/home/mha/masterha/work --target_version=8.0.22 --manager_version=0.58 --relay_dir=/mysql/mysql8022/relaylog --current_relay_log=relay-bin.000002 --slave_pass=xxxSun Jul 28 22:21:13 2024 - [info] Connecting to mha@10.28.12.23(host-03:22)..Checking slave recovery environment settings..Relay log found at /mysql/mysql8022/relaylog, up to relay-bin.000002Temporary relay log file is /mysql/mysql8022/relaylog/relay-bin.000002Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.done.Testing mysqlbinlog output.. done.Cleaning up test file(s).. done.Sun Jul 28 22:21:14 2024 - [info] Slaves settings check done.Sun Jul 28 22:21:14 2024 - [info]host-01(10.28.12.21:3306) (current master)+--host-02(10.28.12.22:3306)+--host-03(10.28.12.23:3306)Sun Jul 28 22:21:14 2024 - [info] Checking master_ip_failover_script status:Sun Jul 28 22:21:14 2024 - [info] /home/mha/masterha/script/master_ip_failover --command=status --ssh_user=mha --orig_master_host=host-01 --orig_master_ip=10.28.12.21 --orig_master_port=3306IN SCRIPT TEST====sudo /usr/sbin/ip addr del 10.28.12.99/24 dev eth0 label eth0:1==sudo /usr/sbin/ip addr add 10.28.12.99/24 brd 10.28.12.255 dev eth0 label eth0:1;/usr/sbin/arping -q -A -c 1 -I eth0 10.28.12.99;===Checking the Status of the script.. OKSun Jul 28 22:21:14 2024 - [info] OK.Sun Jul 28 22:21:14 2024 - [warning] shutdown_script is not defined.Sun Jul 28 22:21:14 2024 - [info] Set master ping interval 1 seconds.Sun Jul 28 22:21:14 2024 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s host-02 -s host-03 --user=mha --master_host=host-01 --master_ip=10.28.12.21 --master_port=3306Sun Jul 28 22:21:14 2024 - [info] Starting ping health check on host-01(10.28.12.21:3306)..Sun Jul 28 22:21:14 2024 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..2. 模拟master节点故障mysqladmin -uroot -proot -S/mysql/mysql8022/temp/mysql.sock shutdown# 观察到MHA已经实现主备自动切换# 业务IP从原master节点切换到新的master节点上# 原slave节点同步指向了新的master节点上Sun Jul 28 22:33:55 2024 - [warning] Got error on MySQL select ping: 1053 (Server shutdown in progress)Sun Jul 28 22:33:55 2024 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s host-02 -s host-03 --user=mha --master_host=host-01 --master_ip=10.28.12.21 --master_port=3306 --user=mha --master_host=host-01 --master_ip=10.28.12.21 --master_port=3306 --master_user=monitor --master_password=monitor --ping_type=SELECTSun Jul 28 22:33:55 2024 - [info] Executing SSH check script: exit 0Sun Jul 28 22:33:55 2024 - [info] HealthCheck: SSH to host-01 is reachable.Monitoring server host-02 is reachable, Master is not reachable from host-02. OK.Monitoring server host-03 is reachable, Master is not reachable from host-03. OK.Sun Jul 28 22:33:56 2024 - [info] Master is not reachable from all other monitoring servers. Failover should start.Sun Jul 28 22:33:56 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.28.12.21' (111))Sun Jul 28 22:33:56 2024 - [warning] Connection failed 2 time(s)..Sun Jul 28 22:33:57 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.28.12.21' (111))Sun Jul 28 22:33:57 2024 - [warning] Connection failed 3 time(s)..Sun Jul 28 22:33:58 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.28.12.21' (111))Sun Jul 28 22:33:58 2024 - [warning] Connection failed 4 time(s)..Sun Jul 28 22:33:58 2024 - [warning] Master is not reachable from health checker!Sun Jul 28 22:33:58 2024 - [warning] Master host-01(10.28.12.21:3306) is not reachable!Sun Jul 28 22:33:58 2024 - [warning] SSH is reachable.Sun Jul 28 22:33:58 2024 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /home/mha/masterha/conf/mha.conf again, and trying to connect to all servers to check server status..Sun Jul 28 22:33:58 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Sun Jul 28 22:33:58 2024 - [info] Reading application default configuration from /home/mha/masterha/conf/mha.conf..Sun Jul 28 22:33:58 2024 - [info] Reading server configuration from /home/mha/masterha/conf/mha.conf..Sun Jul 28 22:34:00 2024 - [info] GTID failover mode = 1Sun Jul 28 22:34:00 2024 - [info] Dead Servers:Sun Jul 28 22:34:00 2024 - [info] host-01(10.28.12.21:3306)Sun Jul 28 22:34:00 2024 - [info] Alive Servers:Sun Jul 28 22:34:00 2024 - [info] host-02(10.28.12.22:3306)Sun Jul 28 22:34:00 2024 - [info] host-03(10.28.12.23:3306)Sun Jul 28 22:34:00 2024 - [info] Alive Slaves:Sun Jul 28 22:34:00 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 22:34:00 2024 - [info] GTID ONSun Jul 28 22:34:00 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)Sun Jul 28 22:34:00 2024 - [info] Primary candidate for the new Master (candidate_master is set)Sun Jul 28 22:34:00 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 22:34:00 2024 - [info] GTID ONSun Jul 28 22:34:00 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)Sun Jul 28 22:34:00 2024 - [info] Checking slave configurations..Sun Jul 28 22:34:00 2024 - [info] read_only=1 is not set on slave host-02(10.28.12.22:3306).Sun Jul 28 22:34:00 2024 - [info] read_only=1 is not set on slave host-03(10.28.12.23:3306).Sun Jul 28 22:34:00 2024 - [info] Checking replication filtering settings..Sun Jul 28 22:34:00 2024 - [info] Replication filtering check ok.Sun Jul 28 22:34:00 2024 - [info] Master is down!Sun Jul 28 22:34:00 2024 - [info] Terminating monitoring script.Sun Jul 28 22:34:00 2024 - [info] Got exit code 20 (Master dead).Sun Jul 28 22:34:01 2024 - [info] MHA::MasterFailover version 0.58.Sun Jul 28 22:34:01 2024 - [info] Starting master failover.Sun Jul 28 22:34:01 2024 - [info]Sun Jul 28 22:34:01 2024 - [info] * Phase 1: Configuration Check Phase..Sun Jul 28 22:34:01 2024 - [info]Sun Jul 28 22:34:02 2024 - [info] GTID failover mode = 1Sun Jul 28 22:34:02 2024 - [info] Dead Servers:Sun Jul 28 22:34:02 2024 - [info] host-01(10.28.12.21:3306)Sun Jul 28 22:34:02 2024 - [info] Checking master reachability via MySQL(double check)...Sun Jul 28 22:34:02 2024 - [info] ok.Sun Jul 28 22:34:02 2024 - [info] Alive Servers:Sun Jul 28 22:34:02 2024 - [info] host-02(10.28.12.22:3306)Sun Jul 28 22:34:02 2024 - [info] host-03(10.28.12.23:3306)Sun Jul 28 22:34:02 2024 - [info] Alive Slaves:Sun Jul 28 22:34:02 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 22:34:02 2024 - [info] GTID ONSun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)Sun Jul 28 22:34:02 2024 - [info] Primary candidate for the new Master (candidate_master is set)Sun Jul 28 22:34:02 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 22:34:02 2024 - [info] GTID ONSun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)Sun Jul 28 22:34:02 2024 - [info] Starting GTID based failover.Sun Jul 28 22:34:02 2024 - [info]Sun Jul 28 22:34:02 2024 - [info] ** Phase 1: Configuration Check Phase completed.Sun Jul 28 22:34:02 2024 - [info]Sun Jul 28 22:34:02 2024 - [info] * Phase 2: Dead Master Shutdown Phase..Sun Jul 28 22:34:02 2024 - [info]Sun Jul 28 22:34:02 2024 - [info] Forcing shutdown so that applications never connect to the current master..Sun Jul 28 22:34:02 2024 - [info] Executing master IP deactivation script:Sun Jul 28 22:34:02 2024 - [info] /home/mha/masterha/script/master_ip_failover --orig_master_host=host-01 --orig_master_ip=10.28.12.21 --orig_master_port=3306 --command=stopssh --ssh_user=mhaIN SCRIPT TEST====sudo /usr/sbin/ip addr del 10.28.12.99/24 dev eth0 label eth0:1==sudo /usr/sbin/ip addr add 10.28.12.99/24 brd 10.28.12.255 dev eth0 label eth0:1;/usr/sbin/arping -q -A -c 1 -I eth0 10.28.12.99;===Disabling the VIP on old master: host-01RTNETLINK answers: Cannot assign requested addressSun Jul 28 22:34:02 2024 - [info] done.Sun Jul 28 22:34:02 2024 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.Sun Jul 28 22:34:02 2024 - [info] * Phase 2: Dead Master Shutdown Phase completed.Sun Jul 28 22:34:02 2024 - [info]Sun Jul 28 22:34:02 2024 - [info] * Phase 3: Master Recovery Phase..Sun Jul 28 22:34:02 2024 - [info]Sun Jul 28 22:34:02 2024 - [info] * Phase 3.1: Getting Latest Slaves Phase..Sun Jul 28 22:34:02 2024 - [info]Sun Jul 28 22:34:02 2024 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:335Sun Jul 28 22:34:02 2024 - [info] Retrieved Gtid Set: 4c96b5ab-4cc1-11ef-9d49-02420a1c0c15:1Sun Jul 28 22:34:02 2024 - [info] Latest slaves (Slaves that received relay log files to the latest):Sun Jul 28 22:34:02 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 22:34:02 2024 - [info] GTID ONSun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)Sun Jul 28 22:34:02 2024 - [info] Primary candidate for the new Master (candidate_master is set)Sun Jul 28 22:34:02 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 22:34:02 2024 - [info] GTID ONSun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)Sun Jul 28 22:34:02 2024 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:335Sun Jul 28 22:34:02 2024 - [info] Retrieved Gtid Set: 4c96b5ab-4cc1-11ef-9d49-02420a1c0c15:1Sun Jul 28 22:34:02 2024 - [info] Oldest slaves:Sun Jul 28 22:34:02 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 22:34:02 2024 - [info] GTID ONSun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)Sun Jul 28 22:34:02 2024 - [info] Primary candidate for the new Master (candidate_master is set)Sun Jul 28 22:34:02 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 22:34:02 2024 - [info] GTID ONSun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)Sun Jul 28 22:34:02 2024 - [info]Sun Jul 28 22:34:02 2024 - [info] * Phase 3.3: Determining New Master Phase..Sun Jul 28 22:34:02 2024 - [info]Sun Jul 28 22:34:02 2024 - [info] Searching new master from slaves..Sun Jul 28 22:34:02 2024 - [info] Candidate masters from the configuration file:Sun Jul 28 22:34:02 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 22:34:02 2024 - [info] GTID ONSun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)Sun Jul 28 22:34:02 2024 - [info] Primary candidate for the new Master (candidate_master is set)Sun Jul 28 22:34:02 2024 - [info] Non-candidate masters:Sun Jul 28 22:34:02 2024 - [info] Searching from candidate_master slaves which have received the latest relay log events..Sun Jul 28 22:34:02 2024 - [info] New master is host-02(10.28.12.22:3306)Sun Jul 28 22:34:02 2024 - [info] Starting master failover..Sun Jul 28 22:34:02 2024 - [info]From:host-01(10.28.12.21:3306) (current master)+--host-02(10.28.12.22:3306)+--host-03(10.28.12.23:3306)To:host-02(10.28.12.22:3306) (new master)+--host-03(10.28.12.23:3306)Sun Jul 28 22:34:02 2024 - [info]Sun Jul 28 22:34:02 2024 - [info] * Phase 3.3: New Master Recovery Phase..Sun Jul 28 22:34:02 2024 - [info]Sun Jul 28 22:34:02 2024 - [info] Waiting all logs to be applied..Sun Jul 28 22:34:02 2024 - [info] done.Sun Jul 28 22:34:02 2024 - [info] Getting new master's binlog name and position..Sun Jul 28 22:34:02 2024 - [info] mysql-bin.000001:342Sun Jul 28 22:34:02 2024 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host-02 or 10.28.12.22', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Sun Jul 28 22:34:02 2024 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 342, 4c96b5ab-4cc1-11ef-9d49-02420a1c0c15:1Sun Jul 28 22:34:02 2024 - [info] Executing master IP activate script:Sun Jul 28 22:34:02 2024 - [info] /home/mha/masterha/script/master_ip_failover --command=start --ssh_user=mha --orig_master_host=host-01 --orig_master_ip=10.28.12.21 --orig_master_port=3306 --new_master_host=host-02 --new_master_ip=10.28.12.22 --new_master_port=3306 --new_master_user='monitor' --new_master_password=xxxUnknown option: new_master_userUnknown option: new_master_passwordIN SCRIPT TEST====sudo /usr/sbin/ip addr del 10.28.12.99/24 dev eth0 label eth0:1==sudo /usr/sbin/ip addr add 10.28.12.99/24 brd 10.28.12.255 dev eth0 label eth0:1;/usr/sbin/arping -q -A -c 1 -I eth0 10.28.12.99;===Enabling the VIP - 10.28.12.99 on the new master - host-02RTNETLINK answers: File existsSun Jul 28 22:34:03 2024 - [info] OK.Sun Jul 28 22:34:03 2024 - [info] ** Finished master recovery successfully.Sun Jul 28 22:34:03 2024 - [info] * Phase 3: Master Recovery Phase completed.Sun Jul 28 22:34:03 2024 - [info]Sun Jul 28 22:34:03 2024 - [info] * Phase 4: Slaves Recovery Phase..Sun Jul 28 22:34:03 2024 - [info]Sun Jul 28 22:34:03 2024 - [info]Sun Jul 28 22:34:03 2024 - [info] * Phase 4.1: Starting Slaves in parallel..Sun Jul 28 22:34:03 2024 - [info]Sun Jul 28 22:34:03 2024 - [info] -- Slave recovery on host host-03(10.28.12.23:3306) started, pid: 22068. Check tmp log /home/mha/masterha/work/host-03_3306_20240728223400.log if it takes time..Sun Jul 28 22:34:05 2024 - [info]Sun Jul 28 22:34:05 2024 - [info] Log messages from host-03 ...Sun Jul 28 22:34:05 2024 - [info]Sun Jul 28 22:34:03 2024 - [info] Resetting slave host-03(10.28.12.23:3306) and starting replication from the new master host-02(10.28.12.22:3306)..Sun Jul 28 22:34:03 2024 - [info] Executed CHANGE MASTER.Sun Jul 28 22:34:04 2024 - [info] Slave started.Sun Jul 28 22:34:04 2024 - [info] gtid_wait(4c96b5ab-4cc1-11ef-9d49-02420a1c0c15:1) completed on host-03(10.28.12.23:3306). Executed 0 events.Sun Jul 28 22:34:05 2024 - [info] End of log messages from host-03.Sun Jul 28 22:34:05 2024 - [info] -- Slave on host host-03(10.28.12.23:3306) started.Sun Jul 28 22:34:05 2024 - [info] All new slave servers recovered successfully.Sun Jul 28 22:34:05 2024 - [info]Sun Jul 28 22:34:05 2024 - [info] * Phase 5: New master cleanup phase..Sun Jul 28 22:34:05 2024 - [info]Sun Jul 28 22:34:05 2024 - [info] Resetting slave info on the new master..Sun Jul 28 22:34:05 2024 - [info] host-02: Resetting slave info succeeded.Sun Jul 28 22:34:05 2024 - [info] Master failover to host-02(10.28.12.22:3306) completed successfully.Sun Jul 28 22:34:05 2024 - [info] Deleted server1 entry from /home/mha/masterha/conf/mha.conf .Sun Jul 28 22:34:05 2024 - [info]----- Failover Report -----mha: MySQL Master failover host-01(10.28.12.21:3306) to host-02(10.28.12.22:3306) succeededMaster host-01(10.28.12.21:3306) is down!Check MHA Manager logs at host-03:/home/mha/masterha/log/manager.log for details.Started automated(non-interactive) failover.Invalidated master IP address on host-01(10.28.12.21:3306)Selected host-02(10.28.12.22:3306) as a new master.host-02(10.28.12.22:3306): OK: Applying all logs succeeded.host-02(10.28.12.22:3306): OK: Activated master IP address.host-03(10.28.12.23:3306): OK: Slave started, replicating from host-02(10.28.12.22:3306)host-02(10.28.12.22:3306): Resetting slave info succeeded.Master failover to host-02(10.28.12.22:3306) completed successfully.Sun Jul 28 22:34:05 2024 - [info] Sending mail..Unknown option: conf3. 修复故障节点,重新加入集群中# 注意:当原Master恢复后,不会自动转成Slave,需要手动处理,并将原Master加入到复制集群。mysqld_safe --defaults-file=/mysql/mysql8022/conf/my.cnf --user=mysql8022 &mysql -uroot -proot -S/mysql/mysql8022/temp/mysql.sockCHANGE MASTER TO MASTER_HOST='10.28.12.22', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='repl';start slave;# 配置新主库只读mysql -uroot -proot -S/mysql/mysql8022/temp/mysql.sock -e"set global read_only=on# 修复mha.conf配置文件# 由于在mha服务启动脚本中添加了--remove_dead_master_conf选项,切换后会自动将Master节点信息从/etc/masterha/mha.cnf中移除vi /home/mha/masterha/conf/mha.conf[server default]manager_log=/home/mha/masterha/log/manager.logmanager_workdir=/home/mha/masterha/workmaster_binlog_dir=/mysql/mysql8022/binlogmaster_ip_failover_script=/home/mha/masterha/script/master_ip_failovermaster_ip_online_change_script=/home/mha/masterha/script/master_ip_online_changepassword=monitorping_interval=1remote_workdir=/home/mha/masterha/workrepl_password=replrepl_user=replreport_script=/home/mha/masterha/script/send_reportsecondary_check_script=/usr/bin/masterha_secondary_check -s host-02 -s host-03 --user=mha --master_host=host-02 --master_ip=10.28.12.22 --master_port=3306ssh_user=mhauser=monitor[server1]hostname=host-02port=3306[server2]candidate_master=1check_repl_delay=0hostname=host-01port=3306[server3]hostname=host-03port=3306# 检查ssh状态/usr/bin/masterha_check_ssh --conf=/home/mha/masterha/conf/mha.conf# 检查主备复制/usr/bin/masterha_check_repl --conf=/home/mha/masterha/conf/mha.conf
1. 启动mha服务sh /home/mha/masterha/script/mha_start.sh2. 执行主备切换masterha_master_switch --conf=/home/mha/masterha/conf/mha.conf --master_state=alive --new_master_host=host-01 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000# 主备切换日志如下Sun Jul 28 23:25:08 2024 - [info] MHA::MasterRotate version 0.58.Sun Jul 28 23:25:08 2024 - [info] Starting online master switch..Sun Jul 28 23:25:08 2024 - [info]Sun Jul 28 23:25:08 2024 - [info] * Phase 1: Configuration Check Phase..Sun Jul 28 23:25:08 2024 - [info]Sun Jul 28 23:25:08 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Sun Jul 28 23:25:08 2024 - [info] Reading application default configuration from /home/mha/masterha/conf/mha.conf..Sun Jul 28 23:25:08 2024 - [info] Reading server configuration from /home/mha/masterha/conf/mha.conf..Sun Jul 28 23:25:09 2024 - [info] GTID failover mode = 1Sun Jul 28 23:25:09 2024 - [info] Current Alive Master: host-02(10.28.12.22:3306)Sun Jul 28 23:25:09 2024 - [info] Alive Slaves:Sun Jul 28 23:25:09 2024 - [info] host-01(10.28.12.21:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 23:25:09 2024 - [info] GTID ONSun Jul 28 23:25:09 2024 - [info] Replicating from 10.28.12.22(10.28.12.22:3306)Sun Jul 28 23:25:09 2024 - [info] Primary candidate for the new Master (candidate_master is set)Sun Jul 28 23:25:09 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabledSun Jul 28 23:25:09 2024 - [info] GTID ONSun Jul 28 23:25:09 2024 - [info] Replicating from 10.28.12.22(10.28.12.22:3306)It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on host-02(10.28.12.22:3306)? (YES/no): YESSun Jul 28 23:25:23 2024 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..Sun Jul 28 23:25:23 2024 - [info] ok.Sun Jul 28 23:25:23 2024 - [info] Checking MHA is not monitoring or doing failover..Sun Jul 28 23:25:23 2024 - [info] Checking replication health on host-01..Sun Jul 28 23:25:23 2024 - [info] ok.Sun Jul 28 23:25:23 2024 - [info] Checking replication health on host-03..Sun Jul 28 23:25:23 2024 - [info] ok.Sun Jul 28 23:25:23 2024 - [info] host-01 can be new master.Sun Jul 28 23:25:23 2024 - [info]From:host-02(10.28.12.22:3306) (current master)+--host-01(10.28.12.21:3306)+--host-03(10.28.12.23:3306)To:host-01(10.28.12.21:3306) (new master)+--host-03(10.28.12.23:3306)+--host-02(10.28.12.22:3306)Starting master switch from host-02(10.28.12.22:3306) to host-01(10.28.12.21:3306)? (yes/NO): yesSun Jul 28 23:25:39 2024 - [info] Checking whether host-01(10.28.12.21:3306) is ok for the new master..Sun Jul 28 23:25:39 2024 - [info] ok.Sun Jul 28 23:25:39 2024 - [info] host-02(10.28.12.22:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.Sun Jul 28 23:25:39 2024 - [info] host-02(10.28.12.22:3306): Resetting slave pointing to the dummy host.Sun Jul 28 23:25:39 2024 - [info] ** Phase 1: Configuration Check Phase completed.Sun Jul 28 23:25:39 2024 - [info]Sun Jul 28 23:25:39 2024 - [info] * Phase 2: Rejecting updates Phase..Sun Jul 28 23:25:39 2024 - [info]Sun Jul 28 23:25:39 2024 - [info] Executing master ip online change script to disable write on the current master:Sun Jul 28 23:25:39 2024 - [info] /home/mha/masterha/script/master_ip_online_change --command=stop --orig_master_host=host-02 --orig_master_ip=10.28.12.22 --orig_master_port=3306 --orig_master_user='monitor' --new_master_host=host-01 --new_master_ip=10.28.12.21 --new_master_port=3306 --new_master_user='monitor' --orig_master_ssh_user=mha --new_master_ssh_user=mha --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxxSun Jul 28 23:25:39 2024 918726 Set read_only on the new master.. ok.Sun Jul 28 23:25:39 2024 922218 Waiting all running 3 threads are disconnected.. (max 1500 milliseconds){'Time' => '3448','db' => undef,'Id' => '7','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}{'Time' => '3096','db' => undef,'Id' => '37','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-03.docker-chen:50486'}{'Time' => '2780','db' => undef,'Id' => '38','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-01.docker-chen:32654'}Sun Jul 28 23:25:40 2024 428103 Waiting all running 3 threads are disconnected.. (max 1000 milliseconds){'Time' => '3449','db' => undef,'Id' => '7','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}{'Time' => '3097','db' => undef,'Id' => '37','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-03.docker-chen:50486'}{'Time' => '2781','db' => undef,'Id' => '38','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-01.docker-chen:32654'}Sun Jul 28 23:25:40 2024 937856 Waiting all running 3 threads are disconnected.. (max 500 milliseconds){'Time' => '3449','db' => undef,'Id' => '7','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}{'Time' => '3097','db' => undef,'Id' => '37','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-03.docker-chen:50486'}{'Time' => '2781','db' => undef,'Id' => '38','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-01.docker-chen:32654'}Sun Jul 28 23:25:41 2024 451314 Set read_only=1 on the orig master.. ok.Sun Jul 28 23:25:41 2024 451933 Waiting all running 3 queries are disconnected.. (max 500 milliseconds){'Time' => '3450','db' => undef,'Id' => '7','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}{'Time' => '3098','db' => undef,'Id' => '37','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-03.docker-chen:50486'}{'Time' => '2782','db' => undef,'Id' => '38','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-01.docker-chen:32654'}Disabling the VIP on old master: host-02Sun Jul 28 23:25:42 2024 129349 Killing all application threads..Sun Jul 28 23:25:42 2024 131797 done.Sun Jul 28 23:25:42 2024 - [info] ok.Sun Jul 28 23:25:42 2024 - [info] Locking all tables on the orig master to reject updates from everybody (including root):Sun Jul 28 23:25:42 2024 - [info] Executing FLUSH TABLES WITH READ LOCK..Sun Jul 28 23:25:42 2024 - [info] ok.Sun Jul 28 23:25:42 2024 - [info] Orig master binlog:pos is mysql-bin.000001:342.Sun Jul 28 23:25:42 2024 - [info] Waiting to execute all relay logs on host-01(10.28.12.21:3306)..Sun Jul 28 23:25:42 2024 - [info] master_pos_wait(mysql-bin.000001:342) completed on host-01(10.28.12.21:3306). Executed 0 events.Sun Jul 28 23:25:42 2024 - [info] done.Sun Jul 28 23:25:42 2024 - [info] Getting new master's binlog name and position..Sun Jul 28 23:25:42 2024 - [info] mysql-bin.000002:196Sun Jul 28 23:25:42 2024 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host-01 or 10.28.12.21', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Sun Jul 28 23:25:42 2024 - [info] Executing master ip online change script to allow write on the new master:Sun Jul 28 23:25:42 2024 - [info] /home/mha/masterha/script/master_ip_online_change --command=start --orig_master_host=host-02 --orig_master_ip=10.28.12.22 --orig_master_port=3306 --orig_master_user='monitor' --new_master_host=host-01 --new_master_ip=10.28.12.21 --new_master_port=3306 --new_master_user='monitor' --orig_master_ssh_user=mha --new_master_ssh_user=mha --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxxSun Jul 28 23:25:42 2024 247802 Set read_only=0 on the new master.Enabling the VIP - 10.28.12.99 on the new master - host-01Sun Jul 28 23:25:42 2024 - [info] ok.Sun Jul 28 23:25:42 2024 - [info]Sun Jul 28 23:25:42 2024 - [info] * Switching slaves in parallel..Sun Jul 28 23:25:42 2024 - [info]Sun Jul 28 23:25:42 2024 - [info] -- Slave switch on host host-03(10.28.12.23:3306) started, pid: 24678Sun Jul 28 23:25:42 2024 - [info]Sun Jul 28 23:25:43 2024 - [info] Log messages from host-03 ...Sun Jul 28 23:25:43 2024 - [info]Sun Jul 28 23:25:42 2024 - [info] Waiting to execute all relay logs on host-03(10.28.12.23:3306)..Sun Jul 28 23:25:42 2024 - [info] master_pos_wait(mysql-bin.000001:342) completed on host-03(10.28.12.23:3306). Executed 0 events.Sun Jul 28 23:25:42 2024 - [info] done.Sun Jul 28 23:25:42 2024 - [info] Resetting slave host-03(10.28.12.23:3306) and starting replication from the new master host-01(10.28.12.21:3306)..Sun Jul 28 23:25:42 2024 - [info] Executed CHANGE MASTER.Sun Jul 28 23:25:42 2024 - [info] Slave started.Sun Jul 28 23:25:43 2024 - [info] End of log messages from host-03 ...Sun Jul 28 23:25:43 2024 - [info]Sun Jul 28 23:25:43 2024 - [info] -- Slave switch on host host-03(10.28.12.23:3306) succeeded.Sun Jul 28 23:25:43 2024 - [info] Unlocking all tables on the orig master:Sun Jul 28 23:25:43 2024 - [info] Executing UNLOCK TABLES..Sun Jul 28 23:25:43 2024 - [info] ok.Sun Jul 28 23:25:43 2024 - [info] Starting orig master as a new slave..Sun Jul 28 23:25:43 2024 - [info] Resetting slave host-02(10.28.12.22:3306) and starting replication from the new master host-01(10.28.12.21:3306)..Sun Jul 28 23:25:43 2024 - [info] Executed CHANGE MASTER.Sun Jul 28 23:25:43 2024 - [info] Slave started.Sun Jul 28 23:25:43 2024 - [info] All new slave servers switched successfully.Sun Jul 28 23:25:43 2024 - [info]Sun Jul 28 23:25:43 2024 - [info] * Phase 5: New master cleanup phase..Sun Jul 28 23:25:43 2024 - [info]Sun Jul 28 23:25:43 2024 - [info] host-01: Resetting slave info succeeded.Sun Jul 28 23:25:43 2024 - [info] Switching master to host-01(10.28.12.21:3306) completed successfully.3. 恢复mha.conf配置文件,恢复MHA监控服务vi /home/mha/masterha/conf/mha.conf[server default]manager_log=/home/mha/masterha/log/manager.logmanager_workdir=/home/mha/masterha/workmaster_binlog_dir=/mysql/mysql8022/binlogmaster_ip_failover_script=/home/mha/masterha/script/master_ip_failovermaster_ip_online_change_script=/home/mha/masterha/script/master_ip_online_changepassword=monitorping_interval=1remote_workdir=/home/mha/masterha/workrepl_password=replrepl_user=replreport_script=/home/mha/masterha/script/send_reportsecondary_check_script=/usr/bin/masterha_secondary_check -s host-02 -s host-03 --user=mha --master_host=host-01 --master_ip=10.28.12.21 --master_port=3306ssh_user=mhauser=monitor[server1]hostname=host-01port=3306[server2]candidate_master=1check_repl_delay=0hostname=host-02port=3306[server3]hostname=host-03port=33064. 启动MHA服务,继续进行监控sh /home/mha/masterha/script/mha_start.sh
本文内容就到这啦,阅读完本篇,相信你也掌握了对MySQL的MHA高可用架构的搭建了吧!我们下篇再见!





