暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

保姆级教程教你搭建MySQL MHA高可用架构

呆呆的私房菜 2024-08-01
339
    Whoami:5年+金融、政府、医疗领域工作经验的DBA
    Certificate:OCP、PCP
    Skill:Oracle、Mysql、PostgreSQL
    Platform:CSDN、墨天伦、公众号(呆呆的私房菜)


    阅读本文可以了解到MySQL MHA高可用架构和详细的部署实施、故障切换和主备切换等内容。




    01

    MHA概述

    • MHA是目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton(现就职于Facebook公司)开发,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。

    • 在MySQL故障切换过程中,MHA能做到在0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。

    • MHA软件由MHA Manager和MHA Node两部分组成。Manager节点单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。Node节点部署在MySQL服务器上。

    • 整个故障转移过程对应用程序完全透明


    02

    MHA工作原理
    • 1. 从宕机崩溃的master保存binlog events;

    • 2. 识别含有最新更新的slave;

    • 3. 应用差异的relay log到其他的slave;

    • 4. 应用从master保存的binlog events;

    • 5. 提升一个slave为新的master;

    • 6. 将其它的slave连接到新的master进行复制。

    03

    MHA工具介绍

    MHA软件由MHA Manager和Node节点组成,Manager可以单独部署在一台服务器上管理多个master-slave集群,也可以部署在某个master-slave集群中的slave节点上。

    Manager工具包:

    https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz

    组件名称组件说明
    masterha_check_ssh
    检查MHA的ssh配置状况
    masterha_check_repl
    检查MySQL复制状况
    masterha_manager
    启动MHA
    masteha_check_status检查当前MHA运行状态
    masterha_master_monitor检测master是否宕机
    masterha_master_switch
    控制故障转移(自动或手动)
    masterha_conf_host
    添加或删除配置的server信息


    Node工具包:

    https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz

    组件名称组件说明
    save_binary_log
    保存和复制master的二进制日志
    apply_diff_relay_logs
    识别差异的中继日志事件并将其差异的事件应用于其他的slave
    filter_mysqlbinlog去除不必要的rollback事件
    purge_relay_logs清理中继日志(不阻塞SQL线程)


    04

    MHA部署实施
    • 1. 环境说明


    主机名IP地址
    操作系统角色
    host-01
    10.28.12.21CentOS 7.6
    MHA Node、MySQL Maste
    host-02
    10.28.12.22CentOS 7.6MHA Node、MySQL Slave
    host-0310.28.12.23CentOS 7.6

    MHA Manager、MHA Node、MySQL Slave


    10.28.12.99
    VIP(业务IP)


    • 2. 数据库安装

      ## 创建数据库安装用户
      useradd mysql8022
      mkdir -p mysql/mysql8022
      chown -R mysql8022.mysql8022 mysql/mysql8022
      chmod 750 -R mysql/mysql8022


      ## 安装包上传到/mysql/mysql8022下
      cd mysql/mysql8022/
      xz -d mysql-8.0.22-linux-glibc2.12-x86_64.tar.xz
      tar xf mysql-8.0.22-linux-glibc2.12-x86_64.tar
      mv mysql-8.0.22-linux-glibc2.12-x86_64 mysql-8.0.22
      vi home/mysql8022/.bashrc
      export PATH=$PATH:/mysql/mysql8022/mysql-8.0.22/bin
      export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mysql/mysql8022/mysql-8.0.22/lib


      ## 创建数据库相关目录
      su - mysql8022
      mkdir -p mysql/mysql8022/{binlog,relaylog,log,redo,data,temp,conf}


      ## 编辑配置文件
      vi mysql/mysql8022/conf/my.cnf
      [client]
      port = 3306
      socket = mysql/mysql8022/temp/mysql.sock
      default-character-set = utf8mb4
      [mysqld]
      #base
      user = mysql8022
      port = 3306
      basedir = home/mysql8022/soft/mysql-8.0.22
      datadir = mysql/mysql8022/data
      tmpdir = mysql/mysql8022/temp
      socket = mysql/mysql8022/temp/mysql.sock
      pid-file = mysql/mysql8022/temp/mysql.pid
      #Open files limit
      open_files_limit = 65535
      #Connection
      max_connections = 3000
      max_user_connections = 1000
      max_connect_errors = 100
      wait_timeout = 600
      interactive_timeout = 600
      thread_cache_size = 2560
      back_log = 1024
      #Character
      character-set-server = utf8mb4
      #isolation
      transaction-isolation = READ-COMMITTED
      #Time
      default_time_zone = '+8:00'
      log_timestamps = SYSTEM
      #Binlog
      server-id = 21
      log-bin = mysql/mysql8022/binlog/mysql-bin
      binlog_cache_size = 128m
      max_binlog_cache_size = 6G
      max_binlog_size = 1G
      binlog_error_action = ABORT_SERVER
      #sql mode
      sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
      #gtid
      gtid_mode = on
      enforce_gtid_consistency = 1
      #InnoDB
      default-storage-engine = InnoDB
      innodb_data_home_dir = mysql/mysql8022/data
      innodb_data_file_path = ibdata1:500M:autoextend
      innodb_temp_data_file_path = ibtmp1:200M:autoextend
      innodb_file_per_table = 1
      innodb_log_group_home_dir = mysql/mysql8022/redo
      innodb_log_files_in_group = 3
      innodb_buffer_pool_instances = 4
      innodb_flush_log_at_trx_commit = 1
      innodb_max_dirty_pages_pct = 75
      innodb_open_files = 10000
      #Relay log
      relay-log-recovery = 1
      relay-log = mysql/mysql8022/relaylog/relay-bin
      relay-log-index = mysql/mysql8022/relaylog/relay-bin.index
      relay_log_purge = 0
      #Sync log
      sync_binlog = 1
      sync_relay_log = 1
      sync_relay_log_info = 1
      #Table
      lower_case_table_names = 1
      #Table cache
      table_open_cache = 2048
      table_definition_cache = 1024
      table_open_cache_instances = 32
      #Lock
      lock_wait_timeout = 3600
      explicit_defaults_for_timestamp = 1
      #Cache buffer
      key_buffer_size = 32M
      key_cache_block_size = 16384
      join_buffer_size = 8M
      sort_buffer_size = 8M
      read_buffer_size = 8M
      read_rnd_buffer_size = 4M
      bulk_insert_buffer_size = 64M
      myisam_sort_buffer_size = 128M
      myisam_max_sort_file_size = 2G
      tmp_table_size = 32M
      max_heap_table_size = 32M
      max_allowed_packet = 32M
      #Logs
      log_error = mysql/mysql8022/log/mysql-error.log
      slow_query_log = 1
      slow_query_log_file = mysql/mysql8022/log/slow_statement.log
      long_query_time = 3
      log_queries_not_using_indexes = 1
      log_timestamps = SYSTEM
      log_throttle_queries_not_using_indexes = 60
      general_log = 0
      general_log_file = mysql/mysql8022/log/general_statement.log
      [mysqld_safe]
      log-error = mysql/mysql8022/log/mysql-error.log
      pid-file = mysql/mysql8022/temp/mysql.pid
      [mysqldump]
      quick = on
      max_allowed_packet = 32M


      ## 注意:默认初始化的表空间是12M,innodb-data-file-path=ibdata1:12M:autoextend
      ## 默认的表名存储方式是0 --lower-case-table-names=0
      ## 初始化数据库
      mysqld --initialize-insecure --basedir=/home/mysql8022/soft/mysql-8.0.22 --datadir=/mysql/mysql8022/data --user=mysql8022 --lower-case-table-names=1 --innodb-data-file-path=ibdata1:500M:autoextend


      ## 启动数据库
      mysqld_safe --defaults-file=/mysql/mysql8022/conf/my.cnf --user=mysql8022 &


      ## 修改用户密码
      mysql -uroot -p -S/mysql/mysql8022/temp/mysql.sock
      alter user root@'localhost' identified by 'root';
      flush privileges;


      ## 登录数据库
      mysql -uroot -proot -S/mysql/mysql8022/temp/mysql.sock
      • 3. 主从同步搭建

        # 创建同步账号(所有数据库)
        mysql -uroot -proot -S/mysql/mysql8022/temp/mysql.sock
        create user 'repl'@'%' identified with 'mysql_native_password' by 'repl';
        grant replication slave, replication_slave_admin on *.* to 'repl'@'%';
        flush privileges;


        # 创建mha监控用户monitor(所有数据库)
        create user 'monitor'@'%' identified with 'mysql_native_password' by 'monitor';
        grant all privileges on *.* to 'monitor'@'%';
        flush privileges;


        # 重置binlog(所有数据库)
        reset master;


        # 从库执行
        mysql -uroot -proot -S/mysql/mysql8022/temp/mysql.sock
        change master to
        master_host="10.28.12.21",
        master_user="repl",
        master_password="repl",
        master_auto_position=1;
        start slave;
        show slave status\G;


        # 配置从库只读
        set global read_only=on
        • 4. 安装MHA

          # 在线安装依赖
          curl -o etc/yum.repos.d/epel-7.repo http://mirrors.aliyun.com/repo/epel-7.repo
          curl -o etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
          yum clean all && yum repolist
          yum -y install perl-Config-Tiny perl-Email-Date-Format perl-Log-Dispatch perl-Mail-Sender perl-Mail-Sendmail perl-MIME-Lite perl-MIME-Types perl-Parallel-ForkManager perl-Params-Validate perl-DBI perl-DBD-MySQL


          # 离线安装依赖
          # 上传依赖包到/opt/soft下
          perl-Config-Tiny-2.14-7.el7.src.rpm
          perl-Email-Date-Format-1.002-15.el7.noarch.rpm
          perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm
          perl-Mail-Sender-0.8.23-1.el7.noarch.rpm
          perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
          perl-MIME-Lite-3.030-1.el7.noarch.rpm
          perl-MIME-Types-1.38-2.el7.noarch.rpm
          perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm
          perl-Params-Validate-1.08-4.el7.x86_64.rpm
          # 安装依赖
          cd opt/soft/
          yum -y install *.rpm


          # 上传mha软件到/opt/mha下
          mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
          mha4mysql-node-0.58-0.el7.centos.noarch.rpm
          # 安装mha(安装manager,需要先装node,不然会缺少依赖)
          cd opt/mha
          rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
          rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
          • 5. 配置MHA

            # 配置/etc/hosts文件(所有节点)
            vi etc/hosts
            10.28.12.21 host-01
            10.28.12.22 host-02
            10.28.12.23 host-03


            # 安装用户(所有节点)
            su - root
            useradd mha
            echo "mha" | passwd --stdin mha
            # 这里是为了能够访问binlog文件
            usermod -a -G mysql8022 mha
            chmod -R 750 mysql/mysql8022


            # 配置环境变量
            vi home/mha/.bashrc
            export PATH=$PATH:/mysql/mysql8022/mysql-8.0.22/bin
            export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mysql/mysql8022/mysql-8.0.22/lib


            # 配置用户sudo权限(所有节点)
            visudo
            mha ALL=(ALL) NOPASSWD:ALL


            # 配置互信(所有节点)
            su - mha
            ssh-keygen -t rsa
            ssh-copy-id host-01
            ssh-copy-id host-02
            ssh-copy-id host-03
            # 验证互信
            ssh host-01 date
            ssh host-02 date
            ssh host-03 date


            # 创建相关目录(mha node节点)
            mkdir -p home/mha/masterha/{script,relaylogs_purge,work}


            # 配置Slave节点relaylog自动清理脚本(两个slave节点)
            vi home/mha/masterha/script/purge_relay_log.sh
            #!/bin/bash
            user=monitor
            passwd=monitor
            port=3306
            log_dir='/home/mha/masterha/relaylogs_purge'
            work_dir='/home/mha/masterha/work'
            if [ ! -d $log_dir ]
            then
            mkdir -p $log_dir
            fi
            /usr/bin/purge_relay_logs --user=$user --password=$passwd --port=$port --workdir=$work_dir --disable_relay_log_purge >> $log_dir/purge_relay_logs.log 2>&1
            # 配置定时任务
            chmod +x home/mha/masterha/script/purge_relay_log.sh
            crontab -e
            30 23 * * * sh home/mha/masterha/script/purge_relay_log.sh


            # 配置mha配置文件(mater配置)
            mkdir -p home/mha/masterha/conf
            mkdir -p home/mha/masterha/log
            vi home/mha/masterha/conf/mha.conf
            [server default]
            user=monitor
            password=monitor
            repl_user=repl
            repl_password=repl
            ssh_user=mha
            ping_interval=1
            master_binlog_dir=/mysql/mysql8022/binlog
            manager_workdir=/home/mha/masterha/work
            manager_log=/home/mha/masterha/log/manager.log
            master_ip_failover_script=/home/mha/masterha/script/master_ip_failover
            master_ip_online_change_script=/home/mha/masterha/script/master_ip_online_change
            secondary_check_script=/usr/bin/masterha_secondary_check -s host-02 -s host-03 --user=mha --master_host=host-01 --master_ip=10.28.12.21 --master_port=3306
            report_script=/home/mha/masterha/script/send_report
            remote_workdir=/home/mha/masterha/work
            [server1]
            hostname=host-01
            port=3306
            [server2]
            hostname=host-02
            port=3306
            candidate_master=1
            check_repl_delay=0
            [server3]
            hostname=host-03
            port=3306


            # 配置vip(配置在mysql master节点上)
            /sbin/ifconfig eth0:1 10.28.12.99/24
            # 检查vip添加是否成功
            ip a | grep eth0


            # 配置master_ip_failover脚本
            vi home/mha/masterha/script/master_ip_failover
            #!/usr/bin/env perl
            use strict;
            use warnings FATAL => 'all';
            use Getopt::Long;
            my (
            $command, $ssh_user, $orig_master_host, $orig_master_ip,
            $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
            );
            #############################添加内容部分#########################################
            my $vip = '10.28.12.99';
            my $brdc = '10.28.12.255';
            my $ifdev = 'eth0';
            my $key = '1';
            my $ssh_start_vip = "sudo usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;";
            my $ssh_stop_vip = "sudo usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
            ##################################################################################
            GetOptions(
            'command=s' => \$command,
            'ssh_user=s' => \$ssh_user,
            'orig_master_host=s' => \$orig_master_host,
            'orig_master_ip=s' => \$orig_master_ip,
            'orig_master_port=i' => \$orig_master_port,
            'new_master_host=s' => \$new_master_host,
            'new_master_ip=s' => \$new_master_ip,
            'new_master_port=i' => \$new_master_port,
            );
            exit &main();
            sub main {
            print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
            if ( $command eq "stop" || $command eq "stopssh" ) {
            my $exit_code = 1;
            eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
            };
            if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
            }
            exit $exit_code;
            }
            elsif ( $command eq "start" ) {
            my $exit_code = 10;
            eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
            };
            if ($@) {
            warn $@;
            exit $exit_code;
            }
            exit $exit_code;
            }
            elsif ( $command eq "status" ) {
            print "Checking the Status of the script.. OK \n";
            exit 0;
            }
            else {
            &usage();
            exit 1;
            }
            }
            sub start_vip() {
            `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
            }
            # A simple system call that disable the VIP on the old_master
            sub stop_vip() {
            `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
            }
            sub usage {
            print
            "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
            }


            # 配置master_ip_online_change脚本
            vi /home/mha/masterha/script/master_ip_online_change
            #!/usr/bin/env perl
            use strict;
            use warnings FATAL => 'all';
            use Getopt::Long;
            use MHA::DBHelper;
            use MHA::NodeUtil;
            use Time::HiRes qw( sleep gettimeofday tv_interval );
            use Data::Dumper;
            my $_tstart;
            my $_running_interval = 0.1;
            my (
            $command,
            $orig_master_is_new_slave, $orig_master_host, $orig_master_ip, $orig_master_port, $orig_master_user, $orig_master_password, $orig_master_ssh_user,
            $new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password, $new_master_ssh_user,
            );
            my $vip = '10.28.12.99';
            my $brdc = '10.28.12.255';
            my $ifdev = 'eth0';
            my $key = '1';
            my $ssh_start_vip = "sudo /usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;";
            my $ssh_stop_vip = "sudo /usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
            GetOptions(
            'command=s' => \$command,
            'orig_master_is_new_slave' => \$orig_master_is_new_slave,
            'orig_master_host=s' => \$orig_master_host,
            'orig_master_ip=s' => \$orig_master_ip,
            'orig_master_port=i' => \$orig_master_port,
            'orig_master_user=s' => \$orig_master_user,
            'orig_master_password=s' => \$orig_master_password,
            'orig_master_ssh_user=s' => \$orig_master_ssh_user,
            'new_master_host=s' => \$new_master_host,
            'new_master_ip=s' => \$new_master_ip,
            'new_master_port=i' => \$new_master_port,
            'new_master_user=s' => \$new_master_user,
            'new_master_password=s' => \$new_master_password,
            'new_master_ssh_user=s' => \$new_master_ssh_user,
            );
            exit &main();
            sub current_time_us {
            my ( $sec, $microsec ) = gettimeofday();
            my $curdate = localtime($sec);
            return $curdate . " " . sprintf( "%06d", $microsec );
            }
            sub sleep_until {
            my $elapsed = tv_interval($_tstart);
            if ( $_running_interval > $elapsed ) {
            sleep( $_running_interval - $elapsed );
            }
            }
            sub get_threads_util {
            my $dbh = shift;
            my $my_connection_id = shift;
            my $running_time_threshold = shift;
            my $type = shift;
            $running_time_threshold = 0 unless ($running_time_threshold);
            $type = 0 unless ($type);
            my @threads;
            my $sth = $dbh->prepare("SHOW PROCESSLIST");
            $sth->execute();
            while ( my $ref = $sth->fetchrow_hashref() ) {
            my $id = $ref->{Id};
            my $user = $ref->{User};
            my $host = $ref->{Host};
            my $command = $ref->{Command};
            my $state = $ref->{State};
            my $query_time = $ref->{Time};
            my $info = $ref->{Info};
            $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
            next if ( $my_connection_id == $id );
            next if ( defined($query_time) && $query_time < $running_time_threshold );
            next if ( defined($command) && $command eq "Binlog Dump" );
            next if ( defined($user) && $user eq "system user" );
            next
            if ( defined($command)
            && $command eq "Sleep"
            && defined($query_time)
            && $query_time >= 1 );
            if ( $type >= 1 ) {
            next if ( defined($command) && $command eq "Sleep" );
            next if ( defined($command) && $command eq "Connect" );
            }
            if ( $type >= 2 ) {
            next if ( defined($info) && $info =~ m/^select/i );
            next if ( defined($info) && $info =~ m/^show/i );
            }
            push @threads, $ref;
            }
            return @threads;
            }
            sub main {
            if ( $command eq "stop" ) {
            ## Gracefully killing connections on the current master
            # 1. Set read_only= 1 on the new master
            # 2. DROP USER so that no app user can establish new connections
            # 3. Set read_only= 1 on the current master
            # 4. Kill current queries
            # * Any database access failure will result in script die.
            my $exit_code = 1;
            eval {
            ## Setting read_only=1 on the new master (to avoid accident)
            my $new_master_handler = new MHA::DBHelper();


            # args: hostname, port, user, password, raise_error(die_on_error)_or_not
            $new_master_handler->connect( $new_master_ip, $new_master_port,
            $new_master_user, $new_master_password, 1 );
            print current_time_us() . " Set read_only on the new master.. ";
            $new_master_handler->enable_read_only();
            if ( $new_master_handler->is_read_only() ) {
            print "ok.\n";
            }
            else {
            die "Failed!\n";
            }
            $new_master_handler->disconnect();
            # Connecting to the orig master, die if any database error happens
            my $orig_master_handler = new MHA::DBHelper();
            $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
            $orig_master_user, $orig_master_password, 1 );
            ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
            #$orig_master_handler->disable_log_bin_local();
            #print current_time_us() . " Drpping app user on the orig master..\n";
            #FIXME_xxx_drop_app_user($orig_master_handler);
            ## Waiting for N * 100 milliseconds so that current connections can exit
            my $time_until_read_only = 15;
            $_tstart = [gettimeofday];
            my @threads = get_threads_util( $orig_master_handler->{dbh},
            $orig_master_handler->{connection_id} );
            while ( $time_until_read_only > 0 && $#threads >= 0 ) {
            if ( $time_until_read_only % 5 == 0 ) {
            printf
            "%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
            if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
            foreach (@threads);
            }
            }
            sleep_until();
            $_tstart = [gettimeofday];
            $time_until_read_only--;
            @threads = get_threads_util( $orig_master_handler->{dbh},
            $orig_master_handler->{connection_id} );
            }
            ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
            print current_time_us() . " Set read_only=1 on the orig master.. ";
            $orig_master_handler->enable_read_only();
            if ( $orig_master_handler->is_read_only() ) {
            print "ok.\n";
            }
            else {
            die "Failed!\n";
            }
            ## Waiting for M * 100 milliseconds so that current update queries can complete
            my $time_until_kill_threads = 5;
            @threads = get_threads_util( $orig_master_handler->{dbh},
            $orig_master_handler->{connection_id} );
            while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
            if ( $time_until_kill_threads % 5 == 0 ) {
            printf
            "%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
            if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
            foreach (@threads);
            }
            }
            sleep_until();
            $_tstart = [gettimeofday];
            $time_until_kill_threads--;
            @threads = get_threads_util( $orig_master_handler->{dbh},
            $orig_master_handler->{connection_id} );
            }
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            ## Terminating all threads
            print current_time_us() . " Killing all application threads..\n";
            $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
            print current_time_us() . " done.\n";
            #$orig_master_handler->enable_log_bin_local();
            $orig_master_handler->disconnect();


            ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
            $exit_code = 0;
            };
            if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
            }
            exit $exit_code;
            }
            elsif ( $command eq "start" ) {
            ## Activating master ip on the new master
            # 1. Create app user with write privileges
            # 2. Moving backup script if needed
            # 3. Register new master's ip to the catalog database
            # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
            # If exit code is 0 or 10, MHA does not abort
            my $exit_code = 10;
            eval {
            my $new_master_handler = new MHA::DBHelper();
            # args: hostname, port, user, password, raise_error_or_not
            $new_master_handler->connect( $new_master_ip, $new_master_port,
            $new_master_user, $new_master_password, 1 );
            ## Set read_only=0 on the new master
            #$new_master_handler->disable_log_bin_local();
            print current_time_us() . " Set read_only=0 on the new master.\n";
            $new_master_handler->disable_read_only();
            ## Creating an app user on the new master
            #print current_time_us() . " Creating app user on the new master..\n";
            #FIXME_xxx_create_app_user($new_master_handler);
            #$new_master_handler->enable_log_bin_local();
            $new_master_handler->disconnect();
            ## Update master ip on the catalog database, etc
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
            };
            if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
            }
            exit $exit_code;
            }
            elsif ( $command eq "status" ) {
            # do nothing
            exit 0;
            }
            else {
            &usage();
            exit 1;
            }
            }
            # A simple system call that enable the VIP on the new master
            sub start_vip() {
            `ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
            }
            # A simple system call that disable the VIP on the old_master
            sub stop_vip() {
            `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
            }
            sub usage {
            print
            "Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --orig_master_user=user --orig_master_password=password --orig_master_ssh_user=sshuser --new_master_host=host --new_master_ip=ip --new_master_port=port --new_master_user=user --new_master_password=password --new_master_ssh_user=sshuser \n";
            die;
            }


            # 编辑发送报告模板(发送信息到系统用户mail下)
            vi /home/mha/masterha/script/send_report
            #!/usr/bin/perl
            use strict;
            use warnings FATAL => 'all';
            use Getopt::Long;
            #new_master_host and new_slave_hosts are set only when recovering master succeeded
            my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
            GetOptions(
            'orig_master_host=s' => \$dead_master_host,
            'new_master_host=s' => \$new_master_host,
            'new_slave_hosts=s' => \$new_slave_hosts,
            'subject=s' => \$subject,
            'body=s' => \$body,
            );
            my $log_file = '/var/spool/mail/mha';
            my $current_date = `date '+%Y-%m-%d %H:%M:%S'`;
            chomp $current_date;
            my $log_message = "$current_date - $subject: $body";
            open my $fh, '>>', $log_file or die "Cannot open $log_file: $!";
            print $fh $log_message . "\n";
            close $fh;
            exit 0;


            # 授权脚本权限
            chmod 775 /home/mha/masterha/script/master_ip_failover /home/mha/masterha/script/master_ip_online_change /home/mha/masterha/script/send_report


            # 检查ssh权限(主节点)
            masterha_check_ssh --conf=/home/mha/masterha/conf/mha.conf


            # 检查复制集群状态(主节点)
            masterha_check_repl --conf=/home/mha/masterha/conf/mha.conf


            # 配置mha启动脚本
            vi /home/mha/masterha/script/mha_start.sh
            nohup masterha_manager --conf=/home/mha/masterha/conf/mha.conf --remove_dead_master_conf --ignore_last_failover < /dev/null > /home/mha/masterha/log/manager.log 2>&1 &
            chmod +x /home/mha/masterha/script/mha_start.sh


            # 配置查看MHA manager状态脚本
            vi /home/mha/masterha/script/mha_status.sh
            masterha_check_status --conf=/home/mha/masterha/conf/mha.conf
            chmod +x /home/mha/masterha/script/mha_status.sh


            # 配置mha关闭脚本
            vi /home/mha/masterha/script/mha_stop.sh
            masterha_stop --conf=/home/mha/masterha/conf/mha.conf
            chmod +x /home/mha/masterha/script/mha_stop.sh


            05

            MHA故障切换
              1. 启动mha服务
              sh /home/mha/masterha/script/mha_start.sh


              # 启动日志如下
              Sun Jul 28 22:21:08 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
              Sun Jul 28 22:21:08 2024 - [info] Reading application default configuration from /home/mha/masterha/conf/mha.conf..
              Sun Jul 28 22:21:08 2024 - [info] Reading server configuration from /home/mha/masterha/conf/mha.conf..
              Sun Jul 28 22:21:08 2024 - [info] MHA::MasterMonitor version 0.58.
              Sun Jul 28 22:21:09 2024 - [info] GTID failover mode = 0
              Sun Jul 28 22:21:09 2024 - [info] Dead Servers:
              Sun Jul 28 22:21:09 2024 - [info] Alive Servers:
              Sun Jul 28 22:21:09 2024 - [info] host-01(10.28.12.21:3306)
              Sun Jul 28 22:21:09 2024 - [info] host-02(10.28.12.22:3306)
              Sun Jul 28 22:21:09 2024 - [info] host-03(10.28.12.23:3306)
              Sun Jul 28 22:21:09 2024 - [info] Alive Slaves:
              Sun Jul 28 22:21:09 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
              Sun Jul 28 22:21:09 2024 - [info] GTID ON
              Sun Jul 28 22:21:09 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)
              Sun Jul 28 22:21:09 2024 - [info] Primary candidate for the new Master (candidate_master is set)
              Sun Jul 28 22:21:09 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
              Sun Jul 28 22:21:09 2024 - [info] GTID ON
              Sun Jul 28 22:21:09 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)
              Sun Jul 28 22:21:09 2024 - [info] Current Alive Master: host-01(10.28.12.21:3306)
              Sun Jul 28 22:21:09 2024 - [info] Checking slave configurations..
              Sun Jul 28 22:21:09 2024 - [info] read_only=1 is not set on slave host-02(10.28.12.22:3306).
              Sun Jul 28 22:21:09 2024 - [info] read_only=1 is not set on slave host-03(10.28.12.23:3306).
              Sun Jul 28 22:21:09 2024 - [info] Checking replication filtering settings..
              Sun Jul 28 22:21:09 2024 - [info] binlog_do_db= , binlog_ignore_db=
              Sun Jul 28 22:21:09 2024 - [info] Replication filtering check ok.
              Sun Jul 28 22:21:09 2024 - [info] GTID (with auto-pos) is not supported
              Sun Jul 28 22:21:09 2024 - [info] Starting SSH connection tests..
              Sun Jul 28 22:21:12 2024 - [info] All SSH connection tests passed successfully.
              Sun Jul 28 22:21:12 2024 - [info] Checking MHA Node version..
              Sun Jul 28 22:21:12 2024 - [info] Version check ok.
              Sun Jul 28 22:21:12 2024 - [info] Checking SSH publickey authentication settings on the current master..
              Sun Jul 28 22:21:13 2024 - [info] HealthCheck: SSH to host-01 is reachable.
              Sun Jul 28 22:21:13 2024 - [info] Checking recovery script configurations on host-01(10.28.12.21:3306)..
              Sun Jul 28 22:21:13 2024 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/mysql8022/binlog --output_file=/home/mha/masterha/work/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000001
              Sun Jul 28 22:21:13 2024 - [info] Connecting to mha@10.28.12.21(host-01:22)..
              Creating /home/mha/masterha/work if not exists.. ok.
              Checking output directory is accessible or not..
              ok.
              Binlog found at /mysql/mysql8022/binlog, up to mysql-bin.000001
              Sun Jul 28 22:21:13 2024 - [info] Binlog setting check done.
              Sun Jul 28 22:21:13 2024 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
              Sun Jul 28 22:21:13 2024 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=host-02 --slave_ip=10.28.12.22 --slave_port=3306 --workdir=/home/mha/masterha/work --target_version=8.0.22 --manager_version=0.58 --relay_dir=/mysql/mysql8022/relaylog --current_relay_log=relay-bin.000002 --slave_pass=xxx
              Sun Jul 28 22:21:13 2024 - [info] Connecting to mha@10.28.12.22(host-02:22)..
              Checking slave recovery environment settings..
              Relay log found at /mysql/mysql8022/relaylog, up to relay-bin.000002
              Temporary relay log file is /mysql/mysql8022/relaylog/relay-bin.000002
              Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
              Testing mysql connection and privileges..
              mysql: [Warning] Using a password on the command line interface can be insecure.
              done.
              Testing mysqlbinlog output.. done.
              Cleaning up test file(s).. done.
              Sun Jul 28 22:21:13 2024 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=host-03 --slave_ip=10.28.12.23 --slave_port=3306 --workdir=/home/mha/masterha/work --target_version=8.0.22 --manager_version=0.58 --relay_dir=/mysql/mysql8022/relaylog --current_relay_log=relay-bin.000002 --slave_pass=xxx
              Sun Jul 28 22:21:13 2024 - [info] Connecting to mha@10.28.12.23(host-03:22)..
              Checking slave recovery environment settings..
              Relay log found at /mysql/mysql8022/relaylog, up to relay-bin.000002
              Temporary relay log file is /mysql/mysql8022/relaylog/relay-bin.000002
              Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
              Testing mysql connection and privileges..
              mysql: [Warning] Using a password on the command line interface can be insecure.
              done.
              Testing mysqlbinlog output.. done.
              Cleaning up test file(s).. done.
              Sun Jul 28 22:21:14 2024 - [info] Slaves settings check done.
              Sun Jul 28 22:21:14 2024 - [info]
              host-01(10.28.12.21:3306) (current master)
              +--host-02(10.28.12.22:3306)
              +--host-03(10.28.12.23:3306)


              Sun Jul 28 22:21:14 2024 - [info] Checking master_ip_failover_script status:
              Sun Jul 28 22:21:14 2024 - [info] /home/mha/masterha/script/master_ip_failover --command=status --ssh_user=mha --orig_master_host=host-01 --orig_master_ip=10.28.12.21 --orig_master_port=3306




              IN SCRIPT TEST====sudo /usr/sbin/ip addr del 10.28.12.99/24 dev eth0 label eth0:1==sudo /usr/sbin/ip addr add 10.28.12.99/24 brd 10.28.12.255 dev eth0 label eth0:1;/usr/sbin/arping -q -A -c 1 -I eth0 10.28.12.99;===


              Checking the Status of the script.. OK
              Sun Jul 28 22:21:14 2024 - [info] OK.
              Sun Jul 28 22:21:14 2024 - [warning] shutdown_script is not defined.
              Sun Jul 28 22:21:14 2024 - [info] Set master ping interval 1 seconds.
              Sun Jul 28 22:21:14 2024 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s host-02 -s host-03 --user=mha --master_host=host-01 --master_ip=10.28.12.21 --master_port=3306
              Sun Jul 28 22:21:14 2024 - [info] Starting ping health check on host-01(10.28.12.21:3306)..
              Sun Jul 28 22:21:14 2024 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..




              2. 模拟master节点故障
              mysqladmin -uroot -proot -S/mysql/mysql8022/temp/mysql.sock shutdown


              # 观察到MHA已经实现主备自动切换
              # 业务IP从原master节点切换到新的master节点上
              # 原slave节点同步指向了新的master节点上
              Sun Jul 28 22:33:55 2024 - [warning] Got error on MySQL select ping: 1053 (Server shutdown in progress)
              Sun Jul 28 22:33:55 2024 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s host-02 -s host-03 --user=mha --master_host=host-01 --master_ip=10.28.12.21 --master_port=3306 --user=mha --master_host=host-01 --master_ip=10.28.12.21 --master_port=3306 --master_user=monitor --master_password=monitor --ping_type=SELECT
              Sun Jul 28 22:33:55 2024 - [info] Executing SSH check script: exit 0
              Sun Jul 28 22:33:55 2024 - [info] HealthCheck: SSH to host-01 is reachable.
              Monitoring server host-02 is reachable, Master is not reachable from host-02. OK.
              Monitoring server host-03 is reachable, Master is not reachable from host-03. OK.
              Sun Jul 28 22:33:56 2024 - [info] Master is not reachable from all other monitoring servers. Failover should start.
              Sun Jul 28 22:33:56 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.28.12.21' (111))
              Sun Jul 28 22:33:56 2024 - [warning] Connection failed 2 time(s)..
              Sun Jul 28 22:33:57 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.28.12.21' (111))
              Sun Jul 28 22:33:57 2024 - [warning] Connection failed 3 time(s)..
              Sun Jul 28 22:33:58 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.28.12.21' (111))
              Sun Jul 28 22:33:58 2024 - [warning] Connection failed 4 time(s)..
              Sun Jul 28 22:33:58 2024 - [warning] Master is not reachable from health checker!
              Sun Jul 28 22:33:58 2024 - [warning] Master host-01(10.28.12.21:3306) is not reachable!
              Sun Jul 28 22:33:58 2024 - [warning] SSH is reachable.
              Sun Jul 28 22:33:58 2024 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /home/mha/masterha/conf/mha.conf again, and trying to connect to all servers to check server status..
              Sun Jul 28 22:33:58 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
              Sun Jul 28 22:33:58 2024 - [info] Reading application default configuration from /home/mha/masterha/conf/mha.conf..
              Sun Jul 28 22:33:58 2024 - [info] Reading server configuration from /home/mha/masterha/conf/mha.conf..
              Sun Jul 28 22:34:00 2024 - [info] GTID failover mode = 1
              Sun Jul 28 22:34:00 2024 - [info] Dead Servers:
              Sun Jul 28 22:34:00 2024 - [info] host-01(10.28.12.21:3306)
              Sun Jul 28 22:34:00 2024 - [info] Alive Servers:
              Sun Jul 28 22:34:00 2024 - [info] host-02(10.28.12.22:3306)
              Sun Jul 28 22:34:00 2024 - [info] host-03(10.28.12.23:3306)
              Sun Jul 28 22:34:00 2024 - [info] Alive Slaves:
              Sun Jul 28 22:34:00 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
              Sun Jul 28 22:34:00 2024 - [info] GTID ON
              Sun Jul 28 22:34:00 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)
              Sun Jul 28 22:34:00 2024 - [info] Primary candidate for the new Master (candidate_master is set)
              Sun Jul 28 22:34:00 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
              Sun Jul 28 22:34:00 2024 - [info] GTID ON
              Sun Jul 28 22:34:00 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)
              Sun Jul 28 22:34:00 2024 - [info] Checking slave configurations..
              Sun Jul 28 22:34:00 2024 - [info] read_only=1 is not set on slave host-02(10.28.12.22:3306).
              Sun Jul 28 22:34:00 2024 - [info] read_only=1 is not set on slave host-03(10.28.12.23:3306).
              Sun Jul 28 22:34:00 2024 - [info] Checking replication filtering settings..
              Sun Jul 28 22:34:00 2024 - [info] Replication filtering check ok.
              Sun Jul 28 22:34:00 2024 - [info] Master is down!
              Sun Jul 28 22:34:00 2024 - [info] Terminating monitoring script.
              Sun Jul 28 22:34:00 2024 - [info] Got exit code 20 (Master dead).
              Sun Jul 28 22:34:01 2024 - [info] MHA::MasterFailover version 0.58.
              Sun Jul 28 22:34:01 2024 - [info] Starting master failover.
              Sun Jul 28 22:34:01 2024 - [info]
              Sun Jul 28 22:34:01 2024 - [info] * Phase 1: Configuration Check Phase..
              Sun Jul 28 22:34:01 2024 - [info]
              Sun Jul 28 22:34:02 2024 - [info] GTID failover mode = 1
              Sun Jul 28 22:34:02 2024 - [info] Dead Servers:
              Sun Jul 28 22:34:02 2024 - [info] host-01(10.28.12.21:3306)
              Sun Jul 28 22:34:02 2024 - [info] Checking master reachability via MySQL(double check)...
              Sun Jul 28 22:34:02 2024 - [info] ok.
              Sun Jul 28 22:34:02 2024 - [info] Alive Servers:
              Sun Jul 28 22:34:02 2024 - [info] host-02(10.28.12.22:3306)
              Sun Jul 28 22:34:02 2024 - [info] host-03(10.28.12.23:3306)
              Sun Jul 28 22:34:02 2024 - [info] Alive Slaves:
              Sun Jul 28 22:34:02 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
              Sun Jul 28 22:34:02 2024 - [info] GTID ON
              Sun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)
              Sun Jul 28 22:34:02 2024 - [info] Primary candidate for the new Master (candidate_master is set)
              Sun Jul 28 22:34:02 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
              Sun Jul 28 22:34:02 2024 - [info] GTID ON
              Sun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)
              Sun Jul 28 22:34:02 2024 - [info] Starting GTID based failover.
              Sun Jul 28 22:34:02 2024 - [info]
              Sun Jul 28 22:34:02 2024 - [info] ** Phase 1: Configuration Check Phase completed.
              Sun Jul 28 22:34:02 2024 - [info]
              Sun Jul 28 22:34:02 2024 - [info] * Phase 2: Dead Master Shutdown Phase..
              Sun Jul 28 22:34:02 2024 - [info]
              Sun Jul 28 22:34:02 2024 - [info] Forcing shutdown so that applications never connect to the current master..
              Sun Jul 28 22:34:02 2024 - [info] Executing master IP deactivation script:
              Sun Jul 28 22:34:02 2024 - [info] /home/mha/masterha/script/master_ip_failover --orig_master_host=host-01 --orig_master_ip=10.28.12.21 --orig_master_port=3306 --command=stopssh --ssh_user=mha




              IN SCRIPT TEST====sudo /usr/sbin/ip addr del 10.28.12.99/24 dev eth0 label eth0:1==sudo /usr/sbin/ip addr add 10.28.12.99/24 brd 10.28.12.255 dev eth0 label eth0:1;/usr/sbin/arping -q -A -c 1 -I eth0 10.28.12.99;===


              Disabling the VIP on old master: host-01
              RTNETLINK answers: Cannot assign requested address
              Sun Jul 28 22:34:02 2024 - [info] done.
              Sun Jul 28 22:34:02 2024 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
              Sun Jul 28 22:34:02 2024 - [info] * Phase 2: Dead Master Shutdown Phase completed.
              Sun Jul 28 22:34:02 2024 - [info]
              Sun Jul 28 22:34:02 2024 - [info] * Phase 3: Master Recovery Phase..
              Sun Jul 28 22:34:02 2024 - [info]
              Sun Jul 28 22:34:02 2024 - [info] * Phase 3.1: Getting Latest Slaves Phase..
              Sun Jul 28 22:34:02 2024 - [info]
              Sun Jul 28 22:34:02 2024 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:335
              Sun Jul 28 22:34:02 2024 - [info] Retrieved Gtid Set: 4c96b5ab-4cc1-11ef-9d49-02420a1c0c15:1
              Sun Jul 28 22:34:02 2024 - [info] Latest slaves (Slaves that received relay log files to the latest):
              Sun Jul 28 22:34:02 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
              Sun Jul 28 22:34:02 2024 - [info] GTID ON
              Sun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)
              Sun Jul 28 22:34:02 2024 - [info] Primary candidate for the new Master (candidate_master is set)
              Sun Jul 28 22:34:02 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
              Sun Jul 28 22:34:02 2024 - [info] GTID ON
              Sun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)
              Sun Jul 28 22:34:02 2024 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:335
              Sun Jul 28 22:34:02 2024 - [info] Retrieved Gtid Set: 4c96b5ab-4cc1-11ef-9d49-02420a1c0c15:1
              Sun Jul 28 22:34:02 2024 - [info] Oldest slaves:
              Sun Jul 28 22:34:02 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
              Sun Jul 28 22:34:02 2024 - [info] GTID ON
              Sun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)
              Sun Jul 28 22:34:02 2024 - [info] Primary candidate for the new Master (candidate_master is set)
              Sun Jul 28 22:34:02 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
              Sun Jul 28 22:34:02 2024 - [info] GTID ON
              Sun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)
              Sun Jul 28 22:34:02 2024 - [info]
              Sun Jul 28 22:34:02 2024 - [info] * Phase 3.3: Determining New Master Phase..
              Sun Jul 28 22:34:02 2024 - [info]
              Sun Jul 28 22:34:02 2024 - [info] Searching new master from slaves..
              Sun Jul 28 22:34:02 2024 - [info] Candidate masters from the configuration file:
              Sun Jul 28 22:34:02 2024 - [info] host-02(10.28.12.22:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
              Sun Jul 28 22:34:02 2024 - [info] GTID ON
              Sun Jul 28 22:34:02 2024 - [info] Replicating from 10.28.12.21(10.28.12.21:3306)
              Sun Jul 28 22:34:02 2024 - [info] Primary candidate for the new Master (candidate_master is set)
              Sun Jul 28 22:34:02 2024 - [info] Non-candidate masters:
              Sun Jul 28 22:34:02 2024 - [info] Searching from candidate_master slaves which have received the latest relay log events..
              Sun Jul 28 22:34:02 2024 - [info] New master is host-02(10.28.12.22:3306)
              Sun Jul 28 22:34:02 2024 - [info] Starting master failover..
              Sun Jul 28 22:34:02 2024 - [info]
              From:
              host-01(10.28.12.21:3306) (current master)
              +--host-02(10.28.12.22:3306)
              +--host-03(10.28.12.23:3306)


              To:
              host-02(10.28.12.22:3306) (new master)
              +--host-03(10.28.12.23:3306)
              Sun Jul 28 22:34:02 2024 - [info]
              Sun Jul 28 22:34:02 2024 - [info] * Phase 3.3: New Master Recovery Phase..
              Sun Jul 28 22:34:02 2024 - [info]
              Sun Jul 28 22:34:02 2024 - [info] Waiting all logs to be applied..
              Sun Jul 28 22:34:02 2024 - [info] done.
              Sun Jul 28 22:34:02 2024 - [info] Getting new master's binlog name and position..
              Sun Jul 28 22:34:02 2024 - [info] mysql-bin.000001:342
              Sun Jul 28 22:34:02 2024 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host-02 or 10.28.12.22', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
              Sun Jul 28 22:34:02 2024 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 342, 4c96b5ab-4cc1-11ef-9d49-02420a1c0c15:1
              Sun Jul 28 22:34:02 2024 - [info] Executing master IP activate script:
              Sun Jul 28 22:34:02 2024 - [info] /home/mha/masterha/script/master_ip_failover --command=start --ssh_user=mha --orig_master_host=host-01 --orig_master_ip=10.28.12.21 --orig_master_port=3306 --new_master_host=host-02 --new_master_ip=10.28.12.22 --new_master_port=3306 --new_master_user='monitor' --new_master_password=xxx
              Unknown option: new_master_user
              Unknown option: new_master_password




              IN SCRIPT TEST====sudo /usr/sbin/ip addr del 10.28.12.99/24 dev eth0 label eth0:1==sudo /usr/sbin/ip addr add 10.28.12.99/24 brd 10.28.12.255 dev eth0 label eth0:1;/usr/sbin/arping -q -A -c 1 -I eth0 10.28.12.99;===


              Enabling the VIP - 10.28.12.99 on the new master - host-02
              RTNETLINK answers: File exists
              Sun Jul 28 22:34:03 2024 - [info] OK.
              Sun Jul 28 22:34:03 2024 - [info] ** Finished master recovery successfully.
              Sun Jul 28 22:34:03 2024 - [info] * Phase 3: Master Recovery Phase completed.
              Sun Jul 28 22:34:03 2024 - [info]
              Sun Jul 28 22:34:03 2024 - [info] * Phase 4: Slaves Recovery Phase..
              Sun Jul 28 22:34:03 2024 - [info]
              Sun Jul 28 22:34:03 2024 - [info]
              Sun Jul 28 22:34:03 2024 - [info] * Phase 4.1: Starting Slaves in parallel..
              Sun Jul 28 22:34:03 2024 - [info]
              Sun Jul 28 22:34:03 2024 - [info] -- Slave recovery on host host-03(10.28.12.23:3306) started, pid: 22068. Check tmp log /home/mha/masterha/work/host-03_3306_20240728223400.log if it takes time..
              Sun Jul 28 22:34:05 2024 - [info]
              Sun Jul 28 22:34:05 2024 - [info] Log messages from host-03 ...
              Sun Jul 28 22:34:05 2024 - [info]
              Sun Jul 28 22:34:03 2024 - [info] Resetting slave host-03(10.28.12.23:3306) and starting replication from the new master host-02(10.28.12.22:3306)..
              Sun Jul 28 22:34:03 2024 - [info] Executed CHANGE MASTER.
              Sun Jul 28 22:34:04 2024 - [info] Slave started.
              Sun Jul 28 22:34:04 2024 - [info] gtid_wait(4c96b5ab-4cc1-11ef-9d49-02420a1c0c15:1) completed on host-03(10.28.12.23:3306). Executed 0 events.
              Sun Jul 28 22:34:05 2024 - [info] End of log messages from host-03.
              Sun Jul 28 22:34:05 2024 - [info] -- Slave on host host-03(10.28.12.23:3306) started.
              Sun Jul 28 22:34:05 2024 - [info] All new slave servers recovered successfully.
              Sun Jul 28 22:34:05 2024 - [info]
              Sun Jul 28 22:34:05 2024 - [info] * Phase 5: New master cleanup phase..
              Sun Jul 28 22:34:05 2024 - [info]
              Sun Jul 28 22:34:05 2024 - [info] Resetting slave info on the new master..
              Sun Jul 28 22:34:05 2024 - [info] host-02: Resetting slave info succeeded.
              Sun Jul 28 22:34:05 2024 - [info] Master failover to host-02(10.28.12.22:3306) completed successfully.
              Sun Jul 28 22:34:05 2024 - [info] Deleted server1 entry from /home/mha/masterha/conf/mha.conf .
              Sun Jul 28 22:34:05 2024 - [info]


              ----- Failover Report -----


              mha: MySQL Master failover host-01(10.28.12.21:3306) to host-02(10.28.12.22:3306) succeeded


              Master host-01(10.28.12.21:3306) is down!


              Check MHA Manager logs at host-03:/home/mha/masterha/log/manager.log for details.


              Started automated(non-interactive) failover.
              Invalidated master IP address on host-01(10.28.12.21:3306)
              Selected host-02(10.28.12.22:3306) as a new master.
              host-02(10.28.12.22:3306): OK: Applying all logs succeeded.
              host-02(10.28.12.22:3306): OK: Activated master IP address.
              host-03(10.28.12.23:3306): OK: Slave started, replicating from host-02(10.28.12.22:3306)
              host-02(10.28.12.22:3306): Resetting slave info succeeded.
              Master failover to host-02(10.28.12.22:3306) completed successfully.
              Sun Jul 28 22:34:05 2024 - [info] Sending mail..
              Unknown option: conf


              3. 修复故障节点,重新加入集群中
              # 注意:当原Master恢复后,不会自动转成Slave,需要手动处理,并将原Master加入到复制集群。
              mysqld_safe --defaults-file=/mysql/mysql8022/conf/my.cnf --user=mysql8022 &
              mysql -uroot -proot -S/mysql/mysql8022/temp/mysql.sock
              CHANGE MASTER TO MASTER_HOST='10.28.12.22', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='repl';
              start slave;
              # 配置新主库只读
              mysql -uroot -proot -S/mysql/mysql8022/temp/mysql.sock -e"set global read_only=on
              # 修复mha.conf配置文件
              # 由于在mha服务启动脚本中添加了--remove_dead_master_conf选项,切换后会自动将Master节点信息从/etc/masterha/mha.cnf中移除
              vi /home/mha/masterha/conf/mha.conf
              [server default]
              manager_log=/home/mha/masterha/log/manager.log
              manager_workdir=/home/mha/masterha/work
              master_binlog_dir=/mysql/mysql8022/binlog
              master_ip_failover_script=/home/mha/masterha/script/master_ip_failover
              master_ip_online_change_script=/home/mha/masterha/script/master_ip_online_change
              password=monitor
              ping_interval=1
              remote_workdir=/home/mha/masterha/work
              repl_password=repl
              repl_user=repl
              report_script=/home/mha/masterha/script/send_report
              secondary_check_script=/usr/bin/masterha_secondary_check -s host-02 -s host-03 --user=mha --master_host=host-02 --master_ip=10.28.12.22 --master_port=3306
              ssh_user=mha
              user=monitor
              [server1]
              hostname=host-02
              port=3306
              [server2]
              candidate_master=1
              check_repl_delay=0
              hostname=host-01
              port=3306
              [server3]
              hostname=host-03
              port=3306


              # 检查ssh状态
              /usr/bin/masterha_check_ssh --conf=/home/mha/masterha/conf/mha.conf
              # 检查主备复制
              /usr/bin/masterha_check_repl --conf=/home/mha/masterha/conf/mha.conf


              06

              MHA主备切换
                1. 启动mha服务
                sh /home/mha/masterha/script/mha_start.sh


                2. 执行主备切换
                masterha_master_switch --conf=/home/mha/masterha/conf/mha.conf --master_state=alive --new_master_host=host-01 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
                # 主备切换日志如下
                Sun Jul 28 23:25:08 2024 - [info] MHA::MasterRotate version 0.58.
                Sun Jul 28 23:25:08 2024 - [info] Starting online master switch..
                Sun Jul 28 23:25:08 2024 - [info]
                Sun Jul 28 23:25:08 2024 - [info] * Phase 1: Configuration Check Phase..
                Sun Jul 28 23:25:08 2024 - [info]
                Sun Jul 28 23:25:08 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
                Sun Jul 28 23:25:08 2024 - [info] Reading application default configuration from /home/mha/masterha/conf/mha.conf..
                Sun Jul 28 23:25:08 2024 - [info] Reading server configuration from /home/mha/masterha/conf/mha.conf..
                Sun Jul 28 23:25:09 2024 - [info] GTID failover mode = 1
                Sun Jul 28 23:25:09 2024 - [info] Current Alive Master: host-02(10.28.12.22:3306)
                Sun Jul 28 23:25:09 2024 - [info] Alive Slaves:
                Sun Jul 28 23:25:09 2024 - [info] host-01(10.28.12.21:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
                Sun Jul 28 23:25:09 2024 - [info] GTID ON
                Sun Jul 28 23:25:09 2024 - [info] Replicating from 10.28.12.22(10.28.12.22:3306)
                Sun Jul 28 23:25:09 2024 - [info] Primary candidate for the new Master (candidate_master is set)
                Sun Jul 28 23:25:09 2024 - [info] host-03(10.28.12.23:3306) Version=8.0.22 (oldest major version between slaves) log-bin:enabled
                Sun Jul 28 23:25:09 2024 - [info] GTID ON
                Sun Jul 28 23:25:09 2024 - [info] Replicating from 10.28.12.22(10.28.12.22:3306)


                It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on host-02(10.28.12.22:3306)? (YES/no): YES
                Sun Jul 28 23:25:23 2024 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
                Sun Jul 28 23:25:23 2024 - [info] ok.
                Sun Jul 28 23:25:23 2024 - [info] Checking MHA is not monitoring or doing failover..
                Sun Jul 28 23:25:23 2024 - [info] Checking replication health on host-01..
                Sun Jul 28 23:25:23 2024 - [info] ok.
                Sun Jul 28 23:25:23 2024 - [info] Checking replication health on host-03..
                Sun Jul 28 23:25:23 2024 - [info] ok.
                Sun Jul 28 23:25:23 2024 - [info] host-01 can be new master.
                Sun Jul 28 23:25:23 2024 - [info]
                From:
                host-02(10.28.12.22:3306) (current master)
                +--host-01(10.28.12.21:3306)
                +--host-03(10.28.12.23:3306)


                To:
                host-01(10.28.12.21:3306) (new master)
                +--host-03(10.28.12.23:3306)
                +--host-02(10.28.12.22:3306)


                Starting master switch from host-02(10.28.12.22:3306) to host-01(10.28.12.21:3306)? (yes/NO): yes
                Sun Jul 28 23:25:39 2024 - [info] Checking whether host-01(10.28.12.21:3306) is ok for the new master..
                Sun Jul 28 23:25:39 2024 - [info] ok.
                Sun Jul 28 23:25:39 2024 - [info] host-02(10.28.12.22:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
                Sun Jul 28 23:25:39 2024 - [info] host-02(10.28.12.22:3306): Resetting slave pointing to the dummy host.
                Sun Jul 28 23:25:39 2024 - [info] ** Phase 1: Configuration Check Phase completed.
                Sun Jul 28 23:25:39 2024 - [info]
                Sun Jul 28 23:25:39 2024 - [info] * Phase 2: Rejecting updates Phase..
                Sun Jul 28 23:25:39 2024 - [info]
                Sun Jul 28 23:25:39 2024 - [info] Executing master ip online change script to disable write on the current master:
                Sun Jul 28 23:25:39 2024 - [info] /home/mha/masterha/script/master_ip_online_change --command=stop --orig_master_host=host-02 --orig_master_ip=10.28.12.22 --orig_master_port=3306 --orig_master_user='monitor' --new_master_host=host-01 --new_master_ip=10.28.12.21 --new_master_port=3306 --new_master_user='monitor' --orig_master_ssh_user=mha --new_master_ssh_user=mha --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
                Sun Jul 28 23:25:39 2024 918726 Set read_only on the new master.. ok.
                Sun Jul 28 23:25:39 2024 922218 Waiting all running 3 threads are disconnected.. (max 1500 milliseconds)
                {'Time' => '3448','db' => undef,'Id' => '7','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
                {'Time' => '3096','db' => undef,'Id' => '37','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-03.docker-chen:50486'}
                {'Time' => '2780','db' => undef,'Id' => '38','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-01.docker-chen:32654'}
                Sun Jul 28 23:25:40 2024 428103 Waiting all running 3 threads are disconnected.. (max 1000 milliseconds)
                {'Time' => '3449','db' => undef,'Id' => '7','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
                {'Time' => '3097','db' => undef,'Id' => '37','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-03.docker-chen:50486'}
                {'Time' => '2781','db' => undef,'Id' => '38','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-01.docker-chen:32654'}
                Sun Jul 28 23:25:40 2024 937856 Waiting all running 3 threads are disconnected.. (max 500 milliseconds)
                {'Time' => '3449','db' => undef,'Id' => '7','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
                {'Time' => '3097','db' => undef,'Id' => '37','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-03.docker-chen:50486'}
                {'Time' => '2781','db' => undef,'Id' => '38','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-01.docker-chen:32654'}
                Sun Jul 28 23:25:41 2024 451314 Set read_only=1 on the orig master.. ok.
                Sun Jul 28 23:25:41 2024 451933 Waiting all running 3 queries are disconnected.. (max 500 milliseconds)
                {'Time' => '3450','db' => undef,'Id' => '7','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
                {'Time' => '3098','db' => undef,'Id' => '37','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-03.docker-chen:50486'}
                {'Time' => '2782','db' => undef,'Id' => '38','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'host-01.docker-chen:32654'}
                Disabling the VIP on old master: host-02
                Sun Jul 28 23:25:42 2024 129349 Killing all application threads..
                Sun Jul 28 23:25:42 2024 131797 done.
                Sun Jul 28 23:25:42 2024 - [info] ok.
                Sun Jul 28 23:25:42 2024 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
                Sun Jul 28 23:25:42 2024 - [info] Executing FLUSH TABLES WITH READ LOCK..
                Sun Jul 28 23:25:42 2024 - [info] ok.
                Sun Jul 28 23:25:42 2024 - [info] Orig master binlog:pos is mysql-bin.000001:342.
                Sun Jul 28 23:25:42 2024 - [info] Waiting to execute all relay logs on host-01(10.28.12.21:3306)..
                Sun Jul 28 23:25:42 2024 - [info] master_pos_wait(mysql-bin.000001:342) completed on host-01(10.28.12.21:3306). Executed 0 events.
                Sun Jul 28 23:25:42 2024 - [info] done.
                Sun Jul 28 23:25:42 2024 - [info] Getting new master's binlog name and position..
                Sun Jul 28 23:25:42 2024 - [info] mysql-bin.000002:196
                Sun Jul 28 23:25:42 2024 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host-01 or 10.28.12.21', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
                Sun Jul 28 23:25:42 2024 - [info] Executing master ip online change script to allow write on the new master:
                Sun Jul 28 23:25:42 2024 - [info] /home/mha/masterha/script/master_ip_online_change --command=start --orig_master_host=host-02 --orig_master_ip=10.28.12.22 --orig_master_port=3306 --orig_master_user='monitor' --new_master_host=host-01 --new_master_ip=10.28.12.21 --new_master_port=3306 --new_master_user='monitor' --orig_master_ssh_user=mha --new_master_ssh_user=mha --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
                Sun Jul 28 23:25:42 2024 247802 Set read_only=0 on the new master.
                Enabling the VIP - 10.28.12.99 on the new master - host-01
                Sun Jul 28 23:25:42 2024 - [info] ok.
                Sun Jul 28 23:25:42 2024 - [info]
                Sun Jul 28 23:25:42 2024 - [info] * Switching slaves in parallel..
                Sun Jul 28 23:25:42 2024 - [info]
                Sun Jul 28 23:25:42 2024 - [info] -- Slave switch on host host-03(10.28.12.23:3306) started, pid: 24678
                Sun Jul 28 23:25:42 2024 - [info]
                Sun Jul 28 23:25:43 2024 - [info] Log messages from host-03 ...
                Sun Jul 28 23:25:43 2024 - [info]
                Sun Jul 28 23:25:42 2024 - [info] Waiting to execute all relay logs on host-03(10.28.12.23:3306)..
                Sun Jul 28 23:25:42 2024 - [info] master_pos_wait(mysql-bin.000001:342) completed on host-03(10.28.12.23:3306). Executed 0 events.
                Sun Jul 28 23:25:42 2024 - [info] done.
                Sun Jul 28 23:25:42 2024 - [info] Resetting slave host-03(10.28.12.23:3306) and starting replication from the new master host-01(10.28.12.21:3306)..
                Sun Jul 28 23:25:42 2024 - [info] Executed CHANGE MASTER.
                Sun Jul 28 23:25:42 2024 - [info] Slave started.
                Sun Jul 28 23:25:43 2024 - [info] End of log messages from host-03 ...
                Sun Jul 28 23:25:43 2024 - [info]
                Sun Jul 28 23:25:43 2024 - [info] -- Slave switch on host host-03(10.28.12.23:3306) succeeded.
                Sun Jul 28 23:25:43 2024 - [info] Unlocking all tables on the orig master:
                Sun Jul 28 23:25:43 2024 - [info] Executing UNLOCK TABLES..
                Sun Jul 28 23:25:43 2024 - [info] ok.
                Sun Jul 28 23:25:43 2024 - [info] Starting orig master as a new slave..
                Sun Jul 28 23:25:43 2024 - [info] Resetting slave host-02(10.28.12.22:3306) and starting replication from the new master host-01(10.28.12.21:3306)..
                Sun Jul 28 23:25:43 2024 - [info] Executed CHANGE MASTER.
                Sun Jul 28 23:25:43 2024 - [info] Slave started.
                Sun Jul 28 23:25:43 2024 - [info] All new slave servers switched successfully.
                Sun Jul 28 23:25:43 2024 - [info]
                Sun Jul 28 23:25:43 2024 - [info] * Phase 5: New master cleanup phase..
                Sun Jul 28 23:25:43 2024 - [info]
                Sun Jul 28 23:25:43 2024 - [info] host-01: Resetting slave info succeeded.
                Sun Jul 28 23:25:43 2024 - [info] Switching master to host-01(10.28.12.21:3306) completed successfully.




                3. 恢复mha.conf配置文件,恢复MHA监控服务
                vi /home/mha/masterha/conf/mha.conf
                [server default]
                manager_log=/home/mha/masterha/log/manager.log
                manager_workdir=/home/mha/masterha/work
                master_binlog_dir=/mysql/mysql8022/binlog
                master_ip_failover_script=/home/mha/masterha/script/master_ip_failover
                master_ip_online_change_script=/home/mha/masterha/script/master_ip_online_change
                password=monitor
                ping_interval=1
                remote_workdir=/home/mha/masterha/work
                repl_password=repl
                repl_user=repl
                report_script=/home/mha/masterha/script/send_report
                secondary_check_script=/usr/bin/masterha_secondary_check -s host-02 -s host-03 --user=mha --master_host=host-01 --master_ip=10.28.12.21 --master_port=3306
                ssh_user=mha
                user=monitor
                [server1]
                hostname=host-01
                port=3306
                [server2]
                candidate_master=1
                check_repl_delay=0
                hostname=host-02
                port=3306
                [server3]
                hostname=host-03
                port=3306


                4. 启动MHA服务,继续进行监控
                sh /home/mha/masterha/script/mha_start.sh



                本文内容就到这啦,阅读完本篇,相信你也掌握了对MySQL的MHA高可用架构的搭建了吧!我们下篇再见!

                点击上方公众号,关注我吧!


                文章转载自呆呆的私房菜,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                评论