暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Mysql 一主多从 MHA 高可用架构搭建

架构工匠 2021-06-23
315



往期推荐

一、环境准备


机器名IP配置服务角色安装服务
manager192.168.12.223MHA管理控制器mha manager
master192.168.12.218数据库主服务器(master主)mha node,mysql master
slave1192.168.12.219数据库从服务器(candicate master备主)mha node;mysql slave
slave2192.168.12.220数据库从服务器(slave从)mha node;mysql slave

二、互相配置ssh免密码登录

在4台服务器上都执行以下命令,以192.168.12.223 为例,其它3台同理

[root@master-db1 ~]# echo -e "\n" |ssh-keygen  -t dsa -N "" 
[root@master-db1 ~]# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.12.218
[root@master-db1 ~]# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.12.219
[root@master-db1 ~]# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.12.220

三、mysql主从环境搭建

  • 创建复制用户(mysql 3个节点都执行)

    mysql> grant replication slave on *.* to 'repl'@'192.168.12.%' identified by '123456'; 
    mysql> flush privileges;
  • 主从配置

    vi /etc/my.cnf

    master 配置

    # server_id 不能重复
    server_id = 1
    gtid-mode=ON
    log_bin = master-bin  # 启动二进制日志
    log_bin_index = master-bin.index
    relay-log = slave-relay-bin
    relay-log-purge = 0  # 禁用或启用不再需要中继日志时是否自动清空它们

    slave 配置

    # server_id 不能重复
    server_id = 2
    # 全局唯一,一个事务对应一个GTID,MySQL5.6开始支持
    gtid-mode=ON
    log_bin = master-bin
    relay-log = slave-relay-bin
    relay-log-index = slave-relay-bin.index
    relay-log-purge = 0
    read_only = 1
    什么是GTID?
    1、全局唯一,一个事务对应一个GTID
    2、替代传统的binlog+pos复制;使用master_auto_position=1自动匹配GTID断点进行复制
    3、MySQL5.6开始支持
    4、在传统的主从复制中,slave端不用开启binlog;但是在GTID主从复制中,必须开启binlog
    5、slave端在接受master的binlog时,会校验GTID值
    6、为了保证主从数据的一致性,多线程同时执行一个GTID
    工作原理
    1、master更新数据时,会在事务前产生GTID,一同记录到binlog日志中。
    2、slave端的i/o 线程将变更的binlog,写入到本地的relay log中。
    3、sql线程从relay log中获取GTID,然后对比slave端的binlog是否有记录。
    4、如果有记录,说明该GTID的事务已经执行,slave会忽略。
    5、如果没有记录,slave就会从relay log中执行该GTID的事务,并记录到binlog。
    6、在解析过程中会判断是否有主键,如果没有就用二级索引,如果没有就用全部扫描
  • 在master服务器上执行

    show master status;
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000002 |      407 |             |                 |                   |
    +------------------+----------+--------------+------------------+-------------------+
    1 row in set (0.00 sec)
  • 分别在两台服务器上执行复制相关命令

    CHANGE MASTER TO MASTER_HOST='192.168.12.218',MASTER_USER='repl', MASTER_PASSWORD='123456',MASTER_LOG_FILE='mysql-bin.000002',MASTER_LOG_POS=407;
  • 查看slave状态

    mysql> show slave status \G;
    *************************** 1. row ***************************
                  Slave_IO_State: Waiting for master to send event
                    Master_Host: 192.168.12.218
                    Master_User: repl
                    Master_Port: 3306
                  Connect_Retry: 60
                Master_Log_File: mysql-bin.000002
            Read_Master_Log_Pos: 407
                  Relay_Log_File: relay-log.000002
                  Relay_Log_Pos: 283
          Relay_Master_Log_File: mysql-bin.000002
                Slave_IO_Running: Yes
              Slave_SQL_Running: Yes
  • 创建mha管理的账号,在所有mysql服务器上都需要执行

    mysql> grant all privileges on *.* to 'root'@'192.168.12.%' identified  by '123456';
    mysql> flush privileges;
    #如果是在slave服务器上安装的manager,则需要创建以本机hostname名连接的账号,不然masterha_check_repl测试通不过。
    #GRANT ALL PRIVILEGES ON *.* TO 'root'@'master(主机名)' IDENTIFIED BY '123456'
  • 半同步复制配置

    所有主从服务器都需要执行,mysql5.5以上才支持

    mysql>  INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
    mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';

    打开mysql配置文件,添加如下:

    vi /etc/my.cnf
    #开启半同步复制
    rpl_semi_sync_master_enabled=1
    #打开半同步复制
    rpl_semi_sync_slave_enabled=on;

四、MHA安装配置

  • 所有主从节点都需要安装mha4mysql-node

    下载地址:https://github.com/yoshinorim/mha4mysql-node/releases/tag/v0.58

    注意:mysql5.6 只能匹配 mha 5.6版本,因为mysql5.6之后才引入super_read_only 这个参数

    最新版的mha4mysql-0.58支持super_read_only,mysql5.7可以用

    #安装所有依赖
    yum install perl-DBD-MySQL -y
    #安装 mha node
    rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
  • 在管理节点安装mha4mysql-manager

    下载地址:https://github.com/yoshinorim/mha4mysql-manager/releases/tag/v0.58

    这个是第三方下载地址,官网需要翻墙,本文档使用的是5.6版本,所以是提前下载好到本地的。

    # 在管理节点安装依赖软件
    yum install -y perl-Config-Tiny perl-Log-Dispatch  perl-Parallel-ForkManager
    # 在管理节点安装mha4mysql-manager
    rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
  • 定义mha管理配置文档

    mkdir /etc/mha_master
    vi /etc/mha_master/mha.cnf

    配置内容:

    [server default]
    # 授权的manager监控用号和密码
    user=root
    password=123456

    # 复制用户和密码
    repl_password=123456
    repl_user=repl

    # ssh远程登录用户和端口
    #ssh_port=8822
    ssh_user=root

    # 检测主master存活间隔和类型
    ping_interval=1
    ping_type=SELECT

    # 工作目录相关
    manager_log=/data/mha/mha/app1/manager.log
    manager_workdir=/data/mha/mha/app1
    remote_workdir=/data/mha/mha/tmp
    master_binlog_dir="/usr/local/mysql/data"

    # 故障failover脚本,需在官网的基础上修改配置
    master_ip_failover_script="/data/mha/scripts/master_ip_failover"
    master_ip_online_change_script="/data/mha/scripts/master_ip_online_change"
    report_script="/data/mha/scripts/send_report"
    # shutdown_script="/data/mha/scripts/power_manager"

    # 额外路由检测主master宕机与否,ssh端口非22,需额外设置参数--port=8822
    secondary_check_script="masterha_secondary_check -s 10.34.24.157 -s 10.36.224.110"

    [server1]
    candidate_master=1
    client_bindir=/usr/local/mysql/bin/
    client_libdir=/usr/local/mysql/lib/
    hostname=10.34.24.156
    port=3306

    [server2]
    candidate_master=1
    client_bindir=/usr/local/mysql/bin/
    client_libdir=/usr/local/mysql/lib/
    hostname=10.34.24.157
    port=3306

    [server3]
    client_bindir=/usr/local/mysql/bin/
    client_libdir=/usr/local/mysql/lib/
    hostname=10.36.224.110
    port=3306

    master_ip_failover_script对应的脚本

    #!/usr/bin/env perl

    use strict;
    use warnings FATAL => 'all';

    use Getopt::Long;

    my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
    );

    my $vip = '192.168.142.235';
    my $ssh_start_vip = "systemctl start keepalived";
    my $ssh_stop_vip = "systemctl stop keepalived";

    GetOptions(
    'command=s' => \$command,
    'ssh_user=s' => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s' => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s' => \$new_master_host,
    'new_master_ip=s' => \$new_master_ip,
    'new_master_port=i' => \$new_master_port,
    );

    exit &main();

    sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

    my $exit_code = 1;
    eval {
    print "Disabling the VIP on old master: $orig_master_host \n";
    &stop_vip();
    $exit_code = 0;
    };
    if ($@) {
    warn "Got Error: $@\n";
    exit $exit_code;
    }
    exit $exit_code;
    }
    elsif ( $command eq "start" ) {

    my $exit_code = 10;
    eval {
    print "Enabling the VIP - $vip on the new master - $new_master_host \n";
    &start_vip();
    $exit_code = 0;
    };
    if ($@) {
    warn $@;
    exit $exit_code;
    }
    exit $exit_code;
    }
    elsif ( $command eq "status" ) {
    print "Checking the Status of the script.. OK \n";
    #`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
    exit 0;
    }
    else {
    &usage();
    exit 1;
    }
    }

    # A simple system call that enable the VIP on the new master
    sub start_vip() {
    `ssh -p 8822 $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    }
    # A simple system call that disable the VIP on the old_master
    sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh -p 8822 $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
    }

    sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
    }

    master_ip_online_change_script对应的脚本

    #!/usr/bin/env perl

    # Copyright (C) 2011 DeNA Co.,Ltd.
    #
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation; either version 2 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program; if not, write to the Free Software
    # Foundation, Inc.,
    # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

    ## Note: This is a sample script and is not complete. Modify the script based on your environment.

    use strict;
    use warnings FATAL => 'all';

    use Getopt::Long;
    use MHA::DBHelper;
    use MHA::NodeUtil;
    use Time::HiRes qw( sleep gettimeofday tv_interval );
    use Data::Dumper;

    my $_tstart;
    my $_running_interval = 1;
    my (
    $command, $orig_master_is_new_slave, $orig_master_host,
    $orig_master_ip, $orig_master_port, $orig_master_user,
    $orig_master_password, $orig_master_ssh_user, $new_master_host,
    $new_master_ip, $new_master_port, $new_master_user,
    $new_master_password, $new_master_ssh_user,
    );
    my $vip = '192.168.142.235';
    my $ssh_start_vip = "systemctl start keepalived";
    my $ssh_stop_vip = "systemctl stop keepalived";

    GetOptions(
    'command=s' => \$command,
    'orig_master_is_new_slave' => \$orig_master_is_new_slave,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s' => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'orig_master_user=s' => \$orig_master_user,
    'orig_master_password=s' => \$orig_master_password,
    'orig_master_ssh_user=s' => \$orig_master_ssh_user,
    'new_master_host=s' => \$new_master_host,
    'new_master_ip=s' => \$new_master_ip,
    'new_master_port=i' => \$new_master_port,
    'new_master_user=s' => \$new_master_user,
    'new_master_password=s' => \$new_master_password,
    'new_master_ssh_user=s' => \$new_master_ssh_user,
    );

    exit &main();

    sub current_time_us {
    my ( $sec, $microsec ) = gettimeofday();
    my $curdate = localtime($sec);
    return $curdate . " " . sprintf( "%06d", $microsec );
    }

    sub sleep_until {
    my $elapsed = tv_interval($_tstart);
    if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
    }
    }

    sub get_threads_util {
    my $dbh = shift;
    my $my_connection_id = shift;
    my $running_time_threshold = shift;
    my $type = shift;
    $running_time_threshold = 0 unless ($running_time_threshold);
    $type = 0 unless ($type);
    my @threads;

    my $sth = $dbh->prepare("SHOW PROCESSLIST");
    $sth->execute();

    while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id = $ref->{Id};
    my $user = $ref->{User};
    my $host = $ref->{Host};
    my $command = $ref->{Command};
    my $state = $ref->{State};
    my $query_time = $ref->{Time};
    my $info = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command) && $command eq "Binlog Dump" );
    next if ( defined($user) && $user eq "system user" );
    next
    if ( defined($command)
    && $command eq "Sleep"
    && defined($query_time)
    && $query_time >= 1 );

    if ( $type >= 1 ) {
    next if ( defined($command) && $command eq "Sleep" );
    next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
    next if ( defined($info) && $info =~ m/^select/i );
    next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
    }
    return @threads;
    }

    sub main {
    if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
    ## Setting read_only=1 on the new master (to avoid accident)
    my $new_master_handler = new MHA::DBHelper();

    # args: hostname, port, user, password, raise_error(die_on_error)_or_not
    $new_master_handler->connect( $new_master_ip, $new_master_port,
    $new_master_user, $new_master_password, 1 );
    print current_time_us() . " Set read_only on the new master.. ";
    $new_master_handler->enable_read_only();
    if ( $new_master_handler->is_read_only() ) {
    print "ok.\n";
    }
    else {
    die "Failed!\n";
    }
    $new_master_handler->disconnect();

    # Connecting to the orig master, die if any database error happens
    my $orig_master_handler = new MHA::DBHelper();
    $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
    $orig_master_user, $orig_master_password, 1 );

    ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
    $orig_master_handler->disable_log_bin_local();
    print current_time_us() . " Drpping app user on the orig master..\n";
    #FIXME_xxx_drop_app_user($orig_master_handler);

    ## Waiting for N * 100 milliseconds so that current connections can exit
    my $time_until_read_only = 15;
    $_tstart = [gettimeofday];
    my @threads = get_threads_util( $orig_master_handler->{dbh},
    $orig_master_handler->{connection_id} );
    while ( $time_until_read_only > 0 && $#threads >= 0 ) {
    if ( $time_until_read_only % 5 == 0 ) {
    printf
    "%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
    current_time_us(), $#threads + 1, $time_until_read_only * 100;
    if ( $#threads < 5 ) {
    print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
    foreach (@threads);
    }
    }
    sleep_until();
    $_tstart = [gettimeofday];
    $time_until_read_only--;
    @threads = get_threads_util( $orig_master_handler->{dbh},
    $orig_master_handler->{connection_id} );
    }

    ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
    print current_time_us() . " Set read_only=1 on the orig master.. ";
    $orig_master_handler->enable_read_only();
    if ( $orig_master_handler->is_read_only() ) {
    print "ok.\n";
    }
    else {
    die "Failed!\n";
    }

    ## Waiting for M * 100 milliseconds so that current update queries can complete
    my $time_until_kill_threads = 5;
    @threads = get_threads_util( $orig_master_handler->{dbh},
    $orig_master_handler->{connection_id} );
    while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
    if ( $time_until_kill_threads % 5 == 0 ) {
    printf
    "%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
    current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
    if ( $#threads < 5 ) {
    print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
    foreach (@threads);
    }
    }
    sleep_until();
    $_tstart = [gettimeofday];
    $time_until_kill_threads--;
    @threads = get_threads_util( $orig_master_handler->{dbh},
    $orig_master_handler->{connection_id} );
    }
    print "disable the VIP on old master: $orig_master_host \n";
    &stop_vip();
    ## Terminating all threads
    print current_time_us() . " Killing all application threads..\n";
    $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
    print current_time_us() . " done.\n";
    $orig_master_handler->enable_log_bin_local();
    $orig_master_handler->disconnect();

    ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
    $exit_code = 0;
    };
    if ($@) {
    warn "Got Error: $@\n";
    exit $exit_code;
    }
    exit $exit_code;
    }
    elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

    # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
    # If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
    my $new_master_handler = new MHA::DBHelper();

    # args: hostname, port, user, password, raise_error_or_not
    $new_master_handler->connect( $new_master_ip, $new_master_port,
    $new_master_user, $new_master_password, 1 );

    ## Set read_only=0 on the new master
    $new_master_handler->disable_log_bin_local();
    print current_time_us() . " Set read_only=0 on the new master.\n";
    $new_master_handler->disable_read_only();

    ## Creating an app user on the new master
    print current_time_us() . " Creating app user on the new master..\n";
    #FIXME_xxx_create_app_user($new_master_handler);
    $new_master_handler->enable_log_bin_local();
    $new_master_handler->disconnect();

    ## Update master ip on the catalog database, etc
    print "enable the VIP: $vip on the new master: $new_master_host \n ";
    &start_vip();
    $exit_code = 0;
    };
    if ($@) {
    warn "Got Error: $@\n";
    exit $exit_code;
    }
    exit $exit_code;
    }
    elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
    }
    else {
    &usage();
    exit 1;
    }
    }

    sub start_vip() {
    `ssh -p 8822 $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    }
    sub stop_vip() {
    `ssh -p 8822 $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
    }

    sub usage {
    print
    "Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
    die;
    }

    report_script对应的脚本

    #!/usr/bin/perl

    # Copyright (C) 2011 DeNA Co.,Ltd.
    #
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation; either version 2 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program; if not, write to the Free Software
    # Foundation, Inc.,
    # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

    ## Note: This is a sample script and is not complete. Modify the script based on your environment.

    use strict;
    use warnings FATAL => 'all';
    use Mail::Sender;
    use Getopt::Long;

    #new_master_host and new_slave_hosts are set only when recovering master succeeded
    my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
    my $smtp='smtp.126.com';
    my $mail_from='xxxx@126.com';
    my $mail_user='xxxx@126.com';
    my $mail_pass='password 不是邮箱密码 是客户端授权码';
    my $mail_to=['xxxx@126.com','xxxx@corp.netease.com'];
    GetOptions(
    'orig_master_host=s' => \$dead_master_host,
    'new_master_host=s' => \$new_master_host,
    'new_slave_hosts=s' => \$new_slave_hosts,
    'subject=s' => \$subject,
    'body=s' => \$body,
    );

    mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

    sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /tmp/monitormail.log"
    or die "Can't open the debug file:$!\n";
    my $sender = new Mail::Sender {
    ctype => 'text/plain; charset=utf-8',
    encoding => 'utf-8',
    smtp => $smtp,
    from => $mail_from,
    auth => 'LOGIN',
    TLS_allowed => '0',
    authid => $user,
    authpwd => $passwd,
    to => $mail_to,
    subject => $subject,
    debug => $DEBUG
    };

    $sender->MailMsg(
    { msg => $msg,
    debug => $DEBUG
    }
    ) or print $Mail::Sender::Error;
    return 1;
    }

    # Do whatever you want here

    exit 0;
  • 管理节点测试ssh连通性

    masterha_check_ssh -conf=/etc/mha_manager/mha.cnf
    Fri May 25 14:24:34 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Fri May 25 14:24:34 2018 - [info] Reading application default configuration from /data/mha/app1.cnf..
    Fri May 25 14:24:34 2018 - [info] Reading server configuration from /data/mha/app1.cnf..
    Fri May 25 14:24:34 2018 - [info] Starting SSH connection tests..
    Fri May 25 14:24:37 2018 - [debug]
    Fri May 25 14:24:34 2018 - [debug] Connecting via SSH from
    ....
    Fri May 25 14:24:39 2018 - [debug] ok.
    Fri May 25 14:24:39 2018 - [info] All SSH connection tests passed successfully.


  • 管理节点测试集群中的主从复制

    masterha_check_repl --conf=/etc/mha_manager/mha.cnf
    Fri May 29 10:38:48 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Fri May 29 10:38:48 2020 - [info] Reading application default configuration from /etc/mha_manager/mha.cnf..
    Fri May 29 10:38:48 2020 - [info] Reading server configuration from /etc/mha_manager/mha.cnf..
    Fri May 29 10:38:48 2020 - [info] MHA::MasterMonitor version 0.56.
    Fri May 29 10:38:50 2020 - [warning] SQL Thread is stopped(no error) on 10.34.24.156(10.34.24.156:3306)
    Fri May 29 10:38:50 2020 - [info] Multi-master configuration is detected. Current primary(writable) master is 10.34.24.156(10.34.24.156:3306)
    Fri May 29 10:38:50 2020 - [info] Master configurations are as below:
    Master 10.34.24.156(10.34.24.156:3306), replicating from 10.34.24.157(10.34.24.157:3306)
    Master 10.34.24.157(10.34.24.157:3306), replicating from 10.34.24.156(10.34.24.156:3306), read-only
    .......
    .......
    Checking the Status of the script.. OK
    Fri May 29 10:38:54 2020 - [info] OK.
    Fri May 29 10:38:54 2020 - [warning] shutdown_script is not defined.
    Fri May 29 10:38:54 2020 - [info] Got exit code 0 (Not master dead).

    MySQL Replication Health is OK.

    中途可能执行脚本时会报没有权限

    Can't exec "/data/mha/scripts/master_ip_failover": Permission denied

    执行:chmod +x /data/mha/scripts/master_ip_failover

  • 启动MHA管理节点

    启动命令

    nohup masterha_manager --conf=/etc/mha_manager/mha.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/mha/app1/manager.log 2>&1 &
    #参数含义具体可参照官网说明 https://github.com/yoshinorim/mha4mysql-manager/wiki/masterha_manager
    #此处简单说明
    #--remove_dead_master_conf,发生failover后,MHA会自动从配置文件里移除dead master的相关信息
    #--ignore_last_failover,默认情况下,之前如果存在failover,那么再次启动MHA是不成功的,必须删除对应目录下的failover error文件, (manager_workdir)/(app_name).failover.error;设置此参数,就可以忽略上次是否错误而可以继续进行failover

    关闭命令

    masterha_stop --conf=/etc/mha_manager/mha.cnf

    查看状态

    masterha_check_status --conf=/etc/mha_manager/mha.cnf
    # mha (pid:45782) is running(0:PING_OK), master:10.34.24.156
  • 工具命令

    masterha_check_ssh :   检查MHA的SSH配置。
    masterha_check_repl :   检查MySQL复制。
    masterha_manager :     启动MHA。
    masterha_check_status : 检测当前MHA运行状态。
    masterha_master_monitor : 监测master是否宕机。
    masterha_master_switch :  控制故障转移(自动或手动)。
    masterha_conf_host :       添加或删除配置的server信息。

五、测试验证

  • 关闭主节点的mysql

    service mysqld stop
    # Shutting down MySQL... [ OK ]
  • 查看管理节点的日志,可以看到master自动切换到24.157上面了

    Fri May 29 10:50:07 2020 - [info]  All relay logs were successfully applied.
    Fri May 29 10:50:07 2020 - [info] Resetting slave 10.36.224.110(10.36.224.110:3306) and starting replication from the new master 10.34.24.157(10.34.24.157:3306)..
    Fri May 29 10:50:07 2020 - [info] Executed CHANGE MASTER.
    Fri May 29 10:50:07 2020 - [info] Slave started.
    Fri May 29 10:50:08 2020 - [info] End of log messages from 10.36.224.110.
    Fri May 29 10:50:08 2020 - [info] -- Slave recovery on host 10.36.224.110(10.36.224.110:3306) succeeded.
    Fri May 29 10:50:08 2020 - [info] All new slave servers recovered successfully.
    Fri May 29 10:50:08 2020 - [info]
    Fri May 29 10:50:08 2020 - [info] * Phase 5: New master cleanup phase..
    Fri May 29 10:50:08 2020 - [info]
    Fri May 29 10:50:08 2020 - [info] Resetting slave info on the new master..
    Fri May 29 10:50:08 2020 - [info] 10.34.24.157: Resetting slave info succeeded.
    Fri May 29 10:50:08 2020 - [info] Master failover to 10.34.24.157(10.34.24.157:3306) completed successfully.
    Fri May 29 10:50:08 2020 - [info] Deleted server1 entry from /etc/mha_manager/mha.cnf .
    Fri May 29 10:50:08 2020 - [info]

    ----- Failover Report -----

    mha: MySQL Master failover 10.34.24.156(10.34.24.156:3306) to 10.34.24.157(10.34.24.157:3306) succeeded

    Master 10.34.24.156(10.34.24.156:3306) is down!

    Check MHA Manager logs at localhost:/data/mha/mha/app1/manager.log for details.

    Started automated(non-interactive) failover.
    Invalidated master IP address on 10.34.24.156(10.34.24.156:3306)
    The latest slave 10.34.24.157(10.34.24.157:3306) has all relay logs for recovery.
    Selected 10.34.24.157(10.34.24.157:3306) as a new master.
    10.34.24.157(10.34.24.157:3306): OK: Applying all logs succeeded.
    10.34.24.157(10.34.24.157:3306): OK: Activated master IP address.
    10.36.224.110(10.36.224.110:3306): This host has the latest relay log events.
    Generating relay diff files from the latest slave succeeded.
    10.36.224.110(10.36.224.110:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.34.24.157(10.34.24.157:3306)
    10.34.24.157(10.34.24.157:3306): Resetting slave info succeeded.
    Master failover to 10.34.24.157(10.34.24.157:3306) completed successfully.
  • 到24.157上面查询当前数据库状态

    mysql> show slave status \G;

    *************************** 1. row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: 10.34.24.157
    Master_User: repl
    Master_Port: 3306
    Connect_Retry: 60
    Master_Log_File: master-log.000003
    Read_Master_Log_Pos: 120
    Relay_Log_File: relay-log.000002
    Relay_Log_Pos: 284
    Relay_Master_Log_File: master-log.000003
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
  • 查看切换后的mysql主节点只读状态,发现被自动关闭了,说明之前的slave现在可写了

    mysql> show variables like 'read_only';
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | read_only     | OFF   |
    +---------------+-------+
    1 row in set (0.01 sec)

六、优势

  • 故障切换迅速

    在M/S复制集群中,只要从库在复制上没有延迟,MHA通常可以在数秒内实现故障切换。9-10秒内检查到master故障,可以选择在7-10秒关闭master以避免出现裂脑,几秒钟内,将差异中继日志(relay log)应用到新的master上,因此总的宕机时间通常为10-30秒。恢复新的master后,MHA并行的恢复其余的slave。即使在有数万台slave,也不会影响master的恢复时间。

  • master故障不会导致数据不一致

    当目前的master出现故障是,MHA自动识别slave之间中继日志(relay log)的不同,并应用到所有的slave中。这样所有的salve能够保持同步,只要所有的slave处于存活状态。和Semi-SynchronousReplication(半同步插件)一起使用,(几乎)可以保证没有数据丢失。

  • 无需修改当前的MySQL设置

    MHA的设计的重要原则之一就是尽可能地简单易用。MHA工作在传统的MySQL版本5.0和之后版本的主从复制环境中。和其它高可用解决方法比,MHA并不需要改变MySQL的部署环境。MHA适用于异步和半同步的主从复制。

    启动/停止/升级/降级/安装/卸载MHA不需要改变(包扩启动/停止)MySQL复制。当需要升级MHA到新的版本,不需要停止MySQL,仅仅替换到新版本的MHA,然后重启MHA Manager就好了。

    MHA运行在MySQL5.0开始的原生版本上。一些其它的MySQL高可用解决方案需要特定的版本(比如MySQL集群、带全局事务ID的MySQL等等),但并不仅仅为了master的高可用才迁移应用的。在大多数情况下,已经部署了比较旧MySQL应用,并且不想仅仅为了实现Master的高可用,花太多的时间迁移到不同的存储引擎或更新的前沿发行版。MHA工作的包括5.0/5.1/5.5的原生版本的MySQL上,所以并不需要迁移。

  • 无需增加大量的服务器

    MHA由MHA Manager和MHANode组成。MHA Node运行在需要故障切换/恢复的MySQL服务器上,因此并不需要额外增加服务器。MHAManager运行在特定的服务器上,因此需要增加一台(实现高可用需要2台),但是MHAManager可以监控大量(甚至上百台)单独的master,因此,并不需要增加大量的服务器。即使在一台slave上运行MHA Manager也是可以的。综上,实现MHA并没用额外增加大量的服务。

  • 无性能下降

    MHA适用与异步或半同步的MySQL复制。监控master时,MHA仅仅是每隔几秒(默认是3秒)发送一个ping包,并不发送重查询。可以得到像原生MySQL复制一样快的性能。

  • 适用于任何存储引擎

    MHA可以运行在只要MySQL复制运行的存储引擎上,并不仅限制于InnoDB,即使在不易迁移的传统的MyISAM引擎环境,一样可以使用MHA。

  • MHA 0.56版本已经支持MYSQL GTID复制

公众号:架构工匠(ID: gh_e3ddf5fb9980

CSDN博客:架构工匠


长按二维码关注

愿一个热爱技术的灵魂,给你带来更多共鸣与激情碰撞。

感谢您的阅读!常来哦


文章转载自架构工匠,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论