往期推荐

| 机器名 | IP配置 | 服务角色 | 安装服务 |
|---|---|---|---|
| manager | 192.168.12.223 | MHA管理控制器 | mha manager |
| master | 192.168.12.218 | 数据库主服务器(master主) | mha node,mysql master |
| slave1 | 192.168.12.219 | 数据库从服务器(candicate master备主) | mha node;mysql slave |
| slave2 | 192.168.12.220 | 数据库从服务器(slave从) | mha node;mysql slave |
二、互相配置ssh免密码登录
在4台服务器上都执行以下命令,以192.168.12.223 为例,其它3台同理
[root@master-db1 ~]# echo -e "\n" |ssh-keygen -t dsa -N ""
[root@master-db1 ~]# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.12.218
[root@master-db1 ~]# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.12.219
[root@master-db1 ~]# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.12.220
三、mysql主从环境搭建
创建复制用户(mysql 3个节点都执行)
mysql> grant replication slave on *.* to 'repl'@'192.168.12.%' identified by '123456';
mysql> flush privileges;主从配置
vi /etc/my.cnf
master 配置
# server_id 不能重复
server_id = 1
gtid-mode=ON
log_bin = master-bin # 启动二进制日志
log_bin_index = master-bin.index
relay-log = slave-relay-bin
relay-log-purge = 0 # 禁用或启用不再需要中继日志时是否自动清空它们slave 配置
# server_id 不能重复
server_id = 2
# 全局唯一,一个事务对应一个GTID,MySQL5.6开始支持
gtid-mode=ON
log_bin = master-bin
relay-log = slave-relay-bin
relay-log-index = slave-relay-bin.index
relay-log-purge = 0
read_only = 1什么是GTID?
1、全局唯一,一个事务对应一个GTID
2、替代传统的binlog+pos复制;使用master_auto_position=1自动匹配GTID断点进行复制
3、MySQL5.6开始支持
4、在传统的主从复制中,slave端不用开启binlog;但是在GTID主从复制中,必须开启binlog
5、slave端在接受master的binlog时,会校验GTID值
6、为了保证主从数据的一致性,多线程同时执行一个GTID
工作原理
1、master更新数据时,会在事务前产生GTID,一同记录到binlog日志中。
2、slave端的i/o 线程将变更的binlog,写入到本地的relay log中。
3、sql线程从relay log中获取GTID,然后对比slave端的binlog是否有记录。
4、如果有记录,说明该GTID的事务已经执行,slave会忽略。
5、如果没有记录,slave就会从relay log中执行该GTID的事务,并记录到binlog。
6、在解析过程中会判断是否有主键,如果没有就用二级索引,如果没有就用全部扫描在master服务器上执行
show master status;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 | 407 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)分别在两台服务器上执行复制相关命令
CHANGE MASTER TO MASTER_HOST='192.168.12.218',MASTER_USER='repl', MASTER_PASSWORD='123456',MASTER_LOG_FILE='mysql-bin.000002',MASTER_LOG_POS=407;查看slave状态
mysql> show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.12.218
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 407
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 283
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes创建mha管理的账号,在所有mysql服务器上都需要执行
mysql> grant all privileges on *.* to 'root'@'192.168.12.%' identified by '123456';
mysql> flush privileges;
#如果是在slave服务器上安装的manager,则需要创建以本机hostname名连接的账号,不然masterha_check_repl测试通不过。
#GRANT ALL PRIVILEGES ON *.* TO 'root'@'master(主机名)' IDENTIFIED BY '123456'半同步复制配置
所有主从服务器都需要执行,mysql5.5以上才支持
mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';打开mysql配置文件,添加如下:
vi /etc/my.cnf
#开启半同步复制
rpl_semi_sync_master_enabled=1
#打开半同步复制
rpl_semi_sync_slave_enabled=on;
四、MHA安装配置
所有主从节点都需要安装mha4mysql-node
下载地址:https://github.com/yoshinorim/mha4mysql-node/releases/tag/v0.58
注意:mysql5.6 只能匹配 mha 5.6版本,因为mysql5.6之后才引入super_read_only 这个参数
最新版的mha4mysql-0.58支持super_read_only,mysql5.7可以用
#安装所有依赖
yum install perl-DBD-MySQL -y
#安装 mha node
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm在管理节点安装mha4mysql-manager
下载地址:https://github.com/yoshinorim/mha4mysql-manager/releases/tag/v0.58
这个是第三方下载地址,官网需要翻墙,本文档使用的是5.6版本,所以是提前下载好到本地的。
# 在管理节点安装依赖软件
yum install -y perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
# 在管理节点安装mha4mysql-manager
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm定义mha管理配置文档
mkdir /etc/mha_master
vi /etc/mha_master/mha.cnf配置内容:
[server default]
# 授权的manager监控用号和密码
user=root
password=123456
# 复制用户和密码
repl_password=123456
repl_user=repl
# ssh远程登录用户和端口
#ssh_port=8822
ssh_user=root
# 检测主master存活间隔和类型
ping_interval=1
ping_type=SELECT
# 工作目录相关
manager_log=/data/mha/mha/app1/manager.log
manager_workdir=/data/mha/mha/app1
remote_workdir=/data/mha/mha/tmp
master_binlog_dir="/usr/local/mysql/data"
# 故障failover脚本,需在官网的基础上修改配置
master_ip_failover_script="/data/mha/scripts/master_ip_failover"
master_ip_online_change_script="/data/mha/scripts/master_ip_online_change"
report_script="/data/mha/scripts/send_report"
# shutdown_script="/data/mha/scripts/power_manager"
# 额外路由检测主master宕机与否,ssh端口非22,需额外设置参数--port=8822
secondary_check_script="masterha_secondary_check -s 10.34.24.157 -s 10.36.224.110"
[server1]
candidate_master=1
client_bindir=/usr/local/mysql/bin/
client_libdir=/usr/local/mysql/lib/
hostname=10.34.24.156
port=3306
[server2]
candidate_master=1
client_bindir=/usr/local/mysql/bin/
client_libdir=/usr/local/mysql/lib/
hostname=10.34.24.157
port=3306
[server3]
client_bindir=/usr/local/mysql/bin/
client_libdir=/usr/local/mysql/lib/
hostname=10.36.224.110
port=3306master_ip_failover_script对应的脚本
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.142.235';
my $ssh_start_vip = "systemctl start keepalived";
my $ssh_stop_vip = "systemctl stop keepalived";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
#`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh -p 8822 $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh -p 8822 $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}master_ip_online_change_script对应的脚本
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;
my $_tstart;
my $_running_interval = 1;
my (
$command, $orig_master_is_new_slave, $orig_master_host,
$orig_master_ip, $orig_master_port, $orig_master_user,
$orig_master_password, $orig_master_ssh_user, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password, $new_master_ssh_user,
);
my $vip = '192.168.142.235';
my $ssh_start_vip = "systemctl start keepalived";
my $ssh_stop_vip = "systemctl stop keepalived";
GetOptions(
'command=s' => \$command,
'orig_master_is_new_slave' => \$orig_master_is_new_slave,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_user=s' => \$orig_master_user,
'orig_master_password=s' => \$orig_master_password,
'orig_master_ssh_user=s' => \$orig_master_ssh_user,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
'new_master_ssh_user=s' => \$new_master_ssh_user,
);
exit &main();
sub current_time_us {
my ( $sec, $microsec ) = gettimeofday();
my $curdate = localtime($sec);
return $curdate . " " . sprintf( "%06d", $microsec );
}
sub sleep_until {
my $elapsed = tv_interval($_tstart);
if ( $_running_interval > $elapsed ) {
sleep( $_running_interval - $elapsed );
}
}
sub get_threads_util {
my $dbh = shift;
my $my_connection_id = shift;
my $running_time_threshold = shift;
my $type = shift;
$running_time_threshold = 0 unless ($running_time_threshold);
$type = 0 unless ($type);
my @threads;
my $sth = $dbh->prepare("SHOW PROCESSLIST");
$sth->execute();
while ( my $ref = $sth->fetchrow_hashref() ) {
my $id = $ref->{Id};
my $user = $ref->{User};
my $host = $ref->{Host};
my $command = $ref->{Command};
my $state = $ref->{State};
my $query_time = $ref->{Time};
my $info = $ref->{Info};
$info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
next if ( $my_connection_id == $id );
next if ( defined($query_time) && $query_time < $running_time_threshold );
next if ( defined($command) && $command eq "Binlog Dump" );
next if ( defined($user) && $user eq "system user" );
next
if ( defined($command)
&& $command eq "Sleep"
&& defined($query_time)
&& $query_time >= 1 );
if ( $type >= 1 ) {
next if ( defined($command) && $command eq "Sleep" );
next if ( defined($command) && $command eq "Connect" );
}
if ( $type >= 2 ) {
next if ( defined($info) && $info =~ m/^select/i );
next if ( defined($info) && $info =~ m/^show/i );
}
push @threads, $ref;
}
return @threads;
}
sub main {
if ( $command eq "stop" ) {
## Gracefully killing connections on the current master
# 1. Set read_only= 1 on the new master
# 2. DROP USER so that no app user can establish new connections
# 3. Set read_only= 1 on the current master
# 4. Kill current queries
# * Any database access failure will result in script die.
my $exit_code = 1;
eval {
## Setting read_only=1 on the new master (to avoid accident)
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error(die_on_error)_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
print current_time_us() . " Set read_only on the new master.. ";
$new_master_handler->enable_read_only();
if ( $new_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
$new_master_handler->disconnect();
# Connecting to the orig master, die if any database error happens
my $orig_master_handler = new MHA::DBHelper();
$orig_master_handler->connect( $orig_master_ip, $orig_master_port,
$orig_master_user, $orig_master_password, 1 );
## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
$orig_master_handler->disable_log_bin_local();
print current_time_us() . " Drpping app user on the orig master..\n";
#FIXME_xxx_drop_app_user($orig_master_handler);
## Waiting for N * 100 milliseconds so that current connections can exit
my $time_until_read_only = 15;
$_tstart = [gettimeofday];
my @threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_read_only > 0 && $#threads >= 0 ) {
if ( $time_until_read_only % 5 == 0 ) {
printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_read_only * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_read_only--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}
## Setting read_only=1 on the current master so that nobody(except SUPER) can write
print current_time_us() . " Set read_only=1 on the orig master.. ";
$orig_master_handler->enable_read_only();
if ( $orig_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
## Waiting for M * 100 milliseconds so that current update queries can complete
my $time_until_kill_threads = 5;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
if ( $time_until_kill_threads % 5 == 0 ) {
printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_kill_threads--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}
print "disable the VIP on old master: $orig_master_host \n";
&stop_vip();
## Terminating all threads
print current_time_us() . " Killing all application threads..\n";
$orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
print current_time_us() . " done.\n";
$orig_master_handler->enable_log_bin_local();
$orig_master_handler->disconnect();
## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
## Activating master ip on the new master
# 1. Create app user with write privileges
# 2. Moving backup script if needed
# 3. Register new master's ip to the catalog database
# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
my $exit_code = 10;
eval {
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
## Set read_only=0 on the new master
$new_master_handler->disable_log_bin_local();
print current_time_us() . " Set read_only=0 on the new master.\n";
$new_master_handler->disable_read_only();
## Creating an app user on the new master
print current_time_us() . " Creating app user on the new master..\n";
#FIXME_xxx_create_app_user($new_master_handler);
$new_master_handler->enable_log_bin_local();
$new_master_handler->disconnect();
## Update master ip on the catalog database, etc
print "enable the VIP: $vip on the new master: $new_master_host \n ";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
# do nothing
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh -p 8822 $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
`ssh -p 8822 $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
die;
}report_script对应的脚本
#!/usr/bin/perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.126.com';
my $mail_from='xxxx@126.com';
my $mail_user='xxxx@126.com';
my $mail_pass='password 不是邮箱密码 是客户端授权码';
my $mail_to=['xxxx@126.com','xxxx@corp.netease.com'];
GetOptions(
'orig_master_host=s' => \$dead_master_host,
'new_master_host=s' => \$new_master_host,
'new_slave_hosts=s' => \$new_slave_hosts,
'subject=s' => \$subject,
'body=s' => \$body,
);
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
sub mailToContacts {
my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
open my $DEBUG, "> /tmp/monitormail.log"
or die "Can't open the debug file:$!\n";
my $sender = new Mail::Sender {
ctype => 'text/plain; charset=utf-8',
encoding => 'utf-8',
smtp => $smtp,
from => $mail_from,
auth => 'LOGIN',
TLS_allowed => '0',
authid => $user,
authpwd => $passwd,
to => $mail_to,
subject => $subject,
debug => $DEBUG
};
$sender->MailMsg(
{ msg => $msg,
debug => $DEBUG
}
) or print $Mail::Sender::Error;
return 1;
}
# Do whatever you want here
exit 0;管理节点测试ssh连通性
masterha_check_ssh -conf=/etc/mha_manager/mha.cnf
Fri May 25 14:24:34 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri May 25 14:24:34 2018 - [info] Reading application default configuration from /data/mha/app1.cnf..
Fri May 25 14:24:34 2018 - [info] Reading server configuration from /data/mha/app1.cnf..
Fri May 25 14:24:34 2018 - [info] Starting SSH connection tests..
Fri May 25 14:24:37 2018 - [debug]
Fri May 25 14:24:34 2018 - [debug] Connecting via SSH from
....
Fri May 25 14:24:39 2018 - [debug] ok.
Fri May 25 14:24:39 2018 - [info] All SSH connection tests passed successfully.管理节点测试集群中的主从复制
masterha_check_repl --conf=/etc/mha_manager/mha.cnf
Fri May 29 10:38:48 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri May 29 10:38:48 2020 - [info] Reading application default configuration from /etc/mha_manager/mha.cnf..
Fri May 29 10:38:48 2020 - [info] Reading server configuration from /etc/mha_manager/mha.cnf..
Fri May 29 10:38:48 2020 - [info] MHA::MasterMonitor version 0.56.
Fri May 29 10:38:50 2020 - [warning] SQL Thread is stopped(no error) on 10.34.24.156(10.34.24.156:3306)
Fri May 29 10:38:50 2020 - [info] Multi-master configuration is detected. Current primary(writable) master is 10.34.24.156(10.34.24.156:3306)
Fri May 29 10:38:50 2020 - [info] Master configurations are as below:
Master 10.34.24.156(10.34.24.156:3306), replicating from 10.34.24.157(10.34.24.157:3306)
Master 10.34.24.157(10.34.24.157:3306), replicating from 10.34.24.156(10.34.24.156:3306), read-only
.......
.......
Checking the Status of the script.. OK
Fri May 29 10:38:54 2020 - [info] OK.
Fri May 29 10:38:54 2020 - [warning] shutdown_script is not defined.
Fri May 29 10:38:54 2020 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.中途可能执行脚本时会报没有权限
Can't exec "/data/mha/scripts/master_ip_failover": Permission denied
执行:chmod +x /data/mha/scripts/master_ip_failover
启动MHA管理节点
启动命令
nohup masterha_manager --conf=/etc/mha_manager/mha.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/mha/app1/manager.log 2>&1 &
#参数含义具体可参照官网说明 https://github.com/yoshinorim/mha4mysql-manager/wiki/masterha_manager
#此处简单说明
#--remove_dead_master_conf,发生failover后,MHA会自动从配置文件里移除dead master的相关信息
#--ignore_last_failover,默认情况下,之前如果存在failover,那么再次启动MHA是不成功的,必须删除对应目录下的failover error文件, (manager_workdir)/(app_name).failover.error;设置此参数,就可以忽略上次是否错误而可以继续进行failover关闭命令
masterha_stop --conf=/etc/mha_manager/mha.cnf
查看状态
masterha_check_status --conf=/etc/mha_manager/mha.cnf
# mha (pid:45782) is running(0:PING_OK), master:10.34.24.156工具命令
masterha_check_ssh : 检查MHA的SSH配置。
masterha_check_repl : 检查MySQL复制。
masterha_manager : 启动MHA。
masterha_check_status : 检测当前MHA运行状态。
masterha_master_monitor : 监测master是否宕机。
masterha_master_switch : 控制故障转移(自动或手动)。
masterha_conf_host : 添加或删除配置的server信息。
五、测试验证
关闭主节点的mysql
service mysqld stop
# Shutting down MySQL... [ OK ]查看管理节点的日志,可以看到master自动切换到24.157上面了
Fri May 29 10:50:07 2020 - [info] All relay logs were successfully applied.
Fri May 29 10:50:07 2020 - [info] Resetting slave 10.36.224.110(10.36.224.110:3306) and starting replication from the new master 10.34.24.157(10.34.24.157:3306)..
Fri May 29 10:50:07 2020 - [info] Executed CHANGE MASTER.
Fri May 29 10:50:07 2020 - [info] Slave started.
Fri May 29 10:50:08 2020 - [info] End of log messages from 10.36.224.110.
Fri May 29 10:50:08 2020 - [info] -- Slave recovery on host 10.36.224.110(10.36.224.110:3306) succeeded.
Fri May 29 10:50:08 2020 - [info] All new slave servers recovered successfully.
Fri May 29 10:50:08 2020 - [info]
Fri May 29 10:50:08 2020 - [info] * Phase 5: New master cleanup phase..
Fri May 29 10:50:08 2020 - [info]
Fri May 29 10:50:08 2020 - [info] Resetting slave info on the new master..
Fri May 29 10:50:08 2020 - [info] 10.34.24.157: Resetting slave info succeeded.
Fri May 29 10:50:08 2020 - [info] Master failover to 10.34.24.157(10.34.24.157:3306) completed successfully.
Fri May 29 10:50:08 2020 - [info] Deleted server1 entry from /etc/mha_manager/mha.cnf .
Fri May 29 10:50:08 2020 - [info]
----- Failover Report -----
mha: MySQL Master failover 10.34.24.156(10.34.24.156:3306) to 10.34.24.157(10.34.24.157:3306) succeeded
Master 10.34.24.156(10.34.24.156:3306) is down!
Check MHA Manager logs at localhost:/data/mha/mha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 10.34.24.156(10.34.24.156:3306)
The latest slave 10.34.24.157(10.34.24.157:3306) has all relay logs for recovery.
Selected 10.34.24.157(10.34.24.157:3306) as a new master.
10.34.24.157(10.34.24.157:3306): OK: Applying all logs succeeded.
10.34.24.157(10.34.24.157:3306): OK: Activated master IP address.
10.36.224.110(10.36.224.110:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.36.224.110(10.36.224.110:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.34.24.157(10.34.24.157:3306)
10.34.24.157(10.34.24.157:3306): Resetting slave info succeeded.
Master failover to 10.34.24.157(10.34.24.157:3306) completed successfully.到24.157上面查询当前数据库状态
mysql> show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.34.24.157
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-log.000003
Read_Master_Log_Pos: 120
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 284
Relay_Master_Log_File: master-log.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes查看切换后的mysql主节点只读状态,发现被自动关闭了,说明之前的slave现在可写了
mysql> show variables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | OFF |
+---------------+-------+
1 row in set (0.01 sec)
六、优势
故障切换迅速
在M/S复制集群中,只要从库在复制上没有延迟,MHA通常可以在数秒内实现故障切换。9-10秒内检查到master故障,可以选择在7-10秒关闭master以避免出现裂脑,几秒钟内,将差异中继日志(relay log)应用到新的master上,因此总的宕机时间通常为10-30秒。恢复新的master后,MHA并行的恢复其余的slave。即使在有数万台slave,也不会影响master的恢复时间。
master故障不会导致数据不一致
当目前的master出现故障是,MHA自动识别slave之间中继日志(relay log)的不同,并应用到所有的slave中。这样所有的salve能够保持同步,只要所有的slave处于存活状态。和Semi-SynchronousReplication(半同步插件)一起使用,(几乎)可以保证没有数据丢失。
无需修改当前的MySQL设置
MHA的设计的重要原则之一就是尽可能地简单易用。MHA工作在传统的MySQL版本5.0和之后版本的主从复制环境中。和其它高可用解决方法比,MHA并不需要改变MySQL的部署环境。MHA适用于异步和半同步的主从复制。
启动/停止/升级/降级/安装/卸载MHA不需要改变(包扩启动/停止)MySQL复制。当需要升级MHA到新的版本,不需要停止MySQL,仅仅替换到新版本的MHA,然后重启MHA Manager就好了。
MHA运行在MySQL5.0开始的原生版本上。一些其它的MySQL高可用解决方案需要特定的版本(比如MySQL集群、带全局事务ID的MySQL等等),但并不仅仅为了master的高可用才迁移应用的。在大多数情况下,已经部署了比较旧MySQL应用,并且不想仅仅为了实现Master的高可用,花太多的时间迁移到不同的存储引擎或更新的前沿发行版。MHA工作的包括5.0/5.1/5.5的原生版本的MySQL上,所以并不需要迁移。
无需增加大量的服务器
MHA由MHA Manager和MHANode组成。MHA Node运行在需要故障切换/恢复的MySQL服务器上,因此并不需要额外增加服务器。MHAManager运行在特定的服务器上,因此需要增加一台(实现高可用需要2台),但是MHAManager可以监控大量(甚至上百台)单独的master,因此,并不需要增加大量的服务器。即使在一台slave上运行MHA Manager也是可以的。综上,实现MHA并没用额外增加大量的服务。
无性能下降
MHA适用与异步或半同步的MySQL复制。监控master时,MHA仅仅是每隔几秒(默认是3秒)发送一个ping包,并不发送重查询。可以得到像原生MySQL复制一样快的性能。
适用于任何存储引擎
MHA可以运行在只要MySQL复制运行的存储引擎上,并不仅限制于InnoDB,即使在不易迁移的传统的MyISAM引擎环境,一样可以使用MHA。
MHA 0.56版本已经支持MYSQL GTID复制


公众号:架构工匠(ID: gh_e3ddf5fb9980)
CSDN博客:架构工匠
长按二维码关注
愿一个热爱技术的灵魂,给你带来更多共鸣与激情碰撞。

感谢您的阅读!常来哦




