MySQL high availability with orchestrator + VIP
- OS: CentOS 7.6
- Servers
192.168.20.101 node1
192.168.20.102 node2
192.168.20.103 node3
- Database: MySQL 5.7.27, already installed on all nodes
GTID enabled
log-slave-updates = ON
report_host is best configured as well
- orchestrator version: v3.1.4
MySQL master/replica configuration is omitted here. When setting up replication, add the parameters below; they shorten failure detection and switchover time. Also note that you need a script to adjust rpl_semi_sync_slave_enabled / rpl_semi_sync_master_enabled / read_only / super_read_only after a switchover (a sketch follows the commands below); if rpl_semi_sync_master_timeout is set to a small value this is less of a concern.
change master to
master_host='192.168.20.101',
master_port=3306,
master_user='rpl',
master_password='123456',
master_auto_position=1,
MASTER_HEARTBEAT_PERIOD=2,
MASTER_CONNECT_RETRY=1,
MASTER_RETRY_COUNT=86400;
set global slave_net_timeout=8;
set global read_only=1;
set global super_read_only=1;
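A minimal sketch of such a post-switch script (a hypothetical helper, not part of orchestrator; it assumes the semi-sync plugins are installed and reuses the orchestrator account created in section 2.3):
#!/bin/bash
# post_switch.sh <new_master_host> -- hypothetical helper, run after a switchover
NEW_MASTER="$1"
MYSQL="mysql -h${NEW_MASTER} -uorchestrator -p123456"
# make the promoted node writable; super_read_only must be cleared before read_only
$MYSQL -e "SET GLOBAL super_read_only = 0; SET GLOBAL read_only = 0;"
# flip its semi-sync role from replica to master
$MYSQL -e "SET GLOBAL rpl_semi_sync_slave_enabled = 0; SET GLOBAL rpl_semi_sync_master_enabled = 1;"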
1-Basic setup
1.1-Configure hosts
Add the host entries on every machine:
echo '192.168.20.101 node1' >> /etc/hosts
echo '192.168.20.102 node2' >> /etc/hosts
echo '192.168.20.103 node3' >> /etc/hosts
1.2-Set up passwordless SSH
Run on all machines:
# press Enter through every prompt
ssh-keygen
ssh-copy-id node1
ssh-copy-id node2
ssh-copy-id node3
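A quick check that the keys landed; each hop should print the peer's hostname without asking for a password:
for h in node1 node2 node3; do ssh $h hostname; done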
2-Deploy orchestrator
2.1-Download
Download from: [openark/orchestrator](https://github.com/openark/orchestrator/releases)
wget https://github.com/openark/orchestrator/releases/download/v3.1.4/orchestrator-3.1.4-1.x86_64.rpm
wget https://github.com/openark/orchestrator/releases/download/v3.1.4/orchestrator-client-3.1.4-1.x86_64.rpm
2.2-Install
Install both the orchestrator server and client on all three machines.
# install the dependency
yum -y install jq
# install the packages
[root@node1 ~]# rpm -ivh orchestrator-3.1.4-1.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:orchestrator-1:3.1.4-1 ################################# [100%]
[root@node1 ~]# rpm -ivh orchestrator-client-3.1.4-1.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:orchestrator-client-1:3.1.4-1 ################################# [100%]
2.3-Create the orchestrator management user
# run on the MySQL master
mysql> CREATE USER 'orchestrator'@'192.168.20.%' IDENTIFIED BY '123456';
mysql> GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'192.168.20.%';
mysql> GRANT SELECT ON mysql.slave_master_info TO 'orchestrator'@'192.168.20.%';
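To confirm the account works, connect with it from another node (SHOW SLAVE HOSTS exercises the REPLICATION SLAVE privilege granted above):
mysql -h192.168.20.101 -uorchestrator -p123456 -e "SHOW SLAVE HOSTS;"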
2.4-Prepare the configuration file
cp /usr/local/orchestrator/orchestrator-sample-sqlite.conf.json /etc/orchestrator.conf.json
Edit /etc/orchestrator.conf.json and adjust the following parameters:
- MySQLTopologyUser/MySQLTopologyPassword: credentials orchestrator uses to probe the MySQL topology
- SQLite3DataFile: path of the SQLite backend database; must be writable
- DefaultInstancePort: MySQL instance port
- FailureDetectionPeriodBlockMinutes: a failure recurring within this window is not reported again
- RecoveryPeriodBlockSeconds: a failure recurring within this window does not trigger another recovery, which avoids concurrent and flapping recoveries
- RecoverMasterClusterFilters: perform master recovery only on clusters matching these regex patterns ("*" matches all)
- RecoverIntermediateMasterClusterFilters: perform intermediate-master recovery only on clusters matching these regex patterns ("*" matches all)
- PostFailoverProcesses: change to the actual path of the hook script
- Add the Raft section; Raft provides high availability for orchestrator itself
- RaftBind: change to the local machine's IP
{
"Debug": true,
"EnableSyslog": false,
"ListenAddress": ":3000",
"MySQLTopologyUser": "orchestrator",
"MySQLTopologyPassword": "123456",
"MySQLTopologyCredentialsConfigFile": "",
"MySQLTopologySSLPrivateKeyFile": "",
"MySQLTopologySSLCertFile": "",
"MySQLTopologySSLCAFile": "",
"MySQLTopologySSLSkipVerify": true,
"MySQLTopologyUseMutualTLS": false,
"BackendDB": "sqlite",
"SQLite3DataFile": "/usr/local/orchestrator/orchestrator.db",
"MySQLConnectTimeoutSeconds": 1,
"DefaultInstancePort": 3306,
"DiscoverByShowSlaveHosts": true,
"InstancePollSeconds": 5,
"DiscoveryIgnoreReplicaHostnameFilters": [
"a_host_i_want_to_ignore[.]example[.]com",
".*[.]ignore_all_hosts_from_this_domain[.]example[.]com",
"a_host_with_extra_port_i_want_to_ignore[.]example[.]com:3307"
],
"UnseenInstanceForgetHours": 240,
"SnapshotTopologiesIntervalHours": 0,
"InstanceBulkOperationsWaitTimeoutSeconds": 10,
"HostnameResolveMethod": "default",
"MySQLHostnameResolveMethod": "@@hostname",
"SkipBinlogServerUnresolveCheck": true,
"ExpiryHostnameResolvesMinutes": 60,
"RejectHostnameResolvePattern": "",
"ReasonableReplicationLagSeconds": 10,
"ProblemIgnoreHostnameFilters": [],
"VerifyReplicationFilters": false,
"ReasonableMaintenanceReplicationLagSeconds": 20,
"CandidateInstanceExpireMinutes": 60,
"AuditLogFile": "",
"AuditToSyslog": false,
"RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
"ReadOnly": false,
"AuthenticationMethod": "",
"HTTPAuthUser": "",
"HTTPAuthPassword": "",
"AuthUserHeader": "",
"PowerAuthUsers": [
"*"
],
"ClusterNameToAlias": {
"127.0.0.1": "test suite"
},
"ReplicationLagQuery": "",
"DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",
"DetectClusterDomainQuery": "",
"DetectInstanceAliasQuery": "",
"DetectPromotionRuleQuery": "",
"DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
"PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
"PromotionIgnoreHostnameFilters": [],
"DetectSemiSyncEnforcedQuery": "",
"ServeAgentsHttp": false,
"AgentsServerPort": ":3001",
"AgentsUseSSL": false,
"AgentsUseMutualTLS": false,
"AgentSSLSkipVerify": false,
"AgentSSLPrivateKeyFile": "",
"AgentSSLCertFile": "",
"AgentSSLCAFile": "",
"AgentSSLValidOUs": [],
"UseSSL": false,
"UseMutualTLS": false,
"SSLSkipVerify": false,
"SSLPrivateKeyFile": "",
"SSLCertFile": "",
"SSLCAFile": "",
"SSLValidOUs": [],
"URLPrefix": "",
"StatusEndpoint": "/api/status",
"StatusSimpleHealth": true,
"StatusOUVerify": false,
"AgentPollMinutes": 60,
"UnseenAgentForgetHours": 6,
"StaleSeedFailMinutes": 60,
"SeedAcceptableBytesDiff": 8192,
"PseudoGTIDPattern": "",
"PseudoGTIDPatternIsFixedSubstring": false,
"PseudoGTIDMonotonicHint": "asc:",
"DetectPseudoGTIDQuery": "",
"BinlogEventsChunkSize": 10000,
"SkipBinlogEventsContaining": [],
"ReduceReplicationAnalysisCount": true,
"FailureDetectionPeriodBlockMinutes": 5,
"RecoveryPeriodBlockSeconds": 30,
"RecoveryIgnoreHostnameFilters": [],
"RecoverMasterClusterFilters": [
"*"
],
"RecoverIntermediateMasterClusterFilters": [
"*"
],
"OnFailureDetectionProcesses": [
"echo '`date +'%Y-%m-%d %T'` Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
],
"PreGracefulTakeoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
],
"PreFailoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
],
"PostFailoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` (for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}; failureClusterAlias:{failureClusterAlias}' >> /tmp/recovery.log",
"/usr/local/orchestrator/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
],
"PostUnsuccessfulFailoverProcesses": [ "echo '`date +'%Y-%m-%d %T'` Unsuccessful Failover ' >> /tmp/recovery.log"],
"PostMasterFailoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostIntermediateMasterFailoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostGracefulTakeoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` Planned takeover complete' >> /tmp/recovery.log"
],
"CoMasterRecoveryMustPromoteOtherCoMaster": true,
"DetachLostSlavesAfterMasterFailover": true,
"ApplyMySQLPromotionAfterMasterFailover": true,
"PreventCrossDataCenterMasterFailover": false,
"PreventCrossRegionMasterFailover": false,
"MasterFailoverDetachReplicaMasterHost": false,
"MasterFailoverLostInstancesDowntimeMinutes": 0,
"PostponeReplicaRecoveryOnLagMinutes": 0,
"OSCIgnoreHostnameFilters": [],
"GraphiteAddr": "",
"GraphitePath": "",
"GraphiteConvertHostnameDotsToUnderscores": true,
"ConsulAddress": "",
"ConsulAclToken": "",
"RaftEnabled":true,
"RaftDataDir":"/usr/local/orchestrator",
"RaftBind":"192.168.20.101",
"DefaultRaftPort":10008,
"RaftNodes":[
"192.168.20.101",
"192.168.20.102",
"192.168.20.103"
]
}
2.5-Copy the configuration to the other machines
scp /etc/orchestrator.conf.json root@node2:/etc/orchestrator.conf.json
scp /etc/orchestrator.conf.json root@node3:/etc/orchestrator.conf.json
On each machine, change the `"RaftBind":"192.168.20.101",` entry near the end to the local address, e.g. with sed as shown below.
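For example, on node2 (an in-place edit; use 192.168.20.103 on node3):
sed -i 's/"RaftBind":"192.168.20.101"/"RaftBind":"192.168.20.102"/' /etc/orchestrator.conf.json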
2.6-Create the VIP switchover scripts
Create them on every server; the easiest way is to write them once, copy them to the other nodes, and adjust:
/usr/local/orchestrator/orch_hook.sh
/usr/local/orchestrator/orch_vip.sh
2.6.1-orch_hook.sh
You may need to modify:
- the orch_vip.sh path, currently /usr/local/orchestrator/orch_vip.sh
- array=( ens32 "192.168.20.111" root "192.168.20.101")
  the fields are: interface name, VIP address, SSH user, local IP
- MYSQL_PWD can be ignored
[root@node1 orchestrator]# cat orch_hook.sh
#!/bin/bash
isitdead=$1
cluster=$2
oldmaster=$3
newmaster=$4
mysqluser="orchestrator"
export MYSQL_PWD="xxxpassxxx"
logfile="/var/log/orch_hook.log"
# list of clusternames
#clusternames=(rep blea lajos)
# clustername=( interface IP user Inter_IP)
#rep=( ens32 "192.168.56.121" root "192.168.56.125")
if [[ $isitdead == "DeadMaster" ]]; then
array=( ens32 "192.168.20.111" root "192.168.20.101")
interface=${array[0]}
IP=${array[1]}
user=${array[2]}
if [ ! -z ${IP} ] ; then
echo $(date)
echo "Revocering from: $isitdead"
echo "New master is: $newmaster"
echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
/usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster
#mysql -h$newmaster -u$mysqluser < /usr/local/bin/orch_event.sql
else
echo "Cluster does not exist!" | tee $logfile
fi
elif [[ $isitdead == "DeadIntermediateMasterWithSingleSlaveFailingToConnect" ]]; then
array=( ens32 "192.168.20.111" root "192.168.20.101")
interface=${array[0]}
IP=${array[3]}
user=${array[2]}
slavehost=`echo $5 | cut -d":" -f1`
echo $(date)
echo "Revocering from: $isitdead"
echo "New intermediate master is: $slavehost"
echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
/usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster
elif [[ $isitdead == "DeadIntermediateMaster" ]]; then
array=( ens32 "192.168.20.111" root "192.168.20.101")
interface=${array[0]}
IP=${array[3]}
user=${array[2]}
slavehost=`echo $5 | sed -E "s/:[0-9]+//g" | sed -E "s/,/ /g"`
showslave=`mysql -h$newmaster -u$mysqluser -sN -e "SHOW SLAVE HOSTS;" | awk '{print $2}'`
newintermediatemaster=`echo $slavehost $showslave | tr ' ' '\n' | sort | uniq -d`
echo $(date)
echo "Revocering from: $isitdead"
echo "New intermediate master is: $newintermediatemaster"
echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
/usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster
fi
2.6.2-orch_vip.sh
To enable email notifications, set `emailaddress` to a real address and change `sendmail` to 1.
[root@node1 orchestrator]# cat orch_vip.sh
#!/bin/bash
emailaddress="email@example.com"
sendmail=0
function usage {
cat << EOF
usage: $0 [-h] [-d master is dead] [-o old master] [-s ssh options] [-n new master] [-i interface] [-I virtual IP] [-u SSH user]
OPTIONS:
-h Show this message
-o string Old master hostname or IP address
-d int Should be 1 if the master is dead, otherwise 0
-s string SSH options
-n string New master hostname or IP address
-i string Interface, example: eth0:1
-I string Virtual IP
-u string SSH user
EOF
}
while getopts ho:d:s:n:i:I:u: flag; do
case $flag in
o)
orig_master="$OPTARG";
;;
d)
isitdead="${OPTARG}";
;;
s)
ssh_options="${OPTARG}";
;;
n)
new_master="$OPTARG";
;;
i)
interface="$OPTARG";
;;
I)
vip="$OPTARG";
;;
u)
ssh_user="$OPTARG";
;;
h)
usage;
exit 0;
;;
*)
usage;
exit 1;
;;
esac
done
if [ $OPTIND -eq 1 ]; then
echo "No options were passed";
usage;
fi
shift $(( OPTIND - 1 ));
# discover commands from our path
ssh=$(which ssh)
arping=$(which arping)
ip2util=$(which ip)
# command for adding our vip
cmd_vip_add="sudo -n $ip2util address add ${vip} dev ${interface}"
# command for deleting our vip
cmd_vip_del="sudo -n $ip2util address del ${vip}/32 dev ${interface}"
# command for discovering if our vip is enabled
cmd_vip_chk="sudo -n $ip2util address show dev ${interface} to ${vip%/*}/32"
# command for sending gratuitous arp to announce ip move
cmd_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*} "
# command for sending gratuitous arp to announce ip move on current server
cmd_local_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*} "
vip_stop() {
rc=0
# ensure the vip is removed
$ssh ${ssh_options} -tt ${ssh_user}@${orig_master} \
"[ -n \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_del} && sudo ${ip2util} route flush cache || [ -z \"\$(${cmd_vip_chk})\" ]"
rc=$?
return $rc
}
vip_start() {
rc=0
# ensure the vip is added
# this command should exit with failure if we are unable to add the vip
# if the vip already exists always exit 0 (whether or not we added it)
$ssh ${ssh_options} -tt ${ssh_user}@${new_master} \
"[ -z \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_add} && ${cmd_arp_fix} || [ -n \"\$(${cmd_vip_chk})\" ]"
rc=$?
$cmd_local_arp_fix
return $rc
}
vip_status() {
$arping -c 1 -I ${interface} ${vip%/*}
if ping -c 1 -W 1 "$vip"; then
return 0
else
return 1
fi
}
if [[ $isitdead == 0 ]]; then
echo "Online failover"
if vip_stop; then
if vip_start; then
echo "$vip is moved to $new_master."
if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi
else
echo "Can't add $vip on $new_master!"
if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi
else
echo $rc
echo "Can't remove the $vip from orig_master!"
if [ $sendmail -eq 1 ]; then mail -s "Can't remove the $vip from orig_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi
elif [[ $isitdead == 1 ]]; then
echo "Master is dead, failover"
# make sure the vip is not available
if vip_status; then
if vip_stop; then
if [ $sendmail -eq 1 ]; then mail -s "$vip is removed from orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
else
if [ $sendmail -eq 1 ]; then mail -s "Couldn't remove $vip from orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi
fi
if vip_start; then
echo "$vip is moved to $new_master."
if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi
else
echo "Can't add $vip on $new_master!"
if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi
else
echo "Wrong argument, the master is dead or live?"
fi
2.7-Set up the VIP
Create the VIP on the master node only:
# add the VIP
ip addr add 192.168.20.111 dev ens32
# to delete it: ip addr del 192.168.20.111 dev ens32
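To verify the VIP is bound:
ip addr show dev ens32 | grep 192.168.20.111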
2.8-Start orchestrator
Start it on all machines:
cd /usr/local/orchestrator && nohup ./orchestrator --config=/etc/orchestrator.conf.json http &
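A quick health check; /api/status is the endpoint configured above, and /api/leader-check returns HTTP 200 only on the current Raft leader:
curl -s http://127.0.0.1:3000/api/status
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:3000/api/leader-check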
2.9-Set the environment variable
Because Raft is enabled there are multiple orchestrator instances, so the ORCHESTRATOR_API environment variable is needed; orchestrator-client then automatically picks the leader.
On every machine, edit /etc/profile:
# append at the end
export ORCHESTRATOR_API="node1:3000/api node2:3000/api node3:3000/api"
# then source it to take effect
source /etc/profile
2.10-Discover the topology
# run on any server to find where the leader is
[root@node2 ~]# orchestrator-client -c which-api
node1:3000/api
# run on any server
[root@node2 ~]# orchestrator-client -c discover -i 192.168.20.101:3306
node1:3306
3-Testing
3.1-Kill or stop mysqld
Result: failover happens automatically and the VIP floats to the new master.
3.2-Power off the master server directly
Result: failover happens automatically and the VIP floats to the new master.
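Either way, you can watch the recovery from the hook logs and confirm the VIP landed on the promoted node:
# on any surviving node
tail -f /tmp/recovery.log /tmp/orch.log
orchestrator-client -c topology -i node2:3306
# on the promoted master
ip addr show dev ens32 | grep 192.168.20.111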
3.3-Recovery
# on the recovered node, rejoin it as a replica of the current master (adjust master_host as needed)
change master to
master_host='192.168.20.101',
master_port=3306,
master_user='rpl',
master_password='123456',
master_auto_position=1,
MASTER_HEARTBEAT_PERIOD=2,
MASTER_CONNECT_RETRY=1,
MASTER_RETRY_COUNT=86400;
set global slave_net_timeout=8;
# set the replica to read-only
set global read_only=1;
set global super_read_only=1;
start slave;
show slave status\G
4-Common operations
4.1-View the topology
# on any node
[root@node1 orchestrator]# orchestrator-client -c topology -i node2
node1:3306 [0s,ok,5.7.27-log,rw,ROW,>>,GTID]
+ node2:3306 [0s,ok,5.7.27-log,ro,ROW,>>,GTID]
+ node3:3306 [0s,ok,5.7.27-log,ro,ROW,>>,GTID]
4.2-Set read-only / read-write mode
# set node2 to read-only
orchestrator-client -c set-read-only -i node2
# set node2 back to read-write
orchestrator-client -c set-writeable -i node2
4.3-Graceful takeover to a designated replica
# promote node2 to master; on the old master node1, START SLAVE must be run manually (see the note after the output below)
orchestrator-client -c graceful-master-takeover -a node1:3306 -d node2:3306
Topology before:
node1:3306 [0s,ok,5.7.27-log,rw,ROW,>>,GTID]
+ node2:3306 [0s,ok,5.7.27-log,ro,ROW,>>,GTID]
+ node3:3306 [0s,ok,5.7.27-log,ro,ROW,>>,GTID]
Topology after:
node2:3306 [0s,ok,5.7.27-log,rw,ROW,>>,GTID]
- node1:3306 [null,nonreplicating,5.7.27-log,ro,ROW,>>,GTID,downtimed] # replication must be restarted manually
+ node3:3306 [0s,ok,5.7.27-log,ro,ROW,>>,GTID]
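To bring node1 back afterwards (a graceful takeover leaves the demoted master with replication stopped and a downtime marker):
# on node1
mysql -e "START SLAVE;"
# clear the downtime flag so orchestrator resumes normal analysis
orchestrator-client -c end-downtime -i node1:3306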