一、前言
MySQL主从复制由于是逻辑复制,经常会出现断开的情况,这里总结了几种处理方案,以备后续参考。在主从数据不一致的情况下,如果能很快的进行数据补录,尽量进行数据补录,实在无法补录或者数据一致性不是那么重要,可以采用跳过的方法。
二、传统复制
1、跳过报错
制造主从复制不一致的场景,注意show slave status中的几个状态的含义。last_error的SQL线程在master2-bin.000003,position 1741的位置停了。mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.5
Master_User: repl
Master_Port: 33652
Connect_Retry: 60
Master_Log_File: master2-bin.000003 ------------------》从库IO线程读取到的主库的binlog的名称
Read_Master_Log_Pos: 1772 -------------------》从库IO线程读取到的主库的binlog的位置
Relay_Log_File: slave2-relay-bin.000002 -------------------》从库已经执行到的relay log
Relay_Log_Pos: 1410 -------------------》从库已经执行到的relay log的位置
Relay_Master_Log_File: master2-bin.000003 -------------------》从库执行的relay对应到主库的binlog的名称
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Could not execute Delete_rows event on table test.test; Can't find record in 'test', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master2-bin.000003, end_log_pos 1741
Skip_Counter: 0
Exec_Master_Log_Pos: 1517 -------------------》从库执行的relay对应到主库的binlog的位置
Relay_Log_Space: 1873
Until_Condition: None
利用mysqlbinlog找到报错位置,寻找到下一个事务的点位是1772,
root@master2:/var/lib/mysql# mysqlbinlog --no-defaults --start-position=1741 master2-bin.000003
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#221228 14:26:58 server id 5733660 end_log_pos 123 CRC32 0xedba5865 Start: binlog v 4, server v 5.7.36-log created 221228 14:26:58 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
BINLOG '
MlKsYw8cfVcAdwAAAHsAAAABAAQANS43LjM2LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAyUqxjEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
AWVYuu0=
'/*!*/;
# at 1741
#221228 14:45:04 server id 5733660 end_log_pos 1772 CRC32 0x98eb74ab Xid = 36
COMMIT/*!*/;
# at 1772
#221228 15:25:26 server id 5733660 end_log_pos 1837 CRC32 0xda1b22e9 Anonymous_GTID last_committed=7 sequence_number=8 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
在备库重新指定点位,跳过报错的事务。mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql>
mysql> change master to
-> master_host='192.168.10.5',
-> master_port=33652,
-> master_user='repl',
-> master_password='repl',
-> MASTER_LOG_FILE='master2-bin.000003',
-> MASTER_LOG_POS=1772;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.5
Master_User: repl
Master_Port: 33652
Connect_Retry: 60
Master_Log_File: master2-bin.000003
Read_Master_Log_Pos: 2282
Relay_Log_File: slave2-relay-bin.000002
Relay_Log_Pos: 832
Relay_Master_Log_File: master2-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table: 2、设置SQL_SLAVE_SKIP_COUNTER参数。
这个文章里有详细的关于该参数的描述,https://www.cnblogs.com/Uest/p/7941329.html。这个参数在使用使用的时候,会跳过一些事务,如果粒度掌握不好,可能跳过的数据比较多。
使用方法如下,当出现主从不一致时,
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.5
Master_User: repl
Master_Port: 33652
Connect_Retry: 60
Master_Log_File: master2-bin.000003
Read_Master_Log_Pos: 2537
Relay_Log_File: slave2-relay-bin.000002
Relay_Log_Pos: 832
Relay_Master_Log_File: master2-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Could not execute Delete_rows event on table test.test; Can't find record in 'test', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master2-bin.000003, end_log_pos 2506
Skip_Counter: 0
Exec_Master_Log_Pos: 2282
Relay_Log_Space: 1295
停止主从复制,设置SQL_SLAVE_SKIP_COUNTER参数,启动复制即可。
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> set SQL_SLAVE_SKIP_COUNTER=1;
ERROR 1229 (HY000): Variable 'sql_slave_skip_counter' is a GLOBAL variable and should be set with SET GLOBAL
mysql> set global SQL_SLAVE_SKIP_COUNTER=1;
Query OK, 0 rows affected (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.5
Master_User: repl
Master_Port: 33652
Connect_Retry: 60
Master_Log_File: master2-bin.000003
Read_Master_Log_Pos: 2537
Relay_Log_File: slave2-relay-bin.000003
Relay_Log_Pos: 322
Relay_Master_Log_File: master2-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
3、设置参数slave_exec_mode
这个参数为IDEMPOTENT模式(幂等模式)。设置成IDEMPOTENT模式可以让从库避免1032(从库上不存在的键)和1062(重复键,需要存在主键或则唯一键)的错误。这个只对row statment的binglog有效。具体可参考https://www.cnblogs.com/zhoujinyi/p/8035413.html
使用方式如下:
制造主键冲突的主从不一致场景:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.5
Master_User: repl
Master_Port: 33652
Connect_Retry: 60
Master_Log_File: master2-bin.000003
Read_Master_Log_Pos: 2967
Relay_Log_File: slave2-relay-bin.000003
Relay_Log_Pos: 497
Relay_Master_Log_File: master2-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table test.test; Duplicate entry '1' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log master2-bin.000003, end_log_pos 2936
Skip_Counter: 0
Exec_Master_Log_Pos: 2712
Relay_Log_Space: 1893设置参数,并重启主从复制mysql> set global slave_exec_mode='IDEMPOTENT';
Query OK, 0 rows affected (0.00 sec)
mysql> stop slave ;
Query OK, 0 rows affected (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.5
Master_User: repl
Master_Port: 33652
Connect_Retry: 60
Master_Log_File: master2-bin.000003
Read_Master_Log_Pos: 2967
Relay_Log_File: slave2-relay-bin.000004
Relay_Log_Pos: 322
Relay_Master_Log_File: master2-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
4、设置参数slave-skip-errors
对于主从复制报错,每一个报错都有一个对应的报错码,比如主键冲突是1062,找不到数据是1032,如果复制报错特别多,不想挨个修复,那就可以设置参数slave-skip-errors,可以把出现的报错码写到配置文件里,重启数据库,当遇到这个报错码就会自动跳过。
[mysqld]
# 跳过指定 error no 类型的错误
# slave-skip-errors = 1062,1053,1146
# 跳过所有错误
# slave-skip-errors = all三、GTID模式
1、跳过报错
制造主从不一致的场景,由于备库在执行delete的时候找不到记录而报错,尝试跳过这个事务,目前从库获取到的gtid是f433ea9e-4e3a-11ed-934c-0242ac480532:25888,执行到的gtid是f433ea9e-4e3a-11ed-934c-0242ac480532:1-25887,那么我们需要把gtid25888跳过。mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.5
Master_User: repl
Master_Port: 33650
Connect_Retry: 60
Master_Log_File: master1-bin.000012
Read_Master_Log_Pos: 2070
Relay_Log_File: slave1-relay-bin.000004
Relay_Log_Pos: 420
Relay_Master_Log_File: master1-bin.000012
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Could not execute Delete_rows event on table test.test; Can't find record in 'test', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master1-bin.000012, end_log_pos 2039
。。。。。 Master_SSL_Crlpath:
Retrieved_Gtid_Set: f433ea9e-4e3a-11ed-934c-0242ac480532:25888
Executed_Gtid_Set: f433ea9e-4e3a-11ed-934c-0242ac480532:1-25887,
f76c04aa-4e3a-11ed-8322-0242ac480533:1-2
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
操作如下:
mysql> stop slave; ========================》停止复制
Query OK, 0 rows affected (0.01 sec)
mysql> SET @@SESSION.GTID_NEXT='f433ea9e-4e3a-11ed-934c-0242ac480532:25888'; ========================》查出失败事务的gtid
Query OK, 0 rows affected (0.00 sec)
mysql> begin;commit; ==============================================================》执行空事务
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> SET @@SESSION.GTID_NEXT=automatic; ===========================================》gtid自动执行
Query OK, 0 rows affected (0.00 sec)
mysql> start slave; ===================================》启动复制
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.5
Master_User: repl
Master_Port: 33650
Connect_Retry: 60
Master_Log_File: master1-bin.000012
Read_Master_Log_Pos: 2070
Relay_Log_File: slave1-relay-bin.000005
Relay_Log_Pos: 460
Relay_Master_Log_File: master1-bin.000012
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
四、总结
1、关于主从复制断开后跳过报错,总体来说如果数据不重要,跳过没啥问题,如果数据比较重要,还是建议补录数据。如果大量不一致,建议重建主从。
2、总的来说,单次跳过比设置参数要好一些,尤其是设置slave-skip-errors,那就是有报错就跳过,不管数据一致性。
3、如果缺数比较多,可以使用pt工具补录数据。这个可以参考https://www.modb.pro/db/523872。
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




