暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

【MySQL主从复制断开的处理方案】

原创 张sir 2022-12-28
1326

一、前言

MySQL主从复制由于是逻辑复制,经常会出现断开的情况,这里总结了几种处理方案,以备后续参考。在主从数据不一致的情况下,如果能很快的进行数据补录,尽量进行数据补录,实在无法补录或者数据一致性不是那么重要,可以采用跳过的方法。

二、传统复制

1、跳过报错

制造主从复制不一致的场景,注意show slave status中的几个状态的含义。last_error的SQL线程在master2-bin.000003,position 1741的位置停了。mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.5
                  Master_User: repl
                  Master_Port: 33652
                Connect_Retry: 60
              Master_Log_File: master2-bin.000003                     ------------------》从库IO线程读取到的主库的binlog的名称
          Read_Master_Log_Pos: 1772                                   -------------------》从库IO线程读取到的主库的binlog的位置
               Relay_Log_File: slave2-relay-bin.000002                -------------------》从库已经执行到的relay log
                Relay_Log_Pos: 1410                                   -------------------》从库已经执行到的relay log的位置
        Relay_Master_Log_File: master2-bin.000003                     -------------------》从库执行的relay对应到主库的binlog的名称
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1032
                   Last_Error: Could not execute Delete_rows event on table test.test; Can't find record in 'test', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master2-bin.000003, end_log_pos 1741
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1517                                   -------------------》从库执行的relay对应到主库的binlog的位置                                   
              Relay_Log_Space: 1873
              Until_Condition: None
 利用mysqlbinlog找到报错位置,寻找到下一个事务的点位是1772,
root@master2:/var/lib/mysql# mysqlbinlog --no-defaults --start-position=1741 master2-bin.000003 
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#221228 14:26:58 server id 5733660  end_log_pos 123 CRC32 0xedba5865 	Start: binlog v 4, server v 5.7.36-log created 221228 14:26:58 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
BINLOG '
MlKsYw8cfVcAdwAAAHsAAAABAAQANS43LjM2LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAyUqxjEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
AWVYuu0=
'/*!*/;
# at 1741
#221228 14:45:04 server id 5733660  end_log_pos 1772 CRC32 0x98eb74ab 	Xid = 36
COMMIT/*!*/;
# at 1772
#221228 15:25:26 server id 5733660  end_log_pos 1837 CRC32 0xda1b22e9 	Anonymous_GTID	last_committed=7	sequence_number=8	rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
在备库重新指定点位,跳过报错的事务。mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)

mysql> 
mysql> 
mysql> change master to
    -> master_host='192.168.10.5',
    -> master_port=33652,
    -> master_user='repl',
    -> master_password='repl',
    -> MASTER_LOG_FILE='master2-bin.000003',
    -> MASTER_LOG_POS=1772;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

mysql> start slave;
Query OK, 0 rows affected (0.01 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.5
                  Master_User: repl
                  Master_Port: 33652
                Connect_Retry: 60
              Master_Log_File: master2-bin.000003
          Read_Master_Log_Pos: 2282
               Relay_Log_File: slave2-relay-bin.000002
                Relay_Log_Pos: 832
        Relay_Master_Log_File: master2-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 

2、设置SQL_SLAVE_SKIP_COUNTER参数。

这个文章里有详细的关于该参数的描述,https://www.cnblogs.com/Uest/p/7941329.html。这个参数在使用使用的时候,会跳过一些事务,如果粒度掌握不好,可能跳过的数据比较多。

使用方法如下,当出现主从不一致时,
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.5
                  Master_User: repl
                  Master_Port: 33652
                Connect_Retry: 60
              Master_Log_File: master2-bin.000003
          Read_Master_Log_Pos: 2537
               Relay_Log_File: slave2-relay-bin.000002
                Relay_Log_Pos: 832
        Relay_Master_Log_File: master2-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1032
                   Last_Error: Could not execute Delete_rows event on table test.test; Can't find record in 'test', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master2-bin.000003, end_log_pos 2506
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 2282
              Relay_Log_Space: 1295
停止主从复制,设置SQL_SLAVE_SKIP_COUNTER参数,启动复制即可。 mysql> stop slave; Query OK, 0 rows affected (0.00 sec) mysql> set SQL_SLAVE_SKIP_COUNTER=1; ERROR 1229 (HY000): Variable 'sql_slave_skip_counter' is a GLOBAL variable and should be set with SET GLOBAL mysql> set global SQL_SLAVE_SKIP_COUNTER=1; Query OK, 0 rows affected (0.00 sec) mysql> start slave; Query OK, 0 rows affected (0.00 sec) mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.10.5 Master_User: repl Master_Port: 33652 Connect_Retry: 60 Master_Log_File: master2-bin.000003 Read_Master_Log_Pos: 2537 Relay_Log_File: slave2-relay-bin.000003 Relay_Log_Pos: 322 Relay_Master_Log_File: master2-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes

3、设置参数slave_exec_mode

这个参数为IDEMPOTENT模式(幂等模式)。设置成IDEMPOTENT模式可以让从库避免1032(从库上不存在的键)和1062(重复键,需要存在主键或则唯一键)的错误。这个只对row statment的binglog有效。具体可参考https://www.cnblogs.com/zhoujinyi/p/8035413.html

使用方式如下:

制造主键冲突的主从不一致场景:
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.5
                  Master_User: repl
                  Master_Port: 33652
                Connect_Retry: 60
              Master_Log_File: master2-bin.000003
          Read_Master_Log_Pos: 2967
               Relay_Log_File: slave2-relay-bin.000003
                Relay_Log_Pos: 497
        Relay_Master_Log_File: master2-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table test.test; Duplicate entry '1' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log master2-bin.000003, end_log_pos 2936
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 2712
              Relay_Log_Space: 1893设置参数,并重启主从复制mysql> set global slave_exec_mode='IDEMPOTENT';
Query OK, 0 rows affected (0.00 sec)

mysql> stop slave ;
Query OK, 0 rows affected (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.5
                  Master_User: repl
                  Master_Port: 33652
                Connect_Retry: 60
              Master_Log_File: master2-bin.000003
          Read_Master_Log_Pos: 2967
               Relay_Log_File: slave2-relay-bin.000004
                Relay_Log_Pos: 322
        Relay_Master_Log_File: master2-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 

4、设置参数slave-skip-errors

对于主从复制报错,每一个报错都有一个对应的报错码,比如主键冲突是1062,找不到数据是1032,如果复制报错特别多,不想挨个修复,那就可以设置参数slave-skip-errors,可以把出现的报错码写到配置文件里,重启数据库,当遇到这个报错码就会自动跳过。

[mysqld]
 
# 跳过指定 error no 类型的错误
 
# slave-skip-errors = 1062,1053,1146
 
# 跳过所有错误
 
# slave-skip-errors = all

三、GTID模式

1、跳过报错

制造主从不一致的场景,由于备库在执行delete的时候找不到记录而报错,尝试跳过这个事务,目前从库获取到的gtid是f433ea9e-4e3a-11ed-934c-0242ac480532:25888,执行到的gtid是f433ea9e-4e3a-11ed-934c-0242ac480532:1-25887,那么我们需要把gtid25888跳过。mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.5
                  Master_User: repl
                  Master_Port: 33650
                Connect_Retry: 60
              Master_Log_File: master1-bin.000012
          Read_Master_Log_Pos: 2070
               Relay_Log_File: slave1-relay-bin.000004
                Relay_Log_Pos: 420
        Relay_Master_Log_File: master1-bin.000012
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1032
                   Last_Error: Could not execute Delete_rows event on table test.test; Can't find record in 'test', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master1-bin.000012, end_log_pos 2039
。。。。。           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: f433ea9e-4e3a-11ed-934c-0242ac480532:25888
            Executed_Gtid_Set: f433ea9e-4e3a-11ed-934c-0242ac480532:1-25887,
f76c04aa-4e3a-11ed-8322-0242ac480533:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version:
操作如下:
mysql> stop slave; ========================》停止复制 Query OK, 0 rows affected (0.01 sec) mysql> SET @@SESSION.GTID_NEXT='f433ea9e-4e3a-11ed-934c-0242ac480532:25888'; ========================》查出失败事务的gtid Query OK, 0 rows affected (0.00 sec) mysql> begin;commit; ==============================================================》执行空事务 Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) mysql> SET @@SESSION.GTID_NEXT=automatic; ===========================================》gtid自动执行 Query OK, 0 rows affected (0.00 sec) mysql> start slave; ===================================》启动复制 Query OK, 0 rows affected (0.00 sec) mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.10.5 Master_User: repl Master_Port: 33650 Connect_Retry: 60 Master_Log_File: master1-bin.000012 Read_Master_Log_Pos: 2070 Relay_Log_File: slave1-relay-bin.000005 Relay_Log_Pos: 460 Relay_Master_Log_File: master1-bin.000012 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB:

四、总结

1、关于主从复制断开后跳过报错,总体来说如果数据不重要,跳过没啥问题,如果数据比较重要,还是建议补录数据。如果大量不一致,建议重建主从。

2、总的来说,单次跳过比设置参数要好一些,尤其是设置slave-skip-errors,那就是有报错就跳过,不管数据一致性。

3、如果缺数比较多,可以使用pt工具补录数据。这个可以参考https://www.modb.pro/db/523872







「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论