暂无图片
暂无图片
5
暂无图片
暂无图片
暂无图片

Mysql 因 CRC32 HASH算法BUG造成从库同步异常

488

介绍

Mysql 从库

    Mysql 从库通过接收(或者是主动拉取)主库binlog日志,转换为自身中继日志,然后应用中继日志内容实现主从数据同步。

从库 CRC32 HASH算法

    Mysql 从库进行数据同步,实则是重放一次主库的 dml 语句(不是原SQL重放),在同步表没有主键和唯一索引的情况下,从库为了提升同步效率,利用CRC32 HASH算法进行优化,但这个算法存在一个BUG,2个不同行的HASH值可能相同,这就造成从库在更新时,有可能更新到错误的行,从而引发后续的更新异常,从库同步中断。

    在Mysql bug 列表 和 MOS 中均有此BUG的说明。
微信截图_20240726101249.png

微信截图_20240726101347.png

从库相关参数

控制从库在进行行匹配的参数为slave_rows_search_algorithms。

  • 1、此参数在Mysql 8.0 开始默认值为“INDEX_SCAN,HASH_SCAN”
  • 2、Mysql 8.0.18 标记为废弃
  • 3、Mysql 8.3.0 彻底弃用,不允许调整。

以下为参数可设置值。

Command-Line Format--slave-rows-search-algorithms=value
Deprecated8.0.18
System Variableslave_rows_search_algorithms
ScopeGlobal
DynamicYes
SET_VAR Hint AppliesNo
TypeSet
Default ValueINDEX_SCAN,HASH_SCAN
Valid Values

TABLE_SCAN,INDEX_SCAN

INDEX_SCAN,HASH_SCAN

TABLE_SCAN,HASH_SCAN

TABLE_SCAN,INDEX_SCAN,HASH_SCAN (equivalent to INDEX_SCAN,HASH_SCAN)

BUG场景再现

环境准备

    使用沙箱工具搭建Mysql 8.0.26 主从环境

[mysql@19db2 ~]$ dbdeployer versions
Basedir: /home/mysql/opt/mysql
5.7.29  5.7.31  8.0.26  8.0.33  8.0.34  
[mysql@19db2 ~]$ dbdeployer deploy replication 8.0.26 -n 2 --gtid --force      
Installing and starting master
. sandbox server started
Installing and starting slave1
.. sandbox server started
$HOME/sandboxes/rsandbox_8_0_26/initialize_slaves
initializing slave 1
Replication directory installed in $HOME/sandboxes/rsandbox_8_0_26
run 'dbdeployer usage multiple' for basic instructions'

[mysql@19db2 rsandbox_8_0_26]$ ./n1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 8.0.26 MySQL Community Server - GPL

....

master [localhost:21627] {msandbox} ((none)) > show variables like '%slave_rows_search_algorithms%';
+------------------------------+----------------------+
| Variable_name                | Value                |
+------------------------------+----------------------+
| slave_rows_search_algorithms | INDEX_SCAN,HASH_SCAN |
+------------------------------+----------------------+
1 row in set (0.00 sec)

[mysql@19db2 rsandbox_8_0_26]$ ./n2
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 8.0.26 MySQL Community Server - GPL
....

slave1 [localhost:21628] {msandbox} ((none)) > show variables like '%slave_rows_search_algorithms%';
+------------------------------+----------------------+
| Variable_name                | Value                |
+------------------------------+----------------------+
| slave_rows_search_algorithms | INDEX_SCAN,HASH_SCAN |
+------------------------------+----------------------+
1 row in set (0.00 sec)

测试数据准备

master [localhost:21627] {msandbox} ((none)) > create database fails;
Query OK, 1 row affected (0.00 sec)

master [localhost:21627] {msandbox} ((none)) > use fails;
Database changed

-- 测试表 无主键 无唯一索引
master [localhost:21627] {msandbox} (fails) > CREATE TABLE `task_sequence` (
    ->   `T_NAME` varchar(100) DEFAULT NULL ,
    ->   `CURRENT_VALUE` decimal(10,0) DEFAULT '0' ,
    ->   `T_INCREMENT` decimal(1,0) DEFAULT '1' ,
    ->   `BIZ_TYPE` varchar(10) DEFAULT NULL 
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 ;
Query OK, 0 rows affected, 1 warning (0.01 sec)

-- 主库插入2条模拟数据
master [localhost:21627] {msandbox} (fails) > insert into task_sequence (T_NAME,CURRENT_VALUE,T_INCREMENT,BIZ_TYPE) values('20230820110700',267,1,'04');
Query OK, 1 row affected (0.00 sec)

master [localhost:21627] {msandbox} (fails) > insert into task_sequence (T_NAME,CURRENT_VALUE,T_INCREMENT,BIZ_TYPE) values('20240412120134',329,1,'03');
Query OK, 1 row affected (0.00 sec)

主从库验证模拟数据

    此时主从同步正常,数据正常

master [localhost:21627] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME         | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 |           267 |           1 | 04       |
| 20240412120134 |           329 |           1 | 03       |
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)

slave1 [localhost:21628] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME         | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 |           267 |           1 | 04       |
| 20240412120134 |           329 |           1 | 03       |
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)

更新数据,BUG验证

    下面将对T_NAME=‘20240412120134’ 行进行更新,验证CRC32 HASH算法BUG

Time 1 主库第一次更新T_NAME=‘20240412120134’

master [localhost:21627] {msandbox} (fails) > update task_sequence set CURRENT_VALUE=CURRENT_VALUE+T_INCREMENT where T_NAME='20240412120134';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

    此时主从数据同步正常

master [localhost:21627] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME         | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 |           267 |           1 | 04       |
| 20240412120134 |           330 |           1 | 03       |     <====================
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)

slave1 [localhost:21628] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME         | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 |           267 |           1 | 04       |
| 20240412120134 |           330 |           1 | 03       |     <====================
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)

Time 2 主库第二次更新T_NAME=‘20240412120134’

master [localhost:21627] {msandbox} (fails) > update task_sequence set CURRENT_VALUE=CURRENT_VALUE+T_INCREMENT where T_NAME='20240412120134';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

    此时主从同步正常,但数据已经异常,从库本应该更新T_NAME='20240412120134’的CURRENT_VALUE为331,但是由于CRC32 BUG存在,错误的更新了T_NAME='20230820110700’的CURRENT_VALUE为331,主从数据已经不一致。

master [localhost:21627] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME         | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 |           267 |           1 | 04       |
| 20240412120134 |           331 |           1 | 03       |     <====================
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)

slave1 [localhost:21628] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME         | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 |           331 |           1 | 04       |     <==================== 从库已更新到错误行
| 20240412120134 |           330 |           1 | 03       |
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)

slave1 [localhost:21628] {msandbox} (fails) > show replica status\G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 127.0.0.1
                  Source_User: rsandbox
                  Source_Port: 21627
                Connect_Retry: 60
              Source_Log_File: mysql-bin.000001
          Read_Source_Log_Pos: 10280
               Relay_Log_File: mysql-relay.000002
                Relay_Log_Pos: 10495
        Relay_Source_Log_File: mysql-bin.000001
           Replica_IO_Running: Yes    <==================== 从库复制正常
          Replica_SQL_Running: Yes    <==================== 从库复制正常

Time 3 主库第三次更新T_NAME=‘20240412120134’

master [localhost:21627] {msandbox} (fails) > update task_sequence set CURRENT_VALUE=CURRENT_VALUE+T_INCREMENT where T_NAME='20240412120134';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

    此时主从同步异常,从库复制线程报错,找不到对应行。

master [localhost:21627] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME         | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 |           267 |           1 | 04       |
| 20240412120134 |           332 |           1 | 03       |     <====================
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)

slave1 [localhost:21628] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME         | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 |           331 |           1 | 04       |
| 20240412120134 |           330 |           1 | 03       |     <==================== 从库数据未更新
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)

slave1 [localhost:21628] {msandbox} (fails) > show replica status\G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 127.0.0.1
                  Source_User: rsandbox
                  Source_Port: 21627
                Connect_Retry: 60
              Source_Log_File: mysql-bin.000001
          Read_Source_Log_Pos: 10637
               Relay_Log_File: mysql-relay.000002
                Relay_Log_Pos: 10495
        Relay_Source_Log_File: mysql-bin.000001
           Replica_IO_Running: Yes
          Replica_SQL_Running: No     <==================== 从库复制异常中断  
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1032
                   Last_Error: Could not execute Update_rows event on table fails.task_sequence; Can't find record in 'task_sequence', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log mysql-bin.000001, end_log_pos 10606

总结

    Mysql 在从库表缺少主键或唯一索引时(有其一即可),由于HASH算法BUG,会造成更新到错误行,进而造成主从同步异常,而且随着表数据量增大,概率也会增大,在Mysql 8.3.0之后从库slave_rows_search_algorithms参数已默认为“INDEX_SCAN,HASH_SCAN”,不支持修改,所以建议在设计Mysql 表结构时,尽量添加主键或者唯一索引,减少此现象的发生。

参考文档

MySQL MySQL 8.0 Release Notes Changes in MySQL 8.0.18 (2019-10-14, General Availability)
MySQL MySQL 8.3 Release Notes Changes in MySQL 8.3.0 (2024-01-16, Innovation Release)
MySQL Bugs 101828 Hash scan incorrectly applied the log resulting in HA_ERR_END_OF_FILE
On Avoiding Replication Error: Error_code: 1032; handler error HA_ERR_END_OF_FILE (Doc ID 2804769.1)

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

文章被以下合辑收录

评论