介绍
Mysql 从库
Mysql 从库通过接收(或者是主动拉取)主库binlog日志,转换为自身中继日志,然后应用中继日志内容实现主从数据同步。
从库 CRC32 HASH算法
Mysql 从库进行数据同步,实则是重放一次主库的 dml 语句(不是原SQL重放),在同步表没有主键和唯一索引的情况下,从库为了提升同步效率,利用CRC32 HASH算法进行优化,但这个算法存在一个BUG,2个不同行的HASH值可能相同,这就造成从库在更新时,有可能更新到错误的行,从而引发后续的更新异常,从库同步中断。
在Mysql bug 列表 和 MOS 中均有此BUG的说明。


从库相关参数
控制从库在进行行匹配的参数为slave_rows_search_algorithms。
- 1、此参数在Mysql 8.0 开始默认值为“INDEX_SCAN,HASH_SCAN”
- 2、Mysql 8.0.18 标记为废弃
- 3、Mysql 8.3.0 彻底弃用,不允许调整。
以下为参数可设置值。
| Command-Line Format | --slave-rows-search-algorithms=value |
|---|---|
| Deprecated | 8.0.18 |
| System Variable | slave_rows_search_algorithms |
| Scope | Global |
| Dynamic | Yes |
SET_VAR Hint Applies | No |
| Type | Set |
| Default Value | INDEX_SCAN,HASH_SCAN |
| Valid Values |
|
BUG场景再现
环境准备
使用沙箱工具搭建Mysql 8.0.26 主从环境
[mysql@19db2 ~]$ dbdeployer versions
Basedir: /home/mysql/opt/mysql
5.7.29 5.7.31 8.0.26 8.0.33 8.0.34
[mysql@19db2 ~]$ dbdeployer deploy replication 8.0.26 -n 2 --gtid --force
Installing and starting master
. sandbox server started
Installing and starting slave1
.. sandbox server started
$HOME/sandboxes/rsandbox_8_0_26/initialize_slaves
initializing slave 1
Replication directory installed in $HOME/sandboxes/rsandbox_8_0_26
run 'dbdeployer usage multiple' for basic instructions'
[mysql@19db2 rsandbox_8_0_26]$ ./n1
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 8.0.26 MySQL Community Server - GPL
....
master [localhost:21627] {msandbox} ((none)) > show variables like '%slave_rows_search_algorithms%';
+------------------------------+----------------------+
| Variable_name | Value |
+------------------------------+----------------------+
| slave_rows_search_algorithms | INDEX_SCAN,HASH_SCAN |
+------------------------------+----------------------+
1 row in set (0.00 sec)
[mysql@19db2 rsandbox_8_0_26]$ ./n2
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 8.0.26 MySQL Community Server - GPL
....
slave1 [localhost:21628] {msandbox} ((none)) > show variables like '%slave_rows_search_algorithms%';
+------------------------------+----------------------+
| Variable_name | Value |
+------------------------------+----------------------+
| slave_rows_search_algorithms | INDEX_SCAN,HASH_SCAN |
+------------------------------+----------------------+
1 row in set (0.00 sec)
测试数据准备
master [localhost:21627] {msandbox} ((none)) > create database fails;
Query OK, 1 row affected (0.00 sec)
master [localhost:21627] {msandbox} ((none)) > use fails;
Database changed
-- 测试表 无主键 无唯一索引
master [localhost:21627] {msandbox} (fails) > CREATE TABLE `task_sequence` (
-> `T_NAME` varchar(100) DEFAULT NULL ,
-> `CURRENT_VALUE` decimal(10,0) DEFAULT '0' ,
-> `T_INCREMENT` decimal(1,0) DEFAULT '1' ,
-> `BIZ_TYPE` varchar(10) DEFAULT NULL
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 ;
Query OK, 0 rows affected, 1 warning (0.01 sec)
-- 主库插入2条模拟数据
master [localhost:21627] {msandbox} (fails) > insert into task_sequence (T_NAME,CURRENT_VALUE,T_INCREMENT,BIZ_TYPE) values('20230820110700',267,1,'04');
Query OK, 1 row affected (0.00 sec)
master [localhost:21627] {msandbox} (fails) > insert into task_sequence (T_NAME,CURRENT_VALUE,T_INCREMENT,BIZ_TYPE) values('20240412120134',329,1,'03');
Query OK, 1 row affected (0.00 sec)
主从库验证模拟数据
此时主从同步正常,数据正常
master [localhost:21627] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 | 267 | 1 | 04 |
| 20240412120134 | 329 | 1 | 03 |
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)
slave1 [localhost:21628] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 | 267 | 1 | 04 |
| 20240412120134 | 329 | 1 | 03 |
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)
更新数据,BUG验证
下面将对T_NAME=‘20240412120134’ 行进行更新,验证CRC32 HASH算法BUG
Time 1 主库第一次更新T_NAME=‘20240412120134’
master [localhost:21627] {msandbox} (fails) > update task_sequence set CURRENT_VALUE=CURRENT_VALUE+T_INCREMENT where T_NAME='20240412120134';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
此时主从数据同步正常
master [localhost:21627] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 | 267 | 1 | 04 |
| 20240412120134 | 330 | 1 | 03 | <====================
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)
slave1 [localhost:21628] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 | 267 | 1 | 04 |
| 20240412120134 | 330 | 1 | 03 | <====================
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)
Time 2 主库第二次更新T_NAME=‘20240412120134’
master [localhost:21627] {msandbox} (fails) > update task_sequence set CURRENT_VALUE=CURRENT_VALUE+T_INCREMENT where T_NAME='20240412120134';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
此时主从同步正常,但数据已经异常,从库本应该更新T_NAME='20240412120134’的CURRENT_VALUE为331,但是由于CRC32 BUG存在,错误的更新了T_NAME='20230820110700’的CURRENT_VALUE为331,主从数据已经不一致。
master [localhost:21627] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 | 267 | 1 | 04 |
| 20240412120134 | 331 | 1 | 03 | <====================
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)
slave1 [localhost:21628] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 | 331 | 1 | 04 | <==================== 从库已更新到错误行
| 20240412120134 | 330 | 1 | 03 |
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)
slave1 [localhost:21628] {msandbox} (fails) > show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 127.0.0.1
Source_User: rsandbox
Source_Port: 21627
Connect_Retry: 60
Source_Log_File: mysql-bin.000001
Read_Source_Log_Pos: 10280
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 10495
Relay_Source_Log_File: mysql-bin.000001
Replica_IO_Running: Yes <==================== 从库复制正常
Replica_SQL_Running: Yes <==================== 从库复制正常
Time 3 主库第三次更新T_NAME=‘20240412120134’
master [localhost:21627] {msandbox} (fails) > update task_sequence set CURRENT_VALUE=CURRENT_VALUE+T_INCREMENT where T_NAME='20240412120134';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
此时主从同步异常,从库复制线程报错,找不到对应行。
master [localhost:21627] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 | 267 | 1 | 04 |
| 20240412120134 | 332 | 1 | 03 | <====================
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)
slave1 [localhost:21628] {msandbox} (fails) > select * from task_sequence;
+----------------+---------------+-------------+----------+
| T_NAME | CURRENT_VALUE | T_INCREMENT | BIZ_TYPE |
+----------------+---------------+-------------+----------+
| 20230820110700 | 331 | 1 | 04 |
| 20240412120134 | 330 | 1 | 03 | <==================== 从库数据未更新
+----------------+---------------+-------------+----------+
2 rows in set (0.00 sec)
slave1 [localhost:21628] {msandbox} (fails) > show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 127.0.0.1
Source_User: rsandbox
Source_Port: 21627
Connect_Retry: 60
Source_Log_File: mysql-bin.000001
Read_Source_Log_Pos: 10637
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 10495
Relay_Source_Log_File: mysql-bin.000001
Replica_IO_Running: Yes
Replica_SQL_Running: No <==================== 从库复制异常中断
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Could not execute Update_rows event on table fails.task_sequence; Can't find record in 'task_sequence', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log mysql-bin.000001, end_log_pos 10606
总结
Mysql 在从库表缺少主键或唯一索引时(有其一即可),由于HASH算法BUG,会造成更新到错误行,进而造成主从同步异常,而且随着表数据量增大,概率也会增大,在Mysql 8.3.0之后从库slave_rows_search_algorithms参数已默认为“INDEX_SCAN,HASH_SCAN”,不支持修改,所以建议在设计Mysql 表结构时,尽量添加主键或者唯一索引,减少此现象的发生。
参考文档
MySQL MySQL 8.0 Release Notes Changes in MySQL 8.0.18 (2019-10-14, General Availability)
MySQL MySQL 8.3 Release Notes Changes in MySQL 8.3.0 (2024-01-16, Innovation Release)
MySQL Bugs 101828 Hash scan incorrectly applied the log resulting in HA_ERR_END_OF_FILE
On Avoiding Replication Error: Error_code: 1032; handler error HA_ERR_END_OF_FILE (Doc ID 2804769.1)




