MySQL又遇备库延迟

原创黄超 2022-05-05

3270

今天MySQL又遇备库延迟案例，顺便总结一下

案例现象

mysql>show slave status\G

Slave_IO_State: Waiting for master to send event
Master_Host: 10.10.13.20
Master_User: rpl_user
Master_Port: 3310
Connect_Retry: 60
Master_Log_File: mysql-bin.000013
Read_Master_Log_Pos: 48207531
Relay_Log_File: relay-bin-rpl1.000023
Relay_Log_Pos: 1153
Relay_Master_Log_File: mysql-bin.000013
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 940
Relay_Log_Space: 48208449
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 65708
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2155887676
Master_UUID: 000c1822-5be5-11ec-be70-a8ca7b94d69c
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Waiting for dependent transaction to commit
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 000c1822-5be5-11ec-be70-a8ca7b94d69c:20-2141
Executed_Gtid_Set: 000c1822-5be5-11ec-be70-a8ca7b94d69c:1-2136,
6feab22c-5be4-11ec-b641-1047800760df:1-16

分析

1.查看目前执行的事务

Seconds_Behind_Master的数值不断在变大，首先看看目前正在跑的是什么事务，
查看Relay_Master_Log_File: mysql-bin.000013和Exec_Master_Log_Pos: 940
在主库：

mysqlbinlog --no-defaults --base64-output=decode-rows -v -v mysql-bin.000013  | grep -A '100'  940

发现目前卡在执行gtid:000c1822-5be5-11ec-be70-a8ca7b94d69c:2137上，这是一个大事务，批量删除delete from procu1 limit 6000000。因为BINLOG设置的ROW格式，主库在执行了一条delete sql;在binlog会生成六百万条delete记录，到了备库要执行六百万次。

2.查看事务表

在备库：
mysql>select * from information_schema.innodb_trx\G

trx_id:25389

trx_state:running

start:2020-12-11 10:33:41

requested_lock_id:null

waite_started:null

weight:107178

mysql_thread_id:94

tables_in_use:1

tables_locked:1

lock_structs:692

lock_memory_bytes:90320

rows_locked:212972

rows_modified:106486

concurrency_tickets:0

isolation_leve:read commited

unique_checks:1

foreign_key_checks:1

foreign_key_error:null

没有具体的sql语句trx_query,可以看到rows_locked和rows_modified在增加。rows_modified这个值要一直增加到六百万才结束，而且因为autocommit=off，数据目前都是在undo上

3.查看innodb引擎信息

在备库：
mysql>show engine innodb status\G
摘要:

---TRANSACTION **25389**, ACTIVE 58424 sec
mysql tables in use 1, locked 1
18740 lock struct(s), heap size 3137744, 11885860 row lock(s), undo log entries 5942930
MySQL thread id 65, OS thread handle 139944933123840, query id 415 Executing event

4.查看会话信息

在备库：
mysql>show processlist

| 64 | system user | | NULL | Connect | 58362 | Waiting for dependent transaction to commit | NULL |
| 65 | system user | | NULL | Connect | 66073 | Executing event | NULL |

解决

备库复制线程是在后台执行，若是kill掉线程，会暂停slave同步，无法解决。
可以尝试让备库资源宽裕，加快执行，然后耐心等待大事务完成，追上最新数据。
例如一些优化操作：

SET GLOBAL innodb_buffer_pool_size=增大值;
SET GLOBAL slave_pending_jobs_size_max=增大值;
SET GLOBAL foreign_key_checks=0;
SET GLOBAL unique_checks=0;
SET GLOBAL innodb_flush_log_at_trx_commit=0;
SET GLOBAL sync_binlog=0;

若是考虑需要等候时间太久，人工干预方案：

跳过事务

在备库手动执行跑批操作，不做binlog记录，再跳过gtid

stop slave;
set sql_log_bin=0;
set global super_read_only=0;
delete from procu1 limit 6000000;
commit;
set global super_read_only=1;
set sql_log_bin=1;
SET @@SESSION.GTID_NEXT= '000c1822-5be5-11ec-be70-a8ca7b94d69c:1-2137';
BEGIN; COMMIT;
SET SESSION GTID_NEXT = AUTOMATIC;
START SLAVE;

重做备库

在主库全量备份，恢复到备库

总结

一、常见主备延迟场景

备库所在机器的性能要比主库所在的机器性能差
备库的查询压力大
大事务
大表DDL
备库的并行复制能力
随机重放
主库并发高
锁等待

二、常见主备延迟优化

1.主库优化

DML大事务，分拆小事务，小量分批多次执行
DDL大表操作，第三方工具pt-online-schema-change，业务低谷执行

2.备库优化

增大从库参数innodb_buffer_pool_size的值，可以缓存更多数据，减少由于转换导致的IO压力。
增大参数innodb_log_file_size、innodb_log_files_in_group的值，减少buffer pool的刷盘IO,提升写入性能。
修改参数innodb_flush_method为O_DIRECT,提升写入性能。
关掉从库Binlog日志或者关掉参数log_slave_updates.
修改参数innodb_flush_log_at_tr_commit为0或2.
如果binlog没有关掉（gtid开启，需要binlog日志打开），修改sync_binlog参数为0或者一个很大的值，减少磁盘IO压力。
如果binlog_format为row模式并且被修改表没有主键，则需要加上主键
如果binlog_format为row模式，则可以在备库总删除一些不必要的索引(同步完成之后再加上)。
了解清楚写库上操作内容，适当地在备库中预热一些数据，可以减少在复制时等待的时间。
修改参数master_info_repository、relay_log_info_repository为table，减少直接IO导致的磁盘压力。
可以打开多线程复制。slave_parallel_type=‘logical_clock’ slave_parallel_workers=32
升级mysql 版本
升级硬件

墨力计划 mysql

最后修改时间：2022-05-06 15:24:01

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

文章被以下合辑收录

MySQL专栏（共19篇）

MySQL专栏