【MySQL】你以为设置了并行复制就降低延迟了？这个你绝对想不到！

DigOps 2019-09-25

230

点击上方

蓝色

文字关注我吧！

在MySQL官方版本中，为了保证其的高可用性，一般情况我们会采用主从复制的方式来解决。当然，方法很多。而我们今天所要处理的是采用GTID方式并且开了多线程复制后，仍然延迟的情况，糟糕的是，延迟还在不断扩大！

环境概要

序号	清单	说明
1	系统	Redhat 6.x（4c，32g）
2	数据库	MySQL-5.7.25
3	同步方式	基于GTID主从同步

环境排查

1）已经配置的重要参数：

#relay for slave
slave_parallel_type             = LOGICAL_CLOCK
slave_parallel_workers          = 6
master_info_repository          = TABLE
relay_log_info_repository       = TABLE 
relay_log_recovery              = on
sync_relay_log                  = 10000

注：此时没有设置```slave_preserve_commit_order```参数。 2）从库延迟状态查询

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Queueing master event to the relay log
                  Master_Host: xxx.xxx.xxx.xxx
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.008978
          Read_Master_Log_Pos: 696914605               Relay_Log_File: DB41-relay-bin.001259
                Relay_Log_Pos: 207377582
        Relay_Master_Log_File: mysql-bin.008970
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: neteagle3
          Replicate_Ignore_DB: mysql
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0          Exec_Master_Log_Pos: 1068770059
              Relay_Log_Space: 8425484286
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 187358Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 42
                  Master_UUID: eab7fcac-3cda-11e6-ada8-fa163e648db2
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Waiting for dependent transaction to commit
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: eab7fcac-3cda-11e6-ada8-fa163e648db2:58031191-59927276
            Executed_Gtid_Set: eab7fcac-3cda-11e6-ada8-fa163e648db2:1-58080239:58080241
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

简单介绍一下几个指标信息：

1. Master_Log_File 1. Read_Master_Log_Pos 1. Seconds_Behind_Master 1. Relay_Log_File 1. Relay_Log_Pos 1. Relay_Master_Log_File 1. Exec_Master_Log_Pos Master_Log_File , Read_Master_Log_Pos ：这两个参数是成对的，表示的是从库IO thread传输主库的binlog日志号及具体位置。 Relay_Log_File , Relay_Log_Pos ：这两个参数也是成对的，表示的是从库sql thread应用中继日志（relay log）号及具体位置。 Relay_Master_Log_File , Relay_Log_Pos ：这两个参数也是成对的，表示的是上一项中的中继日志对应的主库binlog日志及具体位置（有点绕）。 Seconds_Behind_Master ：此参数可简单理解为主从延迟时间，单位为秒。

从上面这段MySQL从状态信息中，我们可以看到，```Seconds_Behind_Master: 187358```这是从库sql应用延迟主库的时间为187358秒，转换成天，大概两天多。这说明，我们从库复制的数据是两天前的。

3)验证并行复制

mysql> show full processlist;
+----+-------------+--------------+-----------+------------------+--------+---------------------------------------------------------------+-----------------------+
| Id | User        | Host         | db        | Command          | Time   | State                                                         | Info                  |
+----+-------------+--------------+-----------+------------------+--------+---------------------------------------------------------------+-----------------------+
|  1 | system user |              | NULL      | Connect          |  18204 | Waiting for master to send event                              | NULL                  |
|  2 | system user |              | NULL      | Connect          |      0 | Waiting for dependent transaction to commit                   | NULL                  |
|  3 | system user |              | NULL      | Connect          | 154914 | System lock                                                   | NULL                  |
|  4 | system user |              | NULL      | Connect          | 154914 | Waiting for an event from Coordinator                         | NULL                  |
|  5 | system user |              | NULL      | Connect          | 154918 | Waiting for an event from Coordinator                         | NULL                  |
|  6 | system user |              | NULL      | Connect          | 155525 | Waiting for an event from Coordinator                         | NULL                  |
|  7 | system user |              | NULL      | Connect          | 180427 | Waiting for an event from Coordinator                         | NULL                  |
|  8 | system user |              | NULL      | Connect          |  18204 | Waiting for an event from Coordinator                         | NULL                  |
| 10 | root        | localhost    | neteagle3 | Query            |      0 | starting                                                      | show full processlist |
| 11 | repl        | DBSlave:9683 | NULL      | Binlog Dump GTID |  18156 | Master has sent all binlog to slave; waiting for more updates | NULL                  |
| 13 | root        | localhost    | neteagle3 | Sleep            |   4962 |                                                               | NULL                  |
+----+-------------+--------------+-----------+------------------+--------+---------------------------------------------------------------+-----------------------+

mysql> select * from  performance_schema.replication_applier_status_by_worker ;
+--------------+-----------+-----------+---------------+-----------------------------------------------+-------------------+--------------------+----------------------+
| CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION                         | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP |
+--------------+-----------+-----------+---------------+-----------------------------------------------+-------------------+--------------------+----------------------+
|              |         1 |        51 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80240805 |                 0 |                    | 0000-00-00 00:00:00  |
|              |         2 |        52 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80240210 |                 0 |                    | 0000-00-00 00:00:00  |
|              |         3 |        53 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80235089 |                 0 |                    | 0000-00-00 00:00:00  |
|              |         4 |        54 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:80191268 |                 0 |                    | 0000-00-00 00:00:00  |
|              |         5 |        55 | ON            | eab7fcac-3cda-11e6-ada8-fa163e648db2:75296683 |                 0 |                    | 0000-00-00 00:00:00  |
|              |         6 |        56 | ON            |                                               |                 0 |                    | 0000-00-00 00:00:00  |
+--------------+-----------+-----------+---------------+-----------------------------------------------+-------------------+--------------------+----------------------+
6 rows in set (0.00 sec)通过本条查询，可以看到开启了6个并行进行复制。

细找瓶颈？

通过以上来看，所有一切似乎都正常，并行复制开了，并且查看到CPU，IO，内存均没有达到瓶颈地步。主库写binlog日志大概为2MB/s,这样的日质量并不是非常高。

从库也同样查看了是否存在锁的情况，也没有发现。

因此反复的在查看slave状态，看能否发现一些细节，结果还真看到了一些异常现象。Relay_Log_Pos
这个参数在频繁的刷slave状态时，发现时常会卡着不动（此时已确认没有看到锁）。或许问题真正的原因正在这里！

顺藤摸瓜！

分析binlog或者relay log日志，看有啥线索：

[mysql@xxx data]$ mysqlbinlog --no-defaults -v -v --base64-output=DECODE-ROWS  relay-bin.001384 --start-position=420090430|more
//;
//;
DELIMITER //;

我们查看了中继日志relay-bin.001384
卡住的位置号420090430，为设置GTID_NEXT，信息没什么用。

继续看在卡住时刻，数据库open的是什么表？

mysql>   show open tables where In_use=1;
+-----------+---------------------+--------+-------------+
| Database  | Table               | In_use | Name_locked |
+-----------+---------------------+--------+-------------+
| net       | f_currentxxx        |      1 |           0 |
+-----------+---------------------+--------+-------------+

这张表有什么特殊的么？查看其表结构

mysql>   show create table net.f_currentxxx\G
 1. row 
       Table: f_currentxxx
Create Table: CREATE TABLE  (
   int(20) NOT NULL COMMENT 'xxx',
   bigint(20) NOT NULL COMMENT 'xxx',
   int(11) DEFAULT NULL COMMENT 'xxx',
   int(11) DEFAULT NULL COMMENT 'xxx,
  KEY  ()
) ENGINE=MEMORY DEFAULT CHARSET=gbk COMMENT='xxx'
1 row in set (0.00 sec)

有没有看到什么与众不同？

没错，就是表的存储引擎ENGINE=MEMORY
。MEMORY的表进行主从复制，首先来说如果从库不做查询，一点意义没有，另外对Memory表做复制，性能是非常堪忧的。如果必须进行同步，考虑将表的存储引擎改为InnoDB

mysql> select table_name from information_schema.tables where TABLE_SCHEMA='net' and ENGINE='memory';
+----------------------+
| table_name           |
+----------------------+
| f_currentxxx         |
+----------------------+
1 row in set, 6 warnings (0.01 sec)

措施

停止复制进程，将选项中添加replicate-ignore-table=net.f_currentxxx
,重新启动复制进程，观察slave状态。

mysql> show slave status\G
 1. row 
               Slave_IO_State: Waiting for master to send event
                  Master_Host: xxx.xxx.xxx.xxx
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.009194
          Read_Master_Log_Pos: 939698255
               Relay_Log_File: relay-bin.001964
                Relay_Log_Pos: 444060572
        Relay_Master_Log_File: mysql-bin.009027
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: net
          Replicate_Ignore_DB: mysql
           Replicate_Do_Table:

我们可以看到net.f_currentxxx表已经被忽略复制。持续观察一段时间后，Seconds_Behind_Master在逐渐缩小。
中继日志应用速度大约5分钟一个（每个中继日志为1GB大小），而主库binlog日志大约为10分钟一个（每个binlog日志为1GB大小）。

总结

在梳理了整个处理过程后，其实难度不高，主要是要细心，细心去排查每一个想到的点。在非轻量级的数据库中问题发生的概率也会随着量级的增加而增多。而这恰恰是能够磨练个人的成长。

同时，知识的储备也要充足，这是进阶高手的必要前提！

文章转载自DigOps，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

【MySQL】你以为设置了并行复制就降低延迟了？这个你绝对想不到！

环境概要

环境排查

细找瓶颈？

顺藤摸瓜！

措施

总结

评论