暂无图片
暂无图片
2
暂无图片
暂无图片
暂无图片

[MYSQL] show engine innodb status中的死锁 分析

原创 大大刺猬 2024-08-26
799

导读

很久以前(也才2年)写过一个解析innodb_status的脚本. 看起来像那么回事, 其实就是做了个翻译和总结.
比如:

(venv) 10:15:56 [root@ddcw21 innodb_status]#python innodb_status.py -h127.0.0.1 -P3314 -p123456 说明: 采集时间: 2024-08-26 10:15:57 下面涉及到的 每秒平均值 计算时间均为最近 27 秒内 master线程: 系统繁忙程度(越大越繁忙): 0.07 % 日志写入和刷新次数: 0 SEMAPHORES信号量: rw_s_spin_wait_count 0 rw_s_spin_round_count 0 rw_s_os_wait_count 0 rw_x_spin_wait_count 0 rw_x_spin_round_count 0 rw_x_os_wait_count 0 rw_sx_spin_wait_count 0 rw_sx_spin_round_count 0 rw_sx_os_wait_count 0 每次空转等待的锁: rw_s:0.00 rw_x:0.00 rw_sx:0.00 死锁(最近一条) 产生死锁的时间: 2024-08-26 事务ID:51399739 锁类型:X thread_id:9 578 localhost 127.0.0.1 root updating SQL如下: update db1.sbtest1 set c='session 1' where id = 110011 事务ID:51399740 锁类型:X thread_id:10 581 localhost 127.0.0.1 root updating SQL如下: update db1.sbtest1 set c='session 1' where id = 110010 回滚事务: 51399740 事务汇总信息 max_trx_id : 51399742 min_trx_id : 51399742 max_undo_id: 0 purge线程状态: running but idle undo包含的事务数: 0 事务ID:51399739 事务状态:ACTIVE 1606 sec 锁:3 堆大小:1128 锁行数:2 事务中修改或插入的行数:2 MYSQL_PROCESS_ID:9 文件IO Pending normal 异步IO READ (对应read thread) : [0, 0, 0, 0, 0, 0, 0, 0] Pending normal 异步IO WRITE(对应write thread): [0, 0, 0, 0] 挂起(pending)的redo log flush: 0 挂起(pending)的tablespace flush: 0 OS总读次数: 2082 速度: 0.00 次/秒. 平均每次读 0 字节 OS总写次数: 445 速度: 0.00 次/秒. OS总flush次数: 160 速度: 0.00 次/秒. insert/change buffer和自适应hash索引 已合并页的数量: 1 页. ibuf空闲列表长度: 3078 页. ibuf大小: 3080 页. 合并插入次数: 2 合并操作次数: insert buffer: 0 delete buffer: 2 purge buffer: 0 无需合并操作的次数: insert buffer: 0 delete buffer: 0 purge buffer: 0 使用hash索引的查询 0.00次/秒 未使用hash使用的查询 0.00次/秒 自适应hash索引使用率0.0% 日志信息(redo) 最新产生的LSN: 112989067081 已刷盘的LSN: 112989067081 最老的LSN: 112989067081 最新检查点LSN: 112989067081 redo已完成的IO次数: 67 速度:0.00次/秒 BUFFER POOL AND MEMORY(不含具体实例的,只含汇总的) 总内存: 0 字节 (0.0 GB) 系统(字典)使用: 2052830 字节 buffer pool: 8192 页 free buffer: 5926 页 LRU : 2193 页 old LRU : 829 页 脏页(flush list) : 0 页 等待读入的页(pending read) : 0 页 等待的写(pending write) : LRU: 0 页. flush_list(等待刷新的脏页): 0 页. 单页: 0 页 LRU made young(LRU中移动到前部的页数,就是经常使用的页) 0 页(速度:0.00/s), non-young 0 页(速度:0.00/s) 从磁盘读取的页: 2050(0.00/s) 在内存中创建的页(无数据): 143(0.00/s) 写入磁盘的页: 270(0.00/s) 缓存命中率:数据库太闲,无此数据. 预读速度: 0.00/s (因未被访问)驱除速度: 0.00/s 随机预读速度: 0.00/s 行操作ROW OPERATIONS read view: 0 主进程ID: 3517 (sleeping) 插入行数: 0(0.00/s) 更新行数: 2(0.00/s) 删除行数: 3(0.00/s) 读行数: 17(0.00/s)

Total large memory allocated 为0是官方的BUG,不是脚本的BUG. 好像是从8.0.28开始的, 到8.0.37都还没修复…

其中死锁这一块显示得比较简单, 就列出相关连接和相关SQL(通常就够了). 但随着我们技术的提升, 就想了解得更清楚. 尤其是我们学完ibd和undo之后, 看这死锁信息就更加眼熟了.
image.png

死锁分析

对于死锁, 官方的描述是: A deadlock is a situation in which multiple transactions are unable to proceed because each transaction holds a lock that is needed by another one. Because all transactions involved are waiting for the same resource to become available, none of them ever releases the lock it holds.

总结就是: 死锁是多个事务相互等待锁导致无法继续
image.png

如果配置了参数innodb_deadlock_detect(默认启用), innodb会自动处理死锁(回滚产生死锁的事务,通常是后者)

我们可以通过show engine innodb status查看死锁, 或者启用参数innodb_print_all_deadlocks, 讲死锁信息打印到error log中.

构造死锁

我们使用2个事务来模拟死锁. 数据我是使用sysbench提前建好的.

-- session 1 (锁110010) begin; delete from db1.sbtest1 where id=110010; -- session 2 (锁110011) begin; delete from db1.sbtest1 where id=110011; -- session 1 (会等待session 2释放锁) update db1.sbtest1 set c='session 1' where id = 110011; -- session 2 (造成了死锁, 会被回滚) update db1.sbtest1 set c='session 1' where id = 110010; -- session 1 (update执行成功)

接下来就是枯燥的分析了

死锁分析

然后我们先使用show engine innodb status\G查看死锁信息. 得到如下信息

------------------------ LATEST DETECTED DEADLOCK ------------------------ 2024-08-26 10:34:35 139905796822784 *** (1) TRANSACTION: TRANSACTION 51399743, ACTIVE 69 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1 MySQL thread id 9, OS thread handle 139906147886848, query id 598 localhost 127.0.0.1 root updating update db1.sbtest1 set c='session 1' where id = 110011 *** (1) HOLDS THE LOCK(S): RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399743 lock_mode X locks rec but not gap Record lock, heap no 37 PHYSICAL RECORD: n_fields 6; compact format; info bits 32 0: len 4; hex 8001adba; asc ;; 1: len 6; hex 000003104c3f; asc L?;; 2: len 7; hex 020000366a0281; asc 6j ;; 3: len 4; hex 800f6578; asc ex;; 4: len 30; hex 39373838313231323839312d39383931323636313630362d333330373837; asc 97881212891-98912661606-330787; (total 120 bytes); 5: len 30; hex 31323734303538313139302d36303433373433323738372d343232303830; asc 12740581190-60437432787-422080; (total 60 bytes); *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399743 lock_mode X locks rec but not gap waiting Record lock, heap no 38 PHYSICAL RECORD: n_fields 6; compact format; info bits 32 0: len 4; hex 8001adbb; asc ;; 1: len 6; hex 000003104c46; asc LF;; 2: len 7; hex 01000040272ff2; asc @'/ ;; 3: len 4; hex 800fabf1; asc ;; 4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434; asc 99419809132-69853229692-218444; (total 120 bytes); 5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031; asc 02676592319-55855783189-943701; (total 60 bytes); *** (2) TRANSACTION: TRANSACTION 51399750, ACTIVE 11 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1 MySQL thread id 10, OS thread handle 139906146830080, query id 599 localhost 127.0.0.1 root updating update db1.sbtest1 set c='session 1' where id = 110010 *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399750 lock_mode X locks rec but not gap Record lock, heap no 38 PHYSICAL RECORD: n_fields 6; compact format; info bits 32 0: len 4; hex 8001adbb; asc ;; 1: len 6; hex 000003104c46; asc LF;; 2: len 7; hex 01000040272ff2; asc @'/ ;; 3: len 4; hex 800fabf1; asc ;; 4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434; asc 99419809132-69853229692-218444; (total 120 bytes); 5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031; asc 02676592319-55855783189-943701; (total 60 bytes); *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399750 lock_mode X locks rec but not gap waiting Record lock, heap no 37 PHYSICAL RECORD: n_fields 6; compact format; info bits 32 0: len 4; hex 8001adba; asc ;; 1: len 6; hex 000003104c3f; asc L?;; 2: len 7; hex 020000366a0281; asc 6j ;; 3: len 4; hex 800f6578; asc ex;; 4: len 30; hex 39373838313231323839312d39383931323636313630362d333330373837; asc 97881212891-98912661606-330787; (total 120 bytes); 5: len 30; hex 31323734303538313139302d36303433373433323738372d343232303830; asc 12740581190-60437432787-422080; (total 60 bytes); *** WE ROLL BACK TRANSACTION (2)

通过描述看是回滚了 TRANSACTION (2) (thread id 10, trx id 51399750). 我们在数据库里面查询验证下:

(root@127.0.0.1) [information_schema]> select * from information_schema.processlist where id=10; +----+------+-----------------+------+---------+------+-------+------+ | ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO | +----+------+-----------------+------+---------+------+-------+------+ | 10 | root | localhost:45864 | db1 | Sleep | 1347 | | NULL | +----+------+-----------------+------+---------+------+-------+------+ 1 row in set (0.00 sec)

现在它睡着了(Sleep状态), 我们看下它之前跑了哪些SQL.

(root@127.0.0.1) [performance_schema]> select * from performance_schema.events_statements_current where THREAD_ID in (select THHREAD_ID from performance_schema.threads where PROCESSLIST_ID=10)\G *************************** 1. row *************************** THREAD_ID: 53 EVENT_ID: 295 END_EVENT_ID: 295 EVENT_NAME: statement/sql/update SOURCE: init_net_server_extension.cc:95 TIMER_START: 5208043199351000 TIMER_END: 5208091954758000 TIMER_WAIT: 48755407000 LOCK_TIME: 47960000000 SQL_TEXT: update db1.sbtest1 set c='session 1' where id = 110010 DIGEST: 4ec5ab547e9e3a9577d75340971b5b98e2feb1de6f37b2bd318b99186387c497 DIGEST_TEXT: UPDATE `db1` . `sbtest1` SET `c` = ? WHERE `id` = ? CURRENT_SCHEMA: db1 OBJECT_TYPE: NULL OBJECT_SCHEMA: NULL OBJECT_NAME: NULL OBJECT_INSTANCE_BEGIN: NULL MYSQL_ERRNO: 1213 RETURNED_SQLSTATE: 40001 MESSAGE_TEXT: Deadlock found when trying to get lock; try restarting transaction ERRORS: 1 WARNINGS: 0 ROWS_AFFECTED: 0 ROWS_SENT: 0 ROWS_EXAMINED: 0 CREATED_TMP_DISK_TABLES: 0 CREATED_TMP_TABLES: 0 SELECT_FULL_JOIN: 0 SELECT_FULL_RANGE_JOIN: 0 SELECT_RANGE: 0 SELECT_RANGE_CHECK: 0 SELECT_SCAN: 0 SORT_MERGE_PASSES: 0 SORT_RANGE: 0 SORT_ROWS: 0 SORT_SCAN: 0 NO_INDEX_USED: 0 NO_GOOD_INDEX_USED: 0 NESTING_EVENT_ID: 293 NESTING_EVENT_TYPE: TRANSACTION NESTING_EVENT_LEVEL: 0 STATEMENT_ID: 599 CPU_TIME: 0 1 row in set (0.00 sec)

上一条sql是update 然后被回滚了, 再看看之前的历史sql信息呢(方便还原事务逻辑).

(root@127.0.0.1) [performance_schema]> select SQL_TEXT from performance_schema.events_statements_history where THREAD_ID in (seelect THREAD_ID from performance_schema.threads where PROCESSLIST_ID=10); +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | SQL_TEXT | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | update db1.sbtest1 set c='session 1' where id = 110010 | | rollback | | delete from db1.sbtest1 where id=110011 | | INSERT INTO `db1`.`sbtest1` VALUES (110011, 1027057, '99419809132-69853229692-21844469818-64365421768-18331047661-28311633260-44480639767-40347505668-33492861467-91598865267', '02676592319-55855783189-94370134366-73974511206-02616100121') | | select * from db1.sbtest1 limit 2 | | begin | | delete from db1.sbtest1 where id=110011 | | update db1.sbtest1 set c='session 1' where id = 110010 | | begin | | delete from db1.sbtest1 where id=110011 | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 10 rows in set (0.00 sec)

可以看到是做的begin和delete, 就是我们上面模拟的sql. 然后就可以找开发理论了

再上面的insert是我模拟的时候忘记begin了 -_-, 然后使用ibd2sql恢复的数据…

进一步分析死锁

上面那些是我们之前就能分析出来的, 现在我们要来分析点’高级’的东西. 看看HOLDS THE LOCK(S)(我锁了啥)和WAITING FOR THIS LOCK TO BE GRANTED(我要啥)部分. 这两部分的内容实际上是相似的(就数据不同而已). 所以我们就只分析其中一部分即可. 就挑第2部分吧(方便看undo)

*** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 13450 page no 1731 n bits 144 index PRIMARY of table `db1`.`sbtest1` trx id 51399743 lock_mode X locks rec but not gap waiting Record lock, heap no 38 PHYSICAL RECORD: n_fields 6; compact format; info bits 32 0: len 4; hex 8001adbb; asc ;; 1: len 6; hex 000003104c46; asc LF;; 2: len 7; hex 01000040272ff2; asc @'/ ;; 3: len 4; hex 800fabf1; asc ;; 4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434; asc 99419809132-69853229692-218444; (total 120 bytes); 5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031; asc 02676592319-55855783189-943701; (total 60 bytes);

我们可以得到表的 SPACE_ID=13450 对应数据的PAGE_NO=1731. 是X锁. 堆号:38 字段数量:6 (含trx_id&rollptr). 行格式: compact

然后是16进制的数据, 我们使用python来解析.比如, 第一个字段主键是int.

import struct hdata = '8001adbb' bdata = bytes.fromhex(hdata) is_unsigned = False _t = struct.unpack('>L',bdata)[0] n = 4 # 4字节 _s = n*8 - 1 data = (_t&((1<<_s)-1))-2**_s if _t < 2**_s and not is_unsigned else (_t&((1<<_s)-1)) print(data)

无符号类型(trx_id,rollptr)直接读就行. 比如: int.from_bytes(bytes.fromhex(‘000003104c46’),‘big’)

image.png
各数据类型的解析方式可以查看: https://github.com/ddcw/ibd2sql/blob/main/docs/README_DEV.md

于是我们就得到了

0: len 4; hex 8001adbb 主键:110011 1: len 6; hex 000003104c46 事务ID:51399750 (行事务,即锁的事务, 不一定是当前事务) 2: len 7; hex 01000040272ff2 回滚指针:281476053020658 3: len 4; hex 800fabf1 第一个非主键值: 1027057 4: len 30; hex 39393431393830393133322d36393835333232393639322d323138343434 第二个非主键值:99419809132-69853229692-218444 5: len 30; hex 30323637363539323331392d35353835353738333138392d393433373031 第三个非主键值: 02676592319-55855783189-943701

我们先在数据库里面根据space_id查询出对应的Ibd文件.(虽然看表名字就知道那张表了)

(root@127.0.0.1) [information_schema]> select * from information_schema.INNODB_TABLES where SPACE=13450\G *************************** 1. row *************************** TABLE_ID: 17484 NAME: db1/sbtest1 FLAG: 33 N_COLS: 7 SPACE: 13450 ROW_FORMAT: Dynamic ZIP_PAGE_SIZE: 0 SPACE_TYPE: Single INSTANT_COLS: 0 1 row in set (22.40 sec)

然后我们使用ibd2sql解析对应的page

(venv) 13:33:46 [root@ddcw21 ibd2sql]#python main.py /data/mysql_3314/mysqldata/db1/sbtest1.ibd --page-start=1731 --limit 1 --sql --debug | grep -C 20 '110011' [2024-08-26 13:34:16] [DEBUG] NO:3 READ RECORD HEADER (5 bytes) _offset:7615 offset:7615 START [2024-08-26 13:34:16] [DEBUG] READ RECORD HEADER (5 bytes) _offset:7610 offset:7615 FINISH [2024-08-26 13:34:16] [DEBUG] PAGE NO : 1731 [2024-08-26 13:34:16] [DEBUG] READ ROW NO : 1 CURRENT_OFFSET:7615 [2024-08-26 13:34:16] [DEBUG] REC INSTANT : False [2024-08-26 13:34:16] [DEBUG] REC DELETED : False [2024-08-26 13:34:16] [DEBUG] REC MIN_REC : False [2024-08-26 13:34:16] [DEBUG] REC OWNED : 0 [2024-08-26 13:34:16] [DEBUG] REC HEAP_NO : 304 [2024-08-26 13:34:16] [DEBUG] REC TYPE : 0 [2024-08-26 13:34:16] [DEBUG] REC NEXT : 208 [2024-08-26 13:34:16] [DEBUG] INSTANT FLAG : False [2024-08-26 13:34:16] [DEBUG] ROW VERSION FLAG : False [2024-08-26 13:34:16] [DEBUG] 20 bytes ON BOTH SIDES OF RECORD, b'69993-79145826637 <x', b'\x00\x010\x00\xd0\x80\x01\xad\xbb\x00\x00\x03\x10L?\x02\x00\x006j' [2024-08-26 13:34:16] [DEBUG] READ NULL BITMASK [2024-08-26 13:34:16] [DEBUG] NO NULLABLE FIELD. [2024-08-26 13:34:16] [DEBUG] NULLABLE FILED COUNT: 0 NULLABLE FIELD COUNT(FOR INSTANT):0 [2024-08-26 13:34:16] [DEBUG] NULL BITMASK: COUNT:0 ID: 0 [] [2024-08-26 13:34:16] [DEBUG] READ KEY FILED [2024-08-26 13:34:16] [DEBUG] READ KEY COLNO:1 NAME:id [2024-08-26 13:34:16] [DEBUG] 7615 ----> 7619 data:110011 bdata:b'\x80\x01\xad\xbb' [2024-08-26 13:34:16] [DEBUG] READ KEY NO:1 NAME:id FINISH. VALUES: 110011 [2024-08-26 13:34:16] [DEBUG] READ TRX(6) AND ROLLPTR(7) INFO [2024-08-26 13:34:16] [DEBUG] TRX: 51399743 ROLLPTR: 562950866338476 [2024-08-26 13:34:16] [DEBUG] ROW VERSION : -1 [2024-08-26 13:34:16] [DEBUG] INSTANT FLAG : False [2024-08-26 13:34:16] [DEBUG] ROW VERSION FLAG : False [2024-08-26 13:34:16] [DEBUG] NAME: k VERSION_ADDED:0 VERSION_DROPED:0 COL_INSTANT:False ROW VERSION:-1 [2024-08-26 13:34:16] [DEBUG] 7632 ----> 7636 data:1027057 bdata:b'\x80\x0f\xab\xf1' [2024-08-26 13:34:16] [DEBUG] ######## DDCW FLAG 8 ######## [2024-08-26 13:34:16] [DEBUG] NAME: c VERSION_ADDED:0 VERSION_DROPED:0 COL_INSTANT:False ROW VERSION:-1 [2024-08-26 13:34:16] [DEBUG] 7636 ----> 7756 data:session 1 bdata:b'session 1 ' [2024-08-26 13:34:16] [DEBUG] ######## DDCW FLAG 8 ######## [2024-08-26 13:34:16] [DEBUG] NAME: pad VERSION_ADDED:0 VERSION_DROPED:0 COL_INSTANT:False ROW VERSION:-1 [2024-08-26 13:34:16] [DEBUG] 7756 ----> 7816 data:02676592319-55855783189-94370134366-73974511206-02616100121 bdata:b'02676592319-55855783189-94370134366-73974511206-02616100121 '

我们得到
事务ID:51399743 (session 1)
ROLLPTR: 562950866338476
数据部分也是吻合的(update之后的), 然后我们根据回滚指针查看undo日志. 脚本见: https://github.com/ddcw/ddcw/tree/master/python/undo_reader

(venv) 13:35:53 [root@ddcw21 ei]#python undo_reader.py /data/mysql_3314/mysqldata/undo_002 -r 562950866338476 PAGENO:13930 OFFSET:684 --> 836 rseg_id:2 is_insert:False DATA: b'\\\x00\x01\xc0DL\x00\x00\x03\x10LC\xe0\x81\x00\x00\x17\x8c\x01\x10\x04\x80\x01\xad\xbb\x01\x04x99419809132-69853229692-21844469818-64365421768-18331047661-28311633260-44480639767-40347505668-33492861467-91598865267 '

得到了update之前的数据. 但我们发现只有主键修改前的字段的值(并没有完整的字段)

\x80\x01\xad\xbb 110011

知识都串起来了.

总结

  1. mysql会自动处理死锁
  2. 死锁信息只包含最近的sql, 若要看历史sql(完整事务sql), 得查看performance_schema.events_statements_history
  3. update之前的信息只记录修改部分在undo里面, 不会完整记录整行数据.

参考:
https://dev.mysql.com/doc/refman/8.0/en/innodb-deadlocks.html

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论