暂无图片
暂无图片
3
暂无图片
暂无图片
暂无图片

MySQL数据库阻塞定位

原创 谭磊Terry 恩墨学院 2022-06-08
1854

数据库执行操作查询

过滤Sleep及未选择数据库的线程并以时间字段排序,取前十条记录

show full processlist;
SELECT * from  INFORMATION_SCHEMA.PROCESSLIST WHERE DB IS NOT NULL AND COMMAND != 'Sleep'  ORDER BY TIME DESC LIMIT 10;

 

查找MySQL运行的线程及其对应的OS线程ID:

select thread_id,thread_os_id,name from performance_schema.threads;

 

大事务查询

  • 若有长时间未完成的事务,可能会导致:
  1. undo不能被及时purge,undo表空间不断增长;
  2. 持有行锁,其他事务被阻塞。 应该及时提交或回滚这些事务,或者直接kill释放之。
select * FROM INFORMATION_SCHEMA.INNODB_TRX;

## 事务开启后,超过5s未提交的用户连接
SELECT trx_mysql_thread_id AS PROCESSLIST_ID,
       NOW(),
       TRX_STARTED,
       TO_SECONDS(now())-TO_SECONDS(trx_started) AS TRX_LAST_TIME ,
       USER,
       HOST,
       DB,
       TRX_QUERY
FROM INFORMATION_SCHEMA.INNODB_TRX trx
JOIN INFORMATION_SCHEMA.processlist pcl ON trx.trx_mysql_thread_id=pcl.id
WHERE trx_mysql_thread_id != connection_id()
  AND TO_SECONDS(now())-TO_SECONDS(trx_started) >= 5 ;
  
+----------------+---------------------+---------------------+---------------+------+-----------------+------+-----------+
| PROCESSLIST_ID | NOW()               | TRX_STARTED         | TRX_LAST_TIME | User | Host            | DB   | TRX_QUERY |
+----------------+---------------------+---------------------+---------------+------+-----------------+------+-----------+
|             24 | 2019-12-16 02:49:52 | 2019-12-16 02:41:15 |           517 | root | 127.0.0.1:58682 | db   | NULL      |
+----------------+---------------------+---------------------+---------------+------+-----------------+------+-----------+
1 row in set (0.01 sec)

 

锁等待

查看锁等待相关的阻塞线程、被阻塞线程信息及相关用户、IP、PORT

// 等待线程ID,等待线程SQL,持续时长,阻塞线程ID,阻塞线程状态,阻塞线程正在执行SQL,阻塞线程最后执行SQL
SELECT r.trx_mysql_thread_id waiting_thread,r.trx_query waiting_query,
concat(timestampdiff(SECOND,r.trx_wait_started,CURRENT_TIMESTAMP()),'s') AS duration,
b.trx_mysql_thread_id blocking_thread,t.processlist_command state,b.trx_query blocking_current_query,e.sql_text blocking_last_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
JOIN performance_schema.threads t on t.processlist_id = b.trx_mysql_thread_id
JOIN performance_schema.events_statements_current e USING(thread_id);

或者
SELECT locked_table,
       locked_index,
       locked_type,
       blocking_pid,
       concat(T2.USER,'@',T2.HOST) AS "blocking(user@ip:port)",
       blocking_lock_mode,
       blocking_trx_rows_modified,
       waiting_pid,
       concat(T3.USER,'@',T3.HOST) AS "waiting(user@ip:port)",
       waiting_lock_mode,
       waiting_trx_rows_modified,
       wait_age_secs,
       waiting_query
FROM sys.x$innodb_lock_waits T1
LEFT JOIN INFORMATION_SCHEMA.processlist T2 ON T1.blocking_pid=T2.ID
LEFT JOIN INFORMATION_SCHEMA.processlist T3 ON T3.ID=T1.waiting_pid;

或者
SELECT
	t2.db,
    t2.state,
	t1.trx_mysql_thread_id,
	t1.trx_started,
	t1.trx_weight,
	t1.trx_query
FROM
	information_schema.INNODB_TRX t1,
	information_schema.`PROCESSLIST` t2
WHERE
	t1.trx_mysql_thread_id = t2.ID
ORDER BY
	t1.trx_started;

/*innodb 行锁等待脚本*/

SELECT r.trx_mysql_thread_id waiting_thread,r.trx_query waiting_query,

concat(timestampdiff(SECOND,r.trx_wait_started,CURRENT_TIMESTAMP()),'s') AS duration,

b.trx_mysql_thread_id blocking_thread,t.processlist_command state,b.trx_query blocking_current_query,e.sql_text blocking_last_query

FROM information_schema.innodb_lock_waits w

JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id

JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id

JOIN performance_schema.threads t on t.processlist_id = b.trx_mysql_thread_id

JOIN performance_schema.events_statements_current e USING(thread_id);


或
show open tables where in_use>0;

 

元数据锁

## MySQL 5.7 开启元数据锁追踪,以便追踪定位元数据锁相关的阻塞问题,当前生产已开启相关配置
// 临时开启,动态生效
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME ='global_instrumentation';
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES'
WHERE NAME ='wait/lock/metadata/sql/mdl';
// 配置文件中添加,重启生效
performance-schema-instrument = wait/lock/metadata/sql/mdl=ON

 

场景1:杀掉持有MDL锁的会话,使DDL语句顺利执行。

  • DDL语句被阻塞通常因为存在获取资源后未及时提交释放的长事务。因此,查找kill掉事务运行时间大于DDL运行时间的会话即可使DDL语句顺利下发,SQL语句如下:
// 查找事务运行时间 >= DDL等待时间的线程
SELECT trx_mysql_thread_id AS PROCESSLIST_ID,
       NOW(),
       TRX_STARTED,
       TO_SECONDS(now())-TO_SECONDS(trx_started) AS TRX_LAST_TIME ,
       USER,
       HOST,
       DB,
       TRX_QUERY
FROM INFORMATION_SCHEMA.INNODB_TRX trx
JOIN INFORMATION_SCHEMA.processlist pcl ON trx.trx_mysql_thread_id=pcl.id
WHERE trx_mysql_thread_id != connection_id()
  AND TO_SECONDS(now())-TO_SECONDS(trx_started) >=
    (SELECT MAX(Time)
     FROM INFORMATION_SCHEMA.processlist
     WHERE STATE='Waiting for table metadata lock'
       AND INFO LIKE 'alter%table%' OR INFO LIKE 'truncate%table%') ;
// kill掉长事务,释放持有的MDL资源,xx为trx_mysql_thread_id
kill xx;   

注意:因 MySQL 元数据信息记录有限,此处可能误杀无辜长事务,且误杀无法完全避免。

  • 当kill掉阻塞源后,可能存在DDL语句与被阻塞的SQL语句同时加锁的情况,此时会出现事务开始时间等于DDL开始时间连接,此类事务也需kill
//查找事务开始时间 = DDL语句事务开始时间的线程
SELECT trx_mysql_thread_id AS PROCESSLIST_ID,
       NOW(),
       TRX_STARTED,
       TO_SECONDS(now())-TO_SECONDS(trx_started) AS TRX_LAST_TIME ,
       USER,
       HOST,
       DB,
       TRX_QUERY
FROM INFORMATION_SCHEMA.INNODB_TRX trx
JOIN INFORMATION_SCHEMA.processlist pcl ON trx.trx_mysql_thread_id=pcl.id
WHERE trx_mysql_thread_id != connection_id()
  AND trx_started =
    (SELECT MIN(trx_started)
     FROM INFORMATION_SCHEMA.INNODB_TRX
     GROUP BY trx_started HAVING count(trx_started)>=2)
  AND TRX_QUERY NOT LIKE 'alter%table%'
  OR TRX_QUERY IS NULL;
//杀掉阻塞源
kill xx;  xx为trx_mysql_thread_id

场景2:kill掉下发DDL语句的用户连接,取消DDL语句下发,保障业务不被阻塞。

// 查找DDL语句所在用户连接
SELECT *
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE INFO LIKE 'ALTER%TABLE%';

+-----+------+-----------+------+---------+------+---------------------------------+----------------------------------+
| ID  | USER | HOST      | DB   | COMMAND | TIME | STATE                           | INFO                             |
+-----+------+-----------+------+---------+------+---------------------------------+----------------------------------+
| 254 | root | localhost | NULL | Query   |  730 | Waiting for table metadata lock | alter table db.t1 add index (id) |
+-----+------+-----------+------+---------+------+---------------------------------+----------------------------------+
1 row in set (0.00 sec)

// 杀掉DDL语句所在用户连接
kill 254;

 

场景3:遇到DDL变更的时候发生阻塞

问题描述:添加字段、添加索引等DDL语句时候会被阻塞,show processlist 会看到显示 Waiting for table metadata lock. 后续的对这些表的select也会被阻塞

分析:autocommit=0,怀疑有未提交事务,导致产生了元数据锁。

解决:

  1. 如果打开了performance_schema 监控,通过语句定位未提交事务:
SELECT
locked_schema,
locked_table,
locked_type,
waiting_processlist_id,
waiting_age,
waiting_query,
waiting_state,
blocking_processlist_id,
blocking_age,
substring_index(sql_text,“transaction_begin;” ,-1) AS blocking_query,
sql_kill_blocking_connection
FROM
(
SELECT
b.OWNER_THREAD_ID AS granted_thread_id,
a.OBJECT_SCHEMA AS locked_schema,
a.OBJECT_NAME AS locked_table,
“Metadata Lock” AS locked_type,
c.PROCESSLIST_ID AS waiting_processlist_id,
c.PROCESSLIST_TIME AS waiting_age,
c.PROCESSLIST_INFO AS waiting_query,
c.PROCESSLIST_STATE AS waiting_state,
d.PROCESSLIST_ID AS blocking_processlist_id,
d.PROCESSLIST_TIME AS blocking_age,
d.PROCESSLIST_INFO AS blocking_query,
concat('KILL ', d.PROCESSLIST_ID) AS sql_kill_blocking_connection
FROM
performance_schema.metadata_locks a
JOIN performance_schema.metadata_locks b ON a.OBJECT_SCHEMA = b.OBJECT_SCHEMA
AND a.OBJECT_NAME = b.OBJECT_NAME
AND a.lock_status = ‘PENDING’
AND b.lock_status = ‘GRANTED’
AND a.OWNER_THREAD_ID <> b.OWNER_THREAD_ID
AND a.lock_type = ‘EXCLUSIVE’
JOIN performance_schema.threads c ON a.OWNER_THREAD_ID = c.THREAD_ID
JOIN performance_schema.threads d ON b.OWNER_THREAD_ID = d.THREAD_ID
) t1,
(
SELECT
thread_id,
group_concat( CASE WHEN EVENT_NAME = ‘statement/sql/begin’ THEN “transaction_begin” ELSE sql_text END ORDER BY event_id SEPARATOR “;” ) AS sql_text
FROM
performance_schema.events_statements_history
GROUP BY thread_id
) t2
WHERE
t1.granted_thread_id = t2.thread_id
  1. 杀进程
kill blocking_processlist_id
  1. 没有打开wait/lock/metadata/sql/mdl情况下,针对sleep进程执行kill
SELECT concat(‘kill ‘,processlist_id,’;’)
FROM performance_schema.events_statements_current a JOIN performance_schema.threads b USING(thread_id)
JOIN information_schema.processlist c ON b.processlist_id = c.id
WHERE a.sql_text NOT LIKE ‘%performance%’ and command=‘sleep’ order by c.time desc;

批量更新数据卡住

问题描述:批量更新很慢,没添加索引,添加索引被阻塞

分析:事务不自动提交容易造成元数据锁冲突

解决:执行下列SQL语句

SELECT concat(‘kill ‘,processlist_id,’;’)
FROM performance_schema.events_statements_current a JOIN performance_schema.threads b USING(thread_id)
JOIN information_schema.processlist c ON b.processlist_id = c.id
WHERE a.sql_text NOT LIKE ‘%performance%’ and command=‘sleep’ order by c.time desc;

杀进程

全局读锁

PERFORMANCE_SCHEMA.METADATA_LOCKS 表 LOCK_DURATION 列为 EXPLICIT 状态表示 FTWRL 语句添加,OBJECT_TYPE 出现 COMMIT 状态表示已经加锁成功

  • 场景 1:杀掉添加 FTWRL 的会话,恢复业务运行
SELECT processlist_id,
       mdl.OBJECT_TYPE,
       OBJECT_SCHEMA,
       OBJECT_NAME,
       LOCK_TYPE,
       LOCK_DURATION,
       LOCK_STATUS
FROM performance_schema.metadata_locks mdl
INNER JOIN performance_schema.threads thd ON mdl.owner_thread_id = thd.thread_id
AND processlist_id <> connection_id()
AND LOCK_DURATION='EXPLICIT';

+----------------+-------------+---------------+-------------+-----------+---------------+-------------+
| processlist_id | OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS |
+----------------+-------------+---------------+-------------+-----------+---------------+-------------+
|            231 | GLOBAL      | NULL          | NULL        | SHARED    | EXPLICIT      | GRANTED     |
|            231 | COMMIT      | NULL          | NULL        | SHARED    | EXPLICIT      | GRANTED     |
+----------------+-------------+---------------+-------------+-----------+---------------+-------------+
2 rows in set (0.00 sec)

// 杀掉添加FTWRL的用户连接
kill 231;
  • 场景 2:杀掉语句执行时间大于 FTWRL 执行时间的线程,确保 FTWRL 下发成功
SELECT T2.THREAD_ID,
       T1.ID AS PROCESSLIST_ID,
       T1.User,
       T1.Host,
       T1.db,
       T1.Time,
       T1.State,
       T1.Info,
       T3.TRX_STARTED,
       TO_SECONDS(now())-TO_SECONDS(trx_started) AS TRX_LAST_TIME
FROM INFORMATION_SCHEMA.processlist T1
LEFT JOIN PERFORMANCE_SCHEMA.THREADS T2 ON T1.ID=T2.PROCESSLIST_ID
LEFT JOIN INFORMATION_SCHEMA.INNODB_TRX T3 ON T1.id=T3.trx_mysql_thread_id
WHERE T1.TIME >=
    (SELECT MAX(Time)
     FROM INFORMATION_SCHEMA.processlist
     WHERE INFO LIKE 'flush%table%with%read%lock')
  AND Info IS NOT NULL;

+-----------+----------------+------+-------------------+------+------+-------------------------+---------------------------------------------+---------------------+---------------+
| THREAD_ID | PROCESSLIST_ID | User | Host              | db   | Time | State                   | Info                                        | TRX_STARTED         | TRX_LAST_TIME |
+-----------+----------------+------+-------------------+------+------+-------------------------+---------------------------------------------+---------------------+---------------+
|       284 |            246 | root | localhost         | NULL |  364 | User sleep              | select * from db.t1 where sleep(1000000000) | 2019-12-23 14:57:23 |           364 |
|       286 |            248 | root | 10.211.55.2:55435 | NULL |  232 | Waiting for table flush | flush table with read lock                  | NULL                |          NULL |
+-----------+----------------+------+-------------------+------+------+-------------------------+---------------------------------------------+---------------------+---------------+
2 rows in set (0.00 sec)
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论