暂无图片
暂无图片
22
暂无图片
暂无图片
暂无图片

记录一起mysql坏块故障处理过程

原创 金同学 2024-11-04
1539

mysql对待坏块问题是比较宽松的,损坏的page较小时,可以正常登录数据库,然后通过相关视图定位问题。如果无法启动实例时,可指定参数innodb_force_recovery启库。

问题描述

今天分析的案例中,错误日志显示有索引损坏,然后mysql实例不定期重启。
以下是我摘出来的部分error.log

2024-11-04T10:15:35.076699+08:00 116 [ERROR] InnoDB: Database page corruption on disk or a failed file read of page [page id: space=1921, page number=35458]. You may have to recover from a backup.

InnoDB: End of page dump
InnoDB: Page may be an index page where index id is 2949
2024-11-04T10:13:08.686097+08:00 142 [ERROR] InnoDB: Database page corruption on disk or a failed file read of page [page id: space=1921, page number=35458]. You may have to recover from a backup.

len 16384; hex 96f1ba7a00008a8200008a8100008a83000000c22bf931b245bf00000000000000000000078100133f5a804a000000003e96000500000048000000000000000000000000000000000b8500000000000000000000000000000000000000000100020025696e66696d756d0005000b000073757072656d756d100b0b360c0c090b20b80000001000da80000000004393f4000017bf11db6a000089f703f8623934343439303939313537343231396166613538366630363633663061663258554a49414e4a554e3236e8aeb8e5bbbae5869b435230303731303030303030435230303732303030333531e58d8ee8a5bfe5a4a7e58cba2de8a5bfe5ae89e58cbae59f9f2de8a5bfe5ae89e78783e6b0942de8bf90e8a18ce7aea1e79086e983a881313337373230383838333458554a49414e4a554e323666cadb513937323230393635334071712e636f6d8000000166cadb5116070b560c0c060720380000001800f980000000004393f5000017bf11

- 19803288300ZHOUHUIWENf   huijiacong@qq.com    f      ?     8           C        m  } )x270d7d1535ae4464ace1ada92d4a85eaLIZHENG281      CR0051000001CR0051022447   - 18939599730LIZHENG281f   254027296@qq.com    f   f 0F   M     8           C        n  }   270d7d1535ae4464ace1ada92d4a85eaNIEGUOKUAN         CR0061000001CR0062005114   _ 13660917778NIEGUOKUANf  `13660917778@139.com    f  `f L    ,                 Ci  /:270d7d1535ae4464ace1ada92d4a85eaRUXIAODONG         CR0031000001CR0032000164     17693821176RUXIAODONGf  qruxiaodong@crnewenergy.com.cn    f  q   6 Cj  802b51828d934a9b81ce06ce27aa1137XIAORUIXUE         CR0071000000CR00710022024-11-04T10:13:08.755755+08:00 6 [ERROR] Got error 180 when reading table './ehs_edu_prd_new/online_training_inner_user'

2024-11-04T10:13:08.780209+08:00 142 [ERROR] [FATAL] InnoDB: Aborting because of a corrupt database page in the system tablespace. Or,  there was a failure in tagging the tablespace  as corrupt.
2024-11-04 10:13:08 0x7f9054efd700  InnoDB: Assertion failure in thread 140257877022464 in file ut0ut.cc line 921
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
02:13:08 UTC - mysqld got signal 6 ;

从日志找到这几个关键词,下文定位问题时会用到。

  • page id: space=1921,
  • page number=35458,
  • index id is 2949,
  • table ‘./ehs_edu_prd_new/online_training_inner_user’

应用程序中的报错信息
image.png
综合判断,大概率ehs_edu_prd_new.online_training_inner_user表上的索引文件发生损坏。

定位损坏的page

1、使用innochecksum扫描,发现文件已经损坏,无法获取更多信息。

# 建议把数据文件copy到其他目录扫描
cp online_training_inner_user.ibd online_training_inner_user_bak.ibd
# 扫描发现page 35458已经损坏
innochecksum --page-type-summary online_training_inner_user_bak.ibd
Fail: page 35458 invalid
Exceeded the maximum allowed checksum mismatch count::0

2、查看缓存数据页信息
通过 space id = 1921 查看数据页信息,可以看到space id都在online_training_inner_user表上,但是这里打印了所有索引,无法定位损坏文件在哪个索引上。
注意:库比较大时,线上访问innodb_buffer_page很慢,不太建议使用。

root@localhost: 10:50:  [ehs_edu_prd_new]> select * from information_schema.innodb_buffer_page where space=1921 limit 10\G;
Current database: ehs_edu_prd_new

*************************** 1. row ***************************
            POOL_ID: 0
           BLOCK_ID: 137
              SPACE: 1921
        PAGE_NUMBER: 69604
          PAGE_TYPE: INDEX
         FLUSH_TYPE: 1
          FIX_COUNT: 0
          IS_HASHED: NO
NEWEST_MODIFICATION: 840986440969
OLDEST_MODIFICATION: 0
        ACCESS_TIME: 3822990324
         TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
         INDEX_NAME: idx_org_code
     NUMBER_RECORDS: 369
          DATA_SIZE: 9963
    COMPRESSED_SIZE: 0
         PAGE_STATE: FILE_PAGE
             IO_FIX: IO_NONE
             IS_OLD: YES
    FREE_PAGE_CLOCK: 0
*************************** 2. row ***************************
            POOL_ID: 0
           BLOCK_ID: 151
              SPACE: 1921
        PAGE_NUMBER: 52722
          PAGE_TYPE: INDEX
         FLUSH_TYPE: 1
          FIX_COUNT: 0
          IS_HASHED: NO
NEWEST_MODIFICATION: 840985996386
OLDEST_MODIFICATION: 0
        ACCESS_TIME: 3822990362
         TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
         INDEX_NAME: un_index_training_inner_user
     NUMBER_RECORDS: 147
          DATA_SIZE: 8647
    COMPRESSED_SIZE: 0
         PAGE_STATE: FILE_PAGE
             IO_FIX: IO_NONE
             IS_OLD: YES
    FREE_PAGE_CLOCK: 0
*************************** 3. row ***************************
            POOL_ID: 0
           BLOCK_ID: 156
              SPACE: 1921
        PAGE_NUMBER: 59384
          PAGE_TYPE: INDEX
         FLUSH_TYPE: 1
          FIX_COUNT: 0
          IS_HASHED: NO
NEWEST_MODIFICATION: 840986177184
OLDEST_MODIFICATION: 0
        ACCESS_TIME: 3822990379
         TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
         INDEX_NAME: un_index_training_inner_user
     NUMBER_RECORDS: 246
          DATA_SIZE: 14728
    COMPRESSED_SIZE: 0
         PAGE_STATE: FILE_PAGE
             IO_FIX: IO_NONE
             IS_OLD: YES
    FREE_PAGE_CLOCK: 0
*************************** 4. row ***************************
            POOL_ID: 0
           BLOCK_ID: 163
              SPACE: 1921
        PAGE_NUMBER: 65001
          PAGE_TYPE: INDEX
         FLUSH_TYPE: 1
          FIX_COUNT: 0
          IS_HASHED: NO
NEWEST_MODIFICATION: 840985911211
OLDEST_MODIFICATION: 0
        ACCESS_TIME: 3822990391
         TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
         INDEX_NAME: idx_org_code
     NUMBER_RECORDS: 392
          DATA_SIZE: 10584
    COMPRESSED_SIZE: 0
         PAGE_STATE: FILE_PAGE
             IO_FIX: IO_NONE
             IS_OLD: YES
    FREE_PAGE_CLOCK: 0
*************************** 5. row ***************************
            POOL_ID: 0
           BLOCK_ID: 171
              SPACE: 1921
        PAGE_NUMBER: 45006
          PAGE_TYPE: INDEX
         FLUSH_TYPE: 1
          FIX_COUNT: 0
          IS_HASHED: NO
NEWEST_MODIFICATION: 840986203820
OLDEST_MODIFICATION: 0
        ACCESS_TIME: 3822990419
         TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
         INDEX_NAME: idx_user_code_learning_status
     NUMBER_RECORDS: 1110
          DATA_SIZE: 15540
    COMPRESSED_SIZE: 0
         PAGE_STATE: FILE_PAGE
             IO_FIX: IO_NONE
             IS_OLD: YES
    FREE_PAGE_CLOCK: 0
*************************** 8. row ***************************
            POOL_ID: 0
           BLOCK_ID: 378
              SPACE: 1921
        PAGE_NUMBER: 34759
          PAGE_TYPE: INDEX
         FLUSH_TYPE: 0
          FIX_COUNT: 0
          IS_HASHED: YES
NEWEST_MODIFICATION: 0
OLDEST_MODIFICATION: 0
        ACCESS_TIME: 3822992063
         TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
         INDEX_NAME: PRIMARY
     NUMBER_RECORDS: 73
          DATA_SIZE: 16090
    COMPRESSED_SIZE: 0
         PAGE_STATE: FILE_PAGE
             IO_FIX: IO_NONE
             IS_OLD: NO
    FREE_PAGE_CLOCK: 0

3、系统视图INNODB_SYS_TABLES 和 INNODB_SYS_INDEXES记录了innodb引擎中所有表和索引信息,这里我们用 index_id = 2949 查看。
如下,发现损坏位置在主键上,而mysql的主键是聚簇索引,它记录了真实数据,无法直接重建主键索引,所以只能通过备份文件恢复,或者mysqldump逻辑导出导入的方式修复了。

root@localhost:  [information_schema]> select * from information_schema.INNODB_SYS_INDEXES where INDEX_ID=2949;
+----------+---------+----------+------+----------+---------+-------+-----------------+
| INDEX_ID | NAME    | TABLE_ID | TYPE | N_FIELDS | PAGE_NO | SPACE | MERGE_THRESHOLD |
+----------+---------+----------+------+----------+---------+-------+-----------------+
|     2949 | PRIMARY |     1929 |    3 |        1 |       3 |  1921 |              50 |
+----------+---------+----------+------+----------+---------+-------+-----------------+
1 row in set (0.00 sec)

另外也可以使用开源小工具寻找坏块位置,这里分享八怪开源的innblock。
innblock下载链接

#先赋予可执行权限
[root@ehs-new-mysql-prd-11 data]# chmod +x innblock 

#查看帮助
[root@ehs-new-mysql-prd-11 data]# ./innblock 
----------------------------------------------------------------------------------------------------
[Author]:gaopeng [Blog]:blog.itpub.net/7728585/abstract/1/ [QQ]:22389860
[Review]:yejinrong@zhishutang [Blog]:imysql.com [QQ]:4700963
-------USAGE:../innblock Datafile [scan/pageno] Blocksize 
[Datafile]:innodb data file!
[scan]:physical scan data file to find index level and index block no
[pageno]:which block you will parse
[Blocksize](KB):block size of KB general is 16k only 4k/8k/16k/32k 
----------------------------------------------------------------------------------------------------
File: main.cpp, Line: 00050: USAGE ERROR

#扫描page number=35458出现报错,跟innochecksum结果相似,文件坏坏后无法扫描了,我们这最近的一个page number扫描(35459)。

# 建议把数据文件copy到其他目录扫描
cp online_training_inner_user.ibd online_training_inner_user_bak.ibd

./innblock online_training_inner_user_bak.ibd 35459 16 > block.log

查看扫描结果block.log
image.png

坏块修复

  • 如果损坏page在普通二级索引上,可以直接重建索引;
  • 如果损坏page在主键上,可使用下面两种方案恢复。

第1种:单实例架构逻辑导出导入

1、先停业务
2、使用mysqldump导出该表
3、备份原表
4、覆盖导入
5、如果逻辑恢复失败时,就只能用备份文件或者innodb_force_recovery恢复了。

第2种:在主从高可用架构了,可以先做主库切换,然后在从库修复

1、主库停业务
2、保证主库GTID不在变动
3、确保从库GTID和主库一致,记录GTID信息,然后关闭主从stop slave
4、主库导出单表并且带GTID,这个GTID和第三步的GTID对比相同
5、从库reset master 清理GTID
6、从库导入数据,导入数据自动设置GTID,导入完成后查看GTID,确认GTID和导出文件以及停止从库前的GTID一致,这样保证数据完整。
7、从库重做主从。
最后修改时间:2024-12-18 09:59:04
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

文章被以下合辑收录

评论