mysql对待坏块问题是比较宽松的,损坏的page较小时,可以正常登录数据库,然后通过相关视图定位问题。如果无法启动实例时,可指定参数innodb_force_recovery启库。
问题描述
今天分析的案例中,错误日志显示有索引损坏,然后mysql实例不定期重启。
以下是我摘出来的部分error.log
2024-11-04T10:15:35.076699+08:00 116 [ERROR] InnoDB: Database page corruption on disk or a failed file read of page [page id: space=1921, page number=35458]. You may have to recover from a backup.
InnoDB: End of page dump
InnoDB: Page may be an index page where index id is 2949
2024-11-04T10:13:08.686097+08:00 142 [ERROR] InnoDB: Database page corruption on disk or a failed file read of page [page id: space=1921, page number=35458]. You may have to recover from a backup.
len 16384; hex 96f1ba7a00008a8200008a8100008a83000000c22bf931b245bf00000000000000000000078100133f5a804a000000003e96000500000048000000000000000000000000000000000b8500000000000000000000000000000000000000000100020025696e66696d756d0005000b000073757072656d756d100b0b360c0c090b20b80000001000da80000000004393f4000017bf11db6a000089f703f8623934343439303939313537343231396166613538366630363633663061663258554a49414e4a554e3236e8aeb8e5bbbae5869b435230303731303030303030435230303732303030333531e58d8ee8a5bfe5a4a7e58cba2de8a5bfe5ae89e58cbae59f9f2de8a5bfe5ae89e78783e6b0942de8bf90e8a18ce7aea1e79086e983a881313337373230383838333458554a49414e4a554e323666cadb513937323230393635334071712e636f6d8000000166cadb5116070b560c0c060720380000001800f980000000004393f5000017bf11
- 19803288300ZHOUHUIWENf huijiacong@qq.com f ? 8 C m } )x270d7d1535ae4464ace1ada92d4a85eaLIZHENG281 CR0051000001CR0051022447 - 18939599730LIZHENG281f 254027296@qq.com f f 0F M 8 C n } 270d7d1535ae4464ace1ada92d4a85eaNIEGUOKUAN CR0061000001CR0062005114 _ 13660917778NIEGUOKUANf `13660917778@139.com f `f L , Ci /:270d7d1535ae4464ace1ada92d4a85eaRUXIAODONG CR0031000001CR0032000164 17693821176RUXIAODONGf qruxiaodong@crnewenergy.com.cn f q 6 Cj 802b51828d934a9b81ce06ce27aa1137XIAORUIXUE CR0071000000CR00710022024-11-04T10:13:08.755755+08:00 6 [ERROR] Got error 180 when reading table './ehs_edu_prd_new/online_training_inner_user'
2024-11-04T10:13:08.780209+08:00 142 [ERROR] [FATAL] InnoDB: Aborting because of a corrupt database page in the system tablespace. Or, there was a failure in tagging the tablespace as corrupt.
2024-11-04 10:13:08 0x7f9054efd700 InnoDB: Assertion failure in thread 140257877022464 in file ut0ut.cc line 921
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
02:13:08 UTC - mysqld got signal 6 ;
从日志找到这几个关键词,下文定位问题时会用到。
- page id: space=1921,
- page number=35458,
- index id is 2949,
- table ‘./ehs_edu_prd_new/online_training_inner_user’
应用程序中的报错信息

综合判断,大概率ehs_edu_prd_new.online_training_inner_user表上的索引文件发生损坏。
定位损坏的page
1、使用innochecksum扫描,发现文件已经损坏,无法获取更多信息。
# 建议把数据文件copy到其他目录扫描
cp online_training_inner_user.ibd online_training_inner_user_bak.ibd
# 扫描发现page 35458已经损坏
innochecksum --page-type-summary online_training_inner_user_bak.ibd
Fail: page 35458 invalid
Exceeded the maximum allowed checksum mismatch count::0
2、查看缓存数据页信息
通过 space id = 1921 查看数据页信息,可以看到space id都在online_training_inner_user表上,但是这里打印了所有索引,无法定位损坏文件在哪个索引上。
注意:库比较大时,线上访问innodb_buffer_page很慢,不太建议使用。
root@localhost: 10:50: [ehs_edu_prd_new]> select * from information_schema.innodb_buffer_page where space=1921 limit 10\G;
Current database: ehs_edu_prd_new
*************************** 1. row ***************************
POOL_ID: 0
BLOCK_ID: 137
SPACE: 1921
PAGE_NUMBER: 69604
PAGE_TYPE: INDEX
FLUSH_TYPE: 1
FIX_COUNT: 0
IS_HASHED: NO
NEWEST_MODIFICATION: 840986440969
OLDEST_MODIFICATION: 0
ACCESS_TIME: 3822990324
TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
INDEX_NAME: idx_org_code
NUMBER_RECORDS: 369
DATA_SIZE: 9963
COMPRESSED_SIZE: 0
PAGE_STATE: FILE_PAGE
IO_FIX: IO_NONE
IS_OLD: YES
FREE_PAGE_CLOCK: 0
*************************** 2. row ***************************
POOL_ID: 0
BLOCK_ID: 151
SPACE: 1921
PAGE_NUMBER: 52722
PAGE_TYPE: INDEX
FLUSH_TYPE: 1
FIX_COUNT: 0
IS_HASHED: NO
NEWEST_MODIFICATION: 840985996386
OLDEST_MODIFICATION: 0
ACCESS_TIME: 3822990362
TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
INDEX_NAME: un_index_training_inner_user
NUMBER_RECORDS: 147
DATA_SIZE: 8647
COMPRESSED_SIZE: 0
PAGE_STATE: FILE_PAGE
IO_FIX: IO_NONE
IS_OLD: YES
FREE_PAGE_CLOCK: 0
*************************** 3. row ***************************
POOL_ID: 0
BLOCK_ID: 156
SPACE: 1921
PAGE_NUMBER: 59384
PAGE_TYPE: INDEX
FLUSH_TYPE: 1
FIX_COUNT: 0
IS_HASHED: NO
NEWEST_MODIFICATION: 840986177184
OLDEST_MODIFICATION: 0
ACCESS_TIME: 3822990379
TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
INDEX_NAME: un_index_training_inner_user
NUMBER_RECORDS: 246
DATA_SIZE: 14728
COMPRESSED_SIZE: 0
PAGE_STATE: FILE_PAGE
IO_FIX: IO_NONE
IS_OLD: YES
FREE_PAGE_CLOCK: 0
*************************** 4. row ***************************
POOL_ID: 0
BLOCK_ID: 163
SPACE: 1921
PAGE_NUMBER: 65001
PAGE_TYPE: INDEX
FLUSH_TYPE: 1
FIX_COUNT: 0
IS_HASHED: NO
NEWEST_MODIFICATION: 840985911211
OLDEST_MODIFICATION: 0
ACCESS_TIME: 3822990391
TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
INDEX_NAME: idx_org_code
NUMBER_RECORDS: 392
DATA_SIZE: 10584
COMPRESSED_SIZE: 0
PAGE_STATE: FILE_PAGE
IO_FIX: IO_NONE
IS_OLD: YES
FREE_PAGE_CLOCK: 0
*************************** 5. row ***************************
POOL_ID: 0
BLOCK_ID: 171
SPACE: 1921
PAGE_NUMBER: 45006
PAGE_TYPE: INDEX
FLUSH_TYPE: 1
FIX_COUNT: 0
IS_HASHED: NO
NEWEST_MODIFICATION: 840986203820
OLDEST_MODIFICATION: 0
ACCESS_TIME: 3822990419
TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
INDEX_NAME: idx_user_code_learning_status
NUMBER_RECORDS: 1110
DATA_SIZE: 15540
COMPRESSED_SIZE: 0
PAGE_STATE: FILE_PAGE
IO_FIX: IO_NONE
IS_OLD: YES
FREE_PAGE_CLOCK: 0
*************************** 8. row ***************************
POOL_ID: 0
BLOCK_ID: 378
SPACE: 1921
PAGE_NUMBER: 34759
PAGE_TYPE: INDEX
FLUSH_TYPE: 0
FIX_COUNT: 0
IS_HASHED: YES
NEWEST_MODIFICATION: 0
OLDEST_MODIFICATION: 0
ACCESS_TIME: 3822992063
TABLE_NAME: `ehs_edu_prd_new`.`online_training_inner_user`
INDEX_NAME: PRIMARY
NUMBER_RECORDS: 73
DATA_SIZE: 16090
COMPRESSED_SIZE: 0
PAGE_STATE: FILE_PAGE
IO_FIX: IO_NONE
IS_OLD: NO
FREE_PAGE_CLOCK: 0
3、系统视图INNODB_SYS_TABLES 和 INNODB_SYS_INDEXES记录了innodb引擎中所有表和索引信息,这里我们用 index_id = 2949 查看。
如下,发现损坏位置在主键上,而mysql的主键是聚簇索引,它记录了真实数据,无法直接重建主键索引,所以只能通过备份文件恢复,或者mysqldump逻辑导出导入的方式修复了。
root@localhost: [information_schema]> select * from information_schema.INNODB_SYS_INDEXES where INDEX_ID=2949;
+----------+---------+----------+------+----------+---------+-------+-----------------+
| INDEX_ID | NAME | TABLE_ID | TYPE | N_FIELDS | PAGE_NO | SPACE | MERGE_THRESHOLD |
+----------+---------+----------+------+----------+---------+-------+-----------------+
| 2949 | PRIMARY | 1929 | 3 | 1 | 3 | 1921 | 50 |
+----------+---------+----------+------+----------+---------+-------+-----------------+
1 row in set (0.00 sec)
另外也可以使用开源小工具寻找坏块位置,这里分享八怪开源的innblock。
innblock下载链接
#先赋予可执行权限
[root@ehs-new-mysql-prd-11 data]# chmod +x innblock
#查看帮助
[root@ehs-new-mysql-prd-11 data]# ./innblock
----------------------------------------------------------------------------------------------------
[Author]:gaopeng [Blog]:blog.itpub.net/7728585/abstract/1/ [QQ]:22389860
[Review]:yejinrong@zhishutang [Blog]:imysql.com [QQ]:4700963
-------USAGE:../innblock Datafile [scan/pageno] Blocksize
[Datafile]:innodb data file!
[scan]:physical scan data file to find index level and index block no
[pageno]:which block you will parse
[Blocksize](KB):block size of KB general is 16k only 4k/8k/16k/32k
----------------------------------------------------------------------------------------------------
File: main.cpp, Line: 00050: USAGE ERROR
#扫描page number=35458出现报错,跟innochecksum结果相似,文件坏坏后无法扫描了,我们这最近的一个page number扫描(35459)。
# 建议把数据文件copy到其他目录扫描
cp online_training_inner_user.ibd online_training_inner_user_bak.ibd
./innblock online_training_inner_user_bak.ibd 35459 16 > block.log
查看扫描结果block.log

坏块修复
- 如果损坏page在普通二级索引上,可以直接重建索引;
- 如果损坏page在主键上,可使用下面两种方案恢复。
第1种:单实例架构逻辑导出导入
1、先停业务
2、使用mysqldump导出该表
3、备份原表
4、覆盖导入
5、如果逻辑恢复失败时,就只能用备份文件或者innodb_force_recovery恢复了。
第2种:在主从高可用架构了,可以先做主库切换,然后在从库修复
1、主库停业务
2、保证主库GTID不在变动
3、确保从库GTID和主库一致,记录GTID信息,然后关闭主从stop slave
4、主库导出单表并且带GTID,这个GTID和第三步的GTID对比相同
5、从库reset master 清理GTID
6、从库导入数据,导入数据自动设置GTID,导入完成后查看GTID,确认GTID和导出文件以及停止从库前的GTID一致,这样保证数据完整。
7、从库重做主从。




