暂无图片
暂无图片
8
暂无图片
暂无图片
暂无图片

ORA-01578 坏块处理

原创 yuqi.zhou 2022-05-31
2572

问题现象

客户在执行一条查询语句的时候,数据库报出ORA-01578错误
报错截图如下:
图片.png

问题初步分析

通过错误可以看出,很明显是因为SQL访问到的数据块(file#=7,block#=113906)损坏导致报错。
客户反馈出问题的是一个开发库,因该库多次处理过该库的问题,我对这个库的大概信息有所了解,该库没有备份,数据也不是很重要。
修复损坏的数据块,最好的方式是通过备份集,执行blockrecover来修复,数据不会丢失。或者dump该块的结构,通过bbed进行修复(技能要求较高)。
既然数据不是很重要,如果损坏的就这一个块,那么最快修复的方式是跳过损坏的数据块,然后重建表,一个数据块的数据损失对一个开发数据库来讲,几乎可以忽略不计。
与客户简单沟通后,同意该方案后开干。

问题处理

(1)检查数据文件

通过backup validate检查数据文件,看是否还有其它block损坏,也可以通过dbv来检查。

RMAN> backup validate check logical datafile 7;

Starting backup at 2022-04-14 16:33:38
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1119 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=/u01/app/oracle/oradata/yktdb/GFMIS_SYS_DATA_2.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:07:25
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7    OK     1              52893        4194302         18110743018595
  File Name: /u01/app/oracle/oradata/yktdb/GFMIS_SYS_DATA_2.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              1399377         
  Index      0              2569304         
  Other      0              172728          

Finished backup at 2022-04-14 16:41:04
SQL> select * from V$DATABASE_BLOCK_CORRUPTION;

    FILE#     BLOCK#	  BLOCKS CORRUPTION_CHANGE# CORRUPTION_TYPE
---------- ---------- ---------- ------------------ ---------------------------
    7     113906	       1	 1.8111E+13 CORRUPT

可以看到该文件只有一个数据块损坏。

(2)确认对象

select OWNER,
       SEGMENT_NAME,
       PARTITION_NAME,
       SEGMENT_TYPE,
       TABLESPACE_NAME,
       FILE_ID,
       BLOCK_ID,
       BLOCKS,
       RELATIVE_FNO
  from dba_extents
 where file_id = 7
   AND 113906 BETWEEN block_id AND block_id + blocks - 1;

OWNER		SEGMENT_NAME	     PARTITION_NAME	  SEGMENT_TYPE	       TABLESPACE_NAME	     FILE_ID   BLOCK_ID     BLOCKS RELATIVE_FNO
--------------- -------------------- -------------------- -------------------- ------------------ ---------- ---------- ---------- ------------
PROJECTLIB	GLF_BAL 				  TABLE 	       GFMIS_SYS_DATA		   7	 113904 	 8	      7

对象名与报错SQL语句中要查的对象一致。

(3)跳过坏块

这一步,我们可以参考MOS中提供的几种方法,任选一种来处理:
ORA-1578 Methods to Skip Block Corruption (Doc ID 2199133.1)
我这里采用DBMS_REPAIR.SKIP_CORRUPT_BLOCKS程序来进行跳过。

SQL> col table_name for a15
SQL> select owner,table_name,skip_corrupt from dba_tables where table_name='GLF_BAL';

OWNER		TABLE_NAME	SKIP_CORRUPT
--------------- --------------- ------------------------
PROJECTLIB	GLF_BAL 	DISABLED

SQL> BEGIN
  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS
  (
    SCHEMA_NAME   => 'PROJECTLIB',
    OBJECT_NAME   => 'GLF_BAL',
    OBJECT_TYPE   => dbms_repair.table_object,
    FLAGS         => dbms_repair.SKIP_FLAG
  );
END;
/  2    3    4    5    6    7    8    9   10  

PL/SQL procedure successfully completed.

执行后可以通过dba_tables里的SKIP_CORRUPT字段来验证。

SQL> select owner,table_name,skip_corrupt from dba_tables where table_name='GLF_BAL';

OWNER		TABLE_NAME	SKIP_CORRUPT
--------------- --------------- ------------------------
PROJECTLIB	GLF_BAL 	ENABLED

官方文档中关于SKIP_CORRUPT字段的解释如下,一看就明白。
Indicates whether Oracle Database ignores blocks marked corrupt during table and index scans (ENABLED) or raises an error (DISABLED). To enable this feature, run the DBMS_REPAIR.SKIP_CORRUPT_BLOCKS procedure.

4. 验证

标记坏块后,查询之前报错SQL句正常了。
但是观察alert log,发现还是在报错。

Thu Apr 14 17:17:08 2022
Errors in file /u01/app/oracle/diag/rdbms/yktdb/yktdb/trace/yktdb_smon_22147.trc  (incident=819024):
ORA-01578: ORACLE data block corrupted (file # 7, block # 113906)
ORA-01110: data file 7: '/u01/app/oracle/oradata/yktdb/GFMIS_SYS_DATA_2.dbf'
Incident details in: /u01/app/oracle/diag/rdbms/yktdb/yktdb/incident/incdir_819024/yktdb_smon_22147_i819024.trc
ORACLE Instance yktdb (pid = 15) - Error 1578 encountered while recovering transaction (58, 11) on object 172902.
Errors in file /u01/app/oracle/diag/rdbms/yktdb/yktdb/trace/yktdb_smon_22147.trc:
ORA-01578: ORACLE data block corrupted (file # 7, block # 113906)
ORA-01110: data file 7: '/u01/app/oracle/oradata/yktdb/GFMIS_SYS_DATA_2.dbf'
Thu Apr 14 17:17:09 2022
Dumping diagnostic data in directory=[cdmp_20220414171709], requested by (instance=1, osid=22147 (SMON)), summary=[incident=819024].

从alert log里的报错来看,错误是smon进程抛出来的,while recovering transaction表明正在进行事务恢复。查询可以跳过,但是事务恢复不能,符合常理。

5. 继续处理

不得不说MOS实在太强大了,这个问题如下文档有完整的记录:
ORA-01578 Encountered while Recovering Transaction on Object (Doc ID 2470377.1)

(1)为了避免smon导致数据库崩溃,设置_smu_debug_mode

SQL> alter system set "_smu_debug_mode"=1024;

设置"_smu_debug_mode"=1024来停止事务恢复,该参数可以动态修改。

(2)导出表

[oracle@yth132 ~]$ exp \'\/ as sysdba\' tables=PROJECTLIB.GLF_BAL log=GLF_BAL.log

Export: Release 11.2.0.4.0 - Production on Fri Apr 15 18:03:59 2022

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in AL32UTF8 character set and AL16UTF16 NCHAR character set
server uses ZHS16GBK character set (possible charset conversion)

About to export specified tables via Conventional Path ...
Current user changed to PROJECTLIB
. . exporting table                        GLF_BAL
Export terminated successfully without warnings.

(3)删除表

SQL> drop table PROJECTLIB.GLF_BAL;

Table dropped.

(4)导入表

[oracle@yth132 ~]$ imp \'\/ as sysdba\' log=imp_GLF_BAL.log file=GLF_BAL.dmp IGNORE=Y fromuser=PROJECTLIB touser=PROJECTLIB

Import: Release 11.2.0.4.0 - Production on Fri Apr 15 18:24:37 2022

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Export file created by EXPORT:V11.02.00 via conventional path
import done in AL32UTF8 character set and AL16UTF16 NCHAR character set
import server uses ZHS16GBK character set (possible charset conversion)
. importing PROJECTLIB's objects into PROJECTLIB
. . importing table                      "GLF_BAL"     273155 rows imported
Import terminated successfully without warnings.

(5)取消表上的SKIP CORRUPT标记

SQL> BEGIN
  DBMS_REPAIR.SKIP_CORRUPT_BLOCKS
  (
    SCHEMA_NAME   => 'PROJECTLIB',
    OBJECT_NAME   => 'GLF_BAL',
    FLAGS         => dbms_repair.NOSKIP_FLAG
  );
END;  2    3    4    5    6    7    8  
  9  /

PL/SQL procedure successfully completed.

(6)重置_smu_debug_mode参数

SQL> alter system reset "_smu_debug_mode";

System altered.

完毕后再观察alert log,没有再报错。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
1人已赞赏
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论