Oracle Database Recovery: File Loss Caused by Storage and System Failures

Original article by 盖国强 (Gai Guoqiang), 2019-05-08

Problem Description

Last week I helped a customer recover a database. The case was not complicated, but the process is worth recording.

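It began with a storage-level failure that corrupted data files and even caused some of them to disappear. The database first raised the following errors in the alert log: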

Wed Nov 1 17:12:45 2010
Errors in file /ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005408990], [1388], [4005425099], [1484804288], [], []
Wed Nov 1 17:12:47 2010
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Wed Nov 1 17:12:47 2010
Errors in file /ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005408993], [1388], [4005425099], [1484804288], [], []
Non-fatal internal error happenned while SMON was doing extent coalescing.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
ORA-00474: SMON process terminated with error

Expert Answer

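Note that the ORA-600 [2662] error itself is nothing special and is not particularly hard to handle. What deserves attention is when it occurred: while SMON was flushing monitored table stats and while it was doing extent coalescing. The latter stands out because many people once believed that this behavior no longer existed in 10g.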
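To make the two ORA-600 [2662] lines easier to read, here is a minimal Python sketch that decodes their arguments. It assumes the commonly cited layout for this error (current SCN wrap, current SCN base, dependent SCN wrap, dependent SCN base, then the DBA of the block involved); that layout and the helper names `scn_value` and `decode_ora600_2662` are assumptions for illustration, not something taken from the trace file.

```python
def scn_value(wrap: int, base: int) -> int:
    """Combine an SCN wrap (high part) and base (low 32 bits) into one integer."""
    return (wrap << 32) | base


def decode_ora600_2662(args):
    """Interpret the five arguments that follow [2662].

    Assumed layout: current SCN wrap, current SCN base,
    dependent SCN wrap, dependent SCN base, DBA of the block.
    """
    cur_wrap, cur_base, dep_wrap, dep_base, dba = args[:5]
    current = scn_value(cur_wrap, cur_base)
    dependent = scn_value(dep_wrap, dep_base)
    return {
        "current_scn": current,
        "dependent_scn": dependent,
        # A positive gap means the block carries an SCN ahead of the database SCN.
        "scn_gap": dependent - current,
        "dba": dba,
    }


# Arguments copied from the two alert-log entries above.
first = decode_ora600_2662([1388, 4005408990, 1388, 4005425099, 1484804288])
second = decode_ora600_2662([1388, 4005408993, 1388, 4005425099, 1484804288])
print(first["scn_gap"], second["scn_gap"])
```

Under that assumption, both entries point at the same block, and the block's SCN is ahead of the database's current SCN by a little over 16,000 in each case.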

These errors caused the database to crash, and after the restart things got much worse:

Errors in file /ADMIN/bdump/erp_q000_6024.trc:
ORA-01578: ORACLE data block corrupted (file # 11, block # 62612)
ORA-01110: data file 11: '/data/sysaux01.dbf'

The SYSAUX tablespace was corrupted. A check with dbv showed that only 5 blocks were left in this file, and 4 of them were corrupt:

Page 2 is marked corrupt
Corrupt block relative dba: 0x5ec00002 (file 379, block 2)
Bad header found during dbv: 
Data in bad block:
 type: 0 format: 2 rdba: 0x4e802e1b
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0xc79b
 computed block checksum: 0x0

Page 3 is marked corrupt
Corrupt block relative dba: 0x5ec00003 (file 379, block 3)
Bad header found during dbv: 
Data in bad block:
 type: 0 format: 2 rdba: 0x4e802e1c
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0xc79c
 computed block checksum: 0x0

Page 4 is marked corrupt
Corrupt block relative dba: 0x5ec00004 (file 379, block 4)
Bad header found during dbv: 
Data in bad block:
 type: 0 format: 2 rdba: 0x4e802e1d
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0xc79d
 computed block checksum: 0x0

Page 5 is marked corrupt
Corrupt block relative dba: 0x5ec00005 (file 379, block 5)
Bad header found during dbv: 
Data in bad block:
 type: 0 format: 2 rdba: 0x4e802e1e
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0xc79e
 computed block checksum: 0x0



DBVERIFY - Verification complete

Total Pages Examined         : 5
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 0
Total Pages Marked Corrupt   : 4
Total Pages Influx           : 0
Highest block SCN            : 0 (0.0)
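
The `(file 379, block 2)` that dbv prints next to `0x5ec00002` can be reproduced by hand. The sketch below assumes the usual packing of a 32-bit relative data block address for smallfile tablespaces (upper 10 bits for the relative file number, lower 22 bits for the block number); `decode_rdba` is a hypothetical helper name.

```python
def decode_rdba(rdba: int):
    """Split a 32-bit relative DBA into (relative file number, block number).

    Assumes the 10-bit file / 22-bit block layout of smallfile tablespaces.
    """
    return rdba >> 22, rdba & 0x3FFFFF


# Address at which dbv read the block (from "Corrupt block relative dba").
location = decode_rdba(0x5EC00002)
# Address stored inside the block itself (from "rdba:" in the bad block dump).
header = decode_rdba(0x4E802E1B)
print(location, header)
```

With these inputs the address dbv read from and the address recorded in the block header decode to different file and block numbers, which is consistent with dbv reporting a bad header for that page.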

A final check showed that almost all of the files were lost. The UNDO file had also been wiped:

DBVERIFY - Verification starting : FILE = undo01.dbf
Page 2 is marked corrupt
Corrupt block relative dba: 0x02c00002 (file 11, block 2)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 2 rdba: 0x5ec2b80c
 last change scn: 0x056c.dc9017e1 seq: 0x35 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x17e10235
 check value in block header: 0xf125
 computed block checksum: 0x0

Page 3 is marked corrupt
Corrupt block relative dba: 0x02c00003 (file 11, block 3)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 2 rdba: 0x5ec2b80d
 last change scn: 0x056c.dc9017e1 seq: 0x35 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x17e10235
 check value in block header: 0xf50b
 computed block checksum: 0x0

Page 4 is marked corrupt
Corrupt block relative dba: 0x02c00004 (file 11, block 4)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 2 rdba: 0x5ec2b80e
 last change scn: 0x056c.dc9017e1 seq: 0x36 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x17e10236
 check value in block header: 0xe0ef
 computed block checksum: 0x0

Page 5 is marked corrupt
Corrupt block relative dba: 0x02c00005 (file 11, block 5)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 2 rdba: 0x5ec2b80f
 last change scn: 0x056c.dc9017e1 seq: 0x35 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x17e10235
 check value in block header: 0x1844
 computed block checksum: 0x0



DBVERIFY - Verification complete

Total Pages Examined         : 5
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 0
Total Pages Marked Corrupt   : 4
Total Pages Influx           : 0
Highest block SCN            : 0 (0.0)
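
When many data files have to be checked this way, reading each dbv summary by eye gets tedious. As a rough sketch, the summary block can be parsed and flagged automatically. The function names `parse_dbv_summary` and `looks_damaged` are hypothetical, and the parser only assumes the `Total Pages ... : N` line format visible in the output above.

```python
import re

# Matches lines such as "Total Pages Marked Corrupt   : 4".
SUMMARY_LINE = re.compile(r"^Total Pages (.+?)\s*:\s*(\d+)\s*$")


def parse_dbv_summary(text: str) -> dict:
    """Collect the 'Total Pages ...' counters from dbv output into a dict."""
    counters = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            # Normalise inner whitespace: "Failing   (Data)" -> "Failing (Data)".
            key = " ".join(match.group(1).split())
            counters[key] = int(match.group(2))
    return counters


def looks_damaged(counters: dict) -> bool:
    """Flag a file when any page is marked corrupt or reported as failing."""
    return any(
        value > 0
        for key, value in counters.items()
        if key == "Marked Corrupt" or key.startswith("Failing")
    )


sample = """\
Total Pages Examined         : 5
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Other): 1
Total Pages Marked Corrupt   : 4
Total Pages Influx           : 0
Highest block SCN            : 0 (0.0)
"""
counters = parse_dbv_summary(sample)
print(counters, looks_damaged(counters))
```

A file that reports only a handful of examined pages, as both files did here, is worth flagging as well, since its size no longer matches what the control file expects.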

This means that after a single incident like this, all the data files had vanished from the storage. How crazy!

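Next, when we tried to restore from tape, after several hours of waiting the tape reported errors and the files could not be read.
I have never really trusted tape, and this time tape once again caused serious trouble.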

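For users whose databases are not too large, I strongly recommend installing a few extra disks in the host and keeping backups locally: first for performance, and second to speed up recovery and keep the recovery time under control.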
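
A backup copy only helps if it can actually be read back. As a minimal sketch, assuming the backup pieces are ordinary files that have been copied to a second directory (the directory layout and the function names `sha256_of` and `compare_backup_copies` are hypothetical), the copies can be checked by reading every byte and comparing checksums:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Read the whole file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_backup_copies(primary_dir: Path, secondary_dir: Path) -> dict:
    """Return {file name: status} for every regular file in primary_dir.

    Status is 'ok', 'missing' (no copy in secondary_dir),
    or 'mismatch' (copy exists but its content differs).
    """
    report = {}
    for source in sorted(p for p in primary_dir.iterdir() if p.is_file()):
        copy = secondary_dir / source.name
        if not copy.is_file():
            report[source.name] = "missing"
        elif sha256_of(source) != sha256_of(copy):
            report[source.name] = "mismatch"
        else:
            report[source.name] = "ok"
    return report
```

Running such a check on a schedule surfaces an unreadable or incomplete copy before a restore is needed, not hours into one.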

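In the end, the customer found a backup that had been temporarily set aside on a portable hard drive, and this accidentally preserved backup is what saved the database.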

When it comes to data backups, one more copy is never too many!

Last modified: 2019-05-08 11:04:47