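Problem Description
Last week I helped a user recover a database. The case was not complicated, but the process is worth writing down.
It began with a storage-level failure that corrupted data files and even lost some of them. The database first raised the following errors in the alert log: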
Wed Nov 1 17:12:45 2010
Errors in file /ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005408990], [1388], [4005425099], [1484804288], [], []
Wed Nov 1 17:12:47 2010
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Wed Nov 1 17:12:47 2010
Errors in file /ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005408993], [1388], [4005425099], [1484804288], [], []
Non-fatal internal error happenned while SMON was doing extent coalescing.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
ORA-00474: SMON process terminated with error
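Expert Answer
Note that the ORA-600 [2662] error itself is nothing special and not particularly hard to deal with. What deserves attention is when it occurred: while SMON was flushing monitored table stats and while it was doing extent coalescing. The latter is especially notable, because many people once believed that this behavior no longer existed in 10g.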
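The arguments of the ORA-600 [2662] lines in the alert log can be read mechanically. The following is a minimal sketch, not an official tool. It assumes the commonly described argument layout (current SCN wrap and base, then dependent SCN wrap and base, then the DBA of the block involved) and the smallfile DBA encoding of 10 bits for the file number and 22 bits for the block number. The function name `parse_ora600_2662` is hypothetical.

```python
import re

# Assumed argument layout for ORA-00600 [2662], as commonly described:
# [2662], [current SCN wrap], [current SCN base],
#         [dependent SCN wrap], [dependent SCN base], [DBA], ...
ORA600_ARG = re.compile(r"\[(\d*)\]")


def parse_ora600_2662(line):
    """Return the current SCN, the dependent SCN, their gap and the decoded DBA."""
    # Keep only the non-empty bracketed arguments.
    args = [a for a in ORA600_ARG.findall(line) if a]
    if not args or args[0] != "2662":
        raise ValueError("not an ORA-00600 [2662] line")
    cur_wrap, cur_base, dep_wrap, dep_base, dba = (int(a) for a in args[1:6])
    # A full SCN is the 32-bit base with the wrap value placed above it.
    current_scn = (cur_wrap << 32) | cur_base
    dependent_scn = (dep_wrap << 32) | dep_base
    return {
        "current_scn": current_scn,
        "dependent_scn": dependent_scn,
        "gap": dependent_scn - current_scn,
        # Assumption: smallfile DBA, upper 10 bits file, lower 22 bits block.
        "file_no": dba >> 22,
        "block_no": dba & 0x3FFFFF,
    }


line = ("ORA-00600: internal error code, arguments: [2662], [1388], "
        "[4005408990], [1388], [4005425099], [1484804288], [], []")
info = parse_ora600_2662(line)
print(info["gap"], info["file_no"], info["block_no"])
```

Under that assumed layout, the first error shows a dependent SCN that is 16109 ahead of the current SCN, found in file 354, block 20672. A block carrying an SCN ahead of the database's own SCN is the condition that [2662] reports.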
These errors caused the database to crash, and after the restart things got worse:
Errors in file /ADMIN/bdump/erp_q000_6024.trc:
ORA-01578: ORACLE data block corrupted (file # 11, block # 62612)
ORA-01110: data file 11: '/data/sysaux01.dbf'
The SYSAUX tablespace was corrupted. A dbv check showed that the file had only 5 blocks left, and 4 of them were corrupt:
Page 2 is marked corrupt
Corrupt block relative dba: 0x5ec00002 (file 379, block 2)
Bad header found during dbv:
Data in bad block:
 type: 0 format: 2 rdba: 0x4e802e1b
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0xc79b
 computed block checksum: 0x0
Page 3 is marked corrupt
Corrupt block relative dba: 0x5ec00003 (file 379, block 3)
Bad header found during dbv:
Data in bad block:
 type: 0 format: 2 rdba: 0x4e802e1c
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0xc79c
 computed block checksum: 0x0
Page 4 is marked corrupt
Corrupt block relative dba: 0x5ec00004 (file 379, block 4)
Bad header found during dbv:
Data in bad block:
 type: 0 format: 2 rdba: 0x4e802e1d
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0xc79d
 computed block checksum: 0x0
Page 5 is marked corrupt
Corrupt block relative dba: 0x5ec00005 (file 379, block 5)
Bad header found during dbv:
Data in bad block:
 type: 0 format: 2 rdba: 0x4e802e1e
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x00000001
 check value in block header: 0xc79e
 computed block checksum: 0x0
DBVERIFY - Verification complete
Total Pages Examined         : 5
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 0
Total Pages Marked Corrupt   : 4
Total Pages Influx           : 0
Highest block SCN            : 0 (0.0)
A final check showed that almost all of the files had been lost. The UNDO file had been emptied as well:
DBVERIFY - Verification starting : FILE = undo01.dbf
Page 2 is marked corrupt
Corrupt block relative dba: 0x02c00002 (file 11, block 2)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 2 rdba: 0x5ec2b80c
 last change scn: 0x056c.dc9017e1 seq: 0x35 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x17e10235
 check value in block header: 0xf125
 computed block checksum: 0x0
Page 3 is marked corrupt
Corrupt block relative dba: 0x02c00003 (file 11, block 3)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 2 rdba: 0x5ec2b80d
 last change scn: 0x056c.dc9017e1 seq: 0x35 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x17e10235
 check value in block header: 0xf50b
 computed block checksum: 0x0
Page 4 is marked corrupt
Corrupt block relative dba: 0x02c00004 (file 11, block 4)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 2 rdba: 0x5ec2b80e
 last change scn: 0x056c.dc9017e1 seq: 0x36 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x17e10236
 check value in block header: 0xe0ef
 computed block checksum: 0x0
Page 5 is marked corrupt
Corrupt block relative dba: 0x02c00005 (file 11, block 5)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 2 rdba: 0x5ec2b80f
 last change scn: 0x056c.dc9017e1 seq: 0x35 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x17e10235
 check value in block header: 0x1844
 computed block checksum: 0x0
DBVERIFY - Verification complete
Total Pages Examined         : 5
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 0
Total Pages Marked Corrupt   : 4
Total Pages Influx           : 0
Highest block SCN            : 0 (0.0)
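The "Bad header" messages in the dbv output become easier to read once the hexadecimal addresses are decoded. The sketch below is illustrative only. It assumes the smallfile encoding of a relative DBA (upper 10 bits for the relative file number, lower 22 bits for the block number), and the helper name `decode_rdba` is hypothetical. It compares the address dbv expected at a position with the rdba actually stored in that block's header.

```python
def decode_rdba(rdba):
    """Split a 32-bit relative DBA into (relative file number, block number).

    Assumes the smallfile layout: upper 10 bits file, lower 22 bits block.
    """
    return rdba >> 22, rdba & 0x3FFFFF


# (address dbv expected, rdba stored in the block header), taken from page 2
# of each of the two dbv runs shown in the output.
pairs = [
    (0x5EC00002, 0x4E802E1B),  # first file checked
    (0x02C00002, 0x5EC2B80C),  # undo01.dbf
]
for expected, stored in pairs:
    print(decode_rdba(expected), decode_rdba(stored))
```

Decoded this way, the expected addresses match what dbv printed (file 379, block 2 and file 11, block 2), while the stored headers point elsewhere (file 314, block 11803 and file 379, block 178188). Under the stated assumption, the blocks found in these files belong to different files and positions, which is consistent with the storage having scrambled or replaced their contents.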
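This means that after this single incident, every data file had been lost from storage. How crazy is that!
Next we tried to restore from tape. After hours of waiting, the tape reported an error and the files could not be read.
I have never really trusted tape, and this time tape caused serious trouble once again.
For users whose databases are not too large, I strongly recommend fitting a few extra hard disks to the host and keeping backups locally. This brings better backup performance, and it also speeds up recovery and helps guarantee the recovery time.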
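Keeping an extra local copy of each backup piece is only useful if the copy is intact. The following is a rough sketch of one way to copy a backup file to a local disk and confirm that the copy is byte-identical. The function names and any paths passed to them are hypothetical, and a matching checksum only shows that the copy equals the source, not that the backup itself can be restored.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(backup_piece, local_dir):
    """Copy one backup file into local_dir and check that the copy matches."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    target = local_dir / Path(backup_piece).name
    shutil.copy2(backup_piece, target)
    # Re-read both files so that a silent write error shows up as a mismatch.
    if sha256_of(backup_piece) != sha256_of(target):
        raise OSError(f"checksum mismatch for {target}")
    return target
```

A periodic test restore from the local copy remains the only real proof that a backup is usable.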
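In the end the customer found a backup that had been temporarily set aside on a portable hard disk, and this accidentally surviving copy is what saved the database.
When it comes to backups, one more copy is never too many!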
Last modified: 2019-05-08 11:04:47