暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

详述数据库坏盘,储蓄掉电,无法打开恢复案例

数据和云 2018-12-20
231


李真旭(Roger)

ACOUG 核心专家,Oracle ACE,云和恩墨技术专家



某网友的数据库由于坏盘了,并且存储掉电,导致数据库无法打开了。单看其数据库alert log的错误来看,是非常之简单的,如下:

Fri Oct 26 10:33:53 2018

Recovery of Online Redo Log: Thread 1 Group 3 Seq 39 Reading mem 0

Mem# 0: fs/fs/oradata/orcl/redo03.log

Block recovery stopped at EOT rba 39.77.16

Block recovery completed at rba 39.77.16, scn 0.1002048587

ORACLE Instance orcl (pid = 8) - Error 600 encountered while recovering transaction (9, 30) on object 9149.

Fri Oct 26 10:33:53 2018

Errors in file fs/fs/oradata/admin/orcl/bdump/orcl_smon_192644.trc:

ORA-00600: internal error code, arguments: [6856], [0], [43], [], [], [], [], []

Fri Oct 26 10:33:56 2018

Errors in file fs/fs/oradata/admin/orcl/bdump/orcl_smon_192644.trc:

ORA-00600: internal error code, arguments: [4194], [33], [36], [], [], [], [], []

Doing block recovery for file 2 block 713

Block recovery from logseq 39, block 82 to scn 1002048595


对于这种错误,很明显,屏蔽回滚段即可,屏蔽之后可顺利打开数据库,不过后面很快又会crash掉,因此重建undo也就绕过这个问题了。


打开数据库之后,再去观察数据库,会发现alert log有不少的错误,如下所示:

Fri Oct 26 11:01:46 2018

Errors in file fs/fs/oradata/admin/orcl/bdump/orcl_mmon_385148.trc:

ORA-00600: internal error code, arguments: [17147], [0x110549070], [], [], [], [], [], []

Fri Oct 26 11:01:46 2018

Errors in file fs/fs/oradata/admin/orcl/bdump/orcl_m001_373218.trc:

ORA-00600: internal error code, arguments: [kdddgb5], [196650], [0], [], [], [], [], []

ORA-600 encountered when generating server alert SMG-4120

Fri Oct 26 11:01:47 2018

Errors in file fs/fs/oradata/admin/orcl/bdump/orcl_mmon_385148.trc:

ORA-00600: internal error code, arguments: [KGHALO4], [0x11047F6F0], [], [], [], [], [], []

ORA-600 encountered when generating server alert SMG-4121

Fri Oct 26 11:01:48 2018

Errors in file fs/fs/oradata/admin/orcl/bdump/orcl_mmon_385148.trc:

ORA-00600: internal error code, arguments: [KGHALO4], [0x11047F6F0], [], [], [], [], [], []

ORA-600 encountered when generating server alert SMG-4121

Fri Oct 26 11:01:50 2018

Errors in file fs/fs/oradata/admin/orcl/bdump/orcl_m001_373218.trc:

ORA-00600: internal error code, arguments: [kdddgb5], [196650], [0], [], [], [], [], []

Fri Oct 26 11:02:22 2018

Errors in file fs/fs/oradata/admin/orcl/bdump/orcl_mmon_385148.trc:

ORA-00600: internal error code, arguments: [17114], [0x110549070], [], [], [], [], [], []

Fri Oct 26 11:02:23 2018

Errors in file fs/fs/oradata/admin/orcl/bdump/orcl_mmon_385148.trc:

ORA-00600: internal error code, arguments: [kebm_mmon_main_1], [39], [], [], [], [], [], []

ORA-00039: error during periodic action

ORA-00600: internal error code, arguments: [17114], [0x110549070], [], [], [], [], [], []

Fri Oct 26 11:03:30 2018

Restarting dead background process MMON


除此之外,由于之外alert log有坏块报错,因此对system进行了dbv检查,发现确实存在少量坏块,如下:


DBVERIFY: Release 10.2.0.4.0 - Production on Fri Oct 26 10:37:20 2018


Copyright (c) 1982, 2007, Oracle. All rights reserved.


DBVERIFY - Verification starting : FILE = system01.dbf


DBV-00200: Block, DBA 4255202, already marked corrupt

Block Checking: DBA = 4258751, Block Type = KTB-managed data block

data header at 0x11022a05c

kdbchk: fsbo(596) wrong, (hsz 4178)

Page 64447 failed with check code 6129

Block Checking: DBA = 4259386, Block Type = KTB-managed data block

**** kdxcofbo = 208 != 24

---- end index block validation

Page 65082 failed with check code 6401

Block Checking: DBA = 4269609, Block Type = Unlimited data segment header

Incorrect extent count in the extent map: 16777317

Block Checking: DBA = 4269612, Block Type = KTB-managed data block

**** kdxcofbo = 224 != 216

---- end index block validation

Page 75308 failed with check code 6401

Block Checking: DBA = 4269615, Block Type = KTB-managed data block

**** actual rows locked by itl 2 = 1 != # in trans. header = 0

---- end index block validation

Page 75311 failed with check code 6401

Page 85271 is influx - most likely media corrupt

Corrupt block relative dba: 0x00414d17 (file 1, block 85271)

Fractured block found during dbv:

Data in bad block:

type: 6 format: 2 rdba: 0x00414d17

last change scn: 0x0000.3afaf495 seq: 0x1 flg: 0x04

spare1: 0x0 spare2: 0x0 spare3: 0x0

consistency value in tail: 0xfe830601

check value in block header: 0x96c6

computed block checksum: 0x3c6b


Page 85383 is influx - most likely media corrupt

Corrupt block relative dba: 0x00414d87 (file 1, block 85383)

Fractured block found during dbv:

Data in bad block:

type: 6 format: 2 rdba: 0x00414d87

last change scn: 0x0000.3b6b9d19 seq: 0x1 flg: 0x06

spare1: 0x0 spare2: 0x0 spare3: 0x0

consistency value in tail: 0x970f0601

check value in block header: 0xe825

computed block checksum: 0x3c6b


DBVERIFY - Verification complete


Total Pages Examined : 640000

Total Pages Processed (Data) : 116312

Total Pages Failing (Data) : 1

Total Pages Processed (Index): 65914

Total Pages Failing (Index): 3

Total Pages Processed (Other): 64634

Total Pages Processed (Seg) : 0

Total Pages Failing (Seg) : 0

Total Pages Empty : 393138

Total Pages Marked Corrupt : 3

Total Pages Influx : 2

Highest block SCN : 1002028510 (0.1002028510)


这部分错误,其实处理起来也不困难,部分是业务表的index,但是其他的几乎都是AWR相关基表,有2个坏块跟是system相关的基表和索引,分别是I_H_OBJ#_COL#和COM$ ,HISTGRM$。


对于业务索引,很简单,直接drop 重建即可,对于这个sys的index,可以通过设置38003 event进行drop重建。


对于基表COM$,HISTGRM$,由于是非bootstrap$核心对象,其实也可以处理掉的。

不过考虑到这种毕竟是存储掉电,undo异常的情况,还是重建库更稳妥一些。最后补充一点,这个库稍微有点奇葩的地方是全库1.2TB,其中有个表的LOB自动980GB,重建数据库是相对较慢的。对于大表,且有LOB自动,通常建议基于分片,否则会报ORA-01555错误的,如下是常用的一个基于rowid的分片脚本,供大家参考:


set verify off


undefine rowid_ranges


undefine segment_name


undefine owner


set head off


set pages 0


set trimspool on


select 'where rowid between ''' ||


       sys.dbms_rowid.rowid_create(1, d.oid, c.fid1, c.bid1, 0) ||


       ''' and ''' ||


       sys.dbms_rowid.rowid_create(1, d.oid, c.fid2, c.bid2, 9999) || '''' || ';'


  from (select distinct b.rn,


                        first_value(a.fid) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) fid1,


                        last_value(a.fid) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) fid2,


                        first_value(decode(sign(range2 - range1),


                                           1,


                                           a.bid +


                                           ((b.rn - a.range1) * a.chunks1),


                                           a.bid)) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) bid1,


                        last_value(decode(sign(range2 - range1),


                                          1,


                                          a.bid +


                                          ((b.rn - a.range1 + 1) * a.chunks1) - 1,


                                          (a.bid + a.blocks - 1))) over(partition by b.rn order by a.fid, a.bid rows between unbounded preceding and unbounded following) bid2


          from (select fid,


                       bid,


                       blocks,


                       chunks1,


                       trunc((sum2 - blocks + 1 - 0.1) chunks1) range1,


                       trunc((sum2 - 0.1) chunks1) range2


                  from (select *+ rule */


                         relative_fno fid,


                         block_id bid,


                         blocks,


                         sum(blocks) over() sum1,


                         trunc((sum(blocks) over()) &&rowid_ranges) chunks1,


                         sum(blocks) over(order by relative_fno, block_id) sum2


                          from dba_extents


                         where segment_name = upper('&&segment_name')


                           and owner = upper('&&owner'))


                 where sum1 > &&rowid_ranges) a,


               (select rownum - 1 rn


                  from dual


                connect by level <= &&rowid_ranges) b


         where b.rn between a.range1 and a.range2) c,


       (select max(data_object_id) oid


          from dba_objects


         where object_name = upper('&&segment_name')


           and owner = upper('&&owner')


           and data_object_id is not null) d


/



资源下载

关注公众号:数据和云(OraNews)回复关键字获取

2018DTCC , 数据库大会PPT

2018DTC,2018 DTC 大会 PPT

DBALIFE ,“DBA 的一天”海报

DBA04 ,DBA 手记4 电子书

122ARCH ,Oracle 12.2体系结构图

2018OOW ,Oracle OpenWorld 资料

PRELECTION ,大讲堂讲师课程资料

近期文章

企业数据架构的云化智能重构和变革(含大会PPT)

Oracle研发总裁Thomas Kurian加盟Google Cloud

变与不变: Undo构造一致性读的例外情况

Oracle 18c新特性:动态 Container Map 增强 Application Container 灵活性

Oracle 18c新特性:Schema-Only 帐号提升应用管理安全性

Oracle 18c新特性:多租户舰队 CDB Fleet (含PPT)

为什么看了那么多灾难,还是过不好备份这一关?

文章转载自数据和云,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论