暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

The segment header block corrupted cause ORA-08103 issue (段头坏块导致ora-8103)

张维照 2019-05-31
1422

问题描述

节前有个网友向我咨询他们的生产库遇到了ora-8103, 环境是Single instance 11.2.0.1 on linux, 今天整理一下问题的过程。

问题是在对schema exp备份时发现的提示有几张表exp-00056 ,ora-08103 错误并跳过。


专家解答

先看一下MOS NOTE 造成的原因

ORA-8103 is caused by an invalid block type. The block header has an invalid block type or the block type inside the block is not expected; e.g. a data block (Type=6) was expected but the actual block information is not a data block (Type!=6).
ORA-8103 is also caused by an unexpected data_object_id where it is changing for the involved objects while the affected SQL statement is executed.
These two causes might be due to an expected behavior or other problems. Details are:
Tables are being dropped/truncated while a SQL statement for those tables is still in execution.
Hardware, IO subsystem or OS problems may cause block corruptions overwriting the Block Type in the block header causing the error ORA-8103.
Overlapped Extents means that two or more segments incorrectly use the same block. This is a LMT Bitmap Extent inconsistency or Data Dictionary inconsistency.
If using ASM: overlapped ASM extents is when 2 different files and 2 different extents use the same Allocation Unit (AU) in the same ASM disk.
Oracle bugs..

本案例的解决思路:

at first Flush the buffer cache.only buffer cache corrupt exclude.
run dbv to verify datafile which the affected object in. not found any corrupted block.
analyze table

validate structure; –>ora-8103 move table to other tablespace –>ora-8103 CTAS –>ora-8103 dbms_space_admin.assm_segment_verify –>ora-8103

然后远程查看了表上存在id 主键,尝试用MOS Note 提供的 script 利用rowid 遇到ora-8103 的情况时跳过该记录的方法取出正常数据来拯救数据, 这个方法是优点是可以一次跳过多条触发8103的记录。 开始临时表创建在同一schema中居然触发ora-600 ktssinseg6,

create table anbob.tt_new as select * from anbob.tt where 1=2;

ORA-00600: 内部错误代码, 参数: [ktssinseg3], [6], [20984642]

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFF5DF26B88 ? 000000001 ?
                                                   7FFF5DF2B088 ? 000000000 ?
ksedst1()+98         call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFF5DF26B88 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksedst()+34          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFF5DF26B88 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbkedDefDump()+2736  call     ksedst()             000000000 ? 000000001 ?
                                                   7FFF5DF26B88 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksedmp()+36          call     dbkedDefDump()       000000003 ? 000000002 ?
                                                   7FFF5DF26B88 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksfdmp()+64          call     ksedmp()             000000003 ? 000000002 ?
                                                   7FFF5DF26B88 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgexPhaseII()+1764  call     ksfdmp()             000000003 ? 000000002 ?
                                                   7FFF5DF26B88 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgexProcessError()  call     dbgexPhaseII()       7FFA664C56F0 ? 7FFA662AF598 ?
+2279                                              7FFF5DF32E78 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgeExecuteForError  call     dbgexProcessError()  7FFA664C56F0 ? 7FFA662AF598 ?
()+83                                              000000001 ? 000000000 ?
                                                   7FFF00000000 ? 000000000 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  7FFA664C56F0 ? 7FFA662AF598 ?
1615                          ()                   000000001 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000000000 ? 7FFA662B7AE8 ?
63                                                 000000258 ? 7FFA662AF598 ?
                                                   000000000 ? 000000000 ?
kgeadse()+383        call     dbkePostKGE_kgsf()   00A99D540 ? 7FFA662B7AE8 ?
                                                   000000258 ? 7FFA662AF598 ?
                                                   000000000 ? 000000000 ?
kgerinv_internal()+  call     kgeadse()            00A99D540 ? 7FFA662B7AE8 ?

TIP:
为了尝试解决这个ora-600,然后修改了脚本创建表到system schema中(表不大,字典管理表空间) 结果还是ora-8103, 难道不是data block, 难道是segment header block? 下面可以使用event 10236 或10200 和8103 errorstack event 尝试定位block.

sqlplus sys/oracle as sysdba

alter session set max_dump_file_size=unlimited;
alter session set db_file_multiblock_read_count=1;alter session set events 'immediate trace name trace_buffer_on level 1048576';alter session set events '10200 trace name context forever, level 1';
alter session set events '8103 trace name errorstack level 3';
alter session set events '10236 trace name context forever, level 1';
alter session set tracefile_identifier='ORA8103'; 
select /*+no_index(t)*/ count(*) from anbob.xx  t;
oradebug setmypid;
oradebug tracefile_name;

Tip:
上面是MOS中提供的脚本,但是在11g中未发现10200的trace信息,老熊有写过10g好像是要去掉trace_buffer_on, 但是通过操作数据库不方便,从当前的10236 trace中就可以发现一些信息了。

kcbzib: dump suspect buffer
buffer tsn: 6 rdba: 0x0140351a (5/13594)
scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
...
BH (0x1bbfc0e98) file#: 5 rdba: 0x0140351a (5/13594) class: 4 ba: 0x1bb8f2000
  set: 17 pool 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 46,19
  dbwrid: 0 obj: 134603 objn: 134603 tsn: 6 afn: 5 hint: f
  hash: [0x2119af850,0x2119af850] lru: [0x1bbfc10b0,0x21d893fa0]
  ckptq: [NULL] fileq: [NULL] objq: [0x20853aa80,0x20853aa80]
  use: [0x212b17f00,0x212b17f00] wait: [NULL]
  st: READING md: EXCL tch: 0
  flags: only_sequential_access
  cr pin refcnt: 0 sh pin refcnt: 0
  Using State Objects
  ...

  The buffer with tsn: 0 rdba: 0x00400321 has already been dumped
 The buffer with tsn: 6 rdba: 0x0140351a was pinned,
but could not be dumped

Note:
提示可疑的数据块是tsn: 6 rdba: 0x0140351a (5/13594),block class 是4 应该是Segment header,但是 block type 是0 unknown,
从dba_segments 视图也可以确认file 5 ,13594 是该对象(134603) 的header block. 下面来dump 这个block.

alter system dump datafile 5 block 13594;

— trace file—

Start dump data blocks tsn: 6 file#:5 minblk 13594 maxblk 13594

*** 2014-09-28 18:58:08.055
Block dump from cache:
Dump of buffer cache at level 4 for tsn=6, rdba=20985114
Block dump from disk:
buffer tsn: 6 rdba: 0x0000351a (0/13594)
scn: 0x0000.00000000 seq: 0x01 flg: 0x05 tail: 0x00000001
frmt: 0x02 chkval: 0x921a type: 0x00=unknown
Hex dump of corrupt header 4 = CORRUPT
Dump of memory from 0x00007F58133AE200 to 0x00007F58133AE214
7F58133AE200 0000A200 0000351A 00000000 05010000  [.....5..........]
7F58133AE210 0000921A                             [....]            
Hex dump of block: st=4, typ_found=0
Dump of memory from 0x00007F58133AE200 to 0x00007F58133B0200
7F58133AE200 0000A200 0000351A 00000000 05010000  [.....5..........]
7F58133AE210 0000921A 00000000 00000000 00000000  [................]
7F58133AE220 00000000 00000000 00000000 00000000  [................]
        Repeat 508 times
7F58133B01F0 00000000 00000000 00000000 00000001  [................]
End dump data blocks tsn: 6 file#: 5 minblk 13594 maxblk 13594

Note:
当然到这里可以确认错误的block 是 SEGMENT HEADER BLOCK, 不可以简单通过rowid 跳过, 也不能像修改data block corruption 那样通过bbed 修改块为 corrupted 然后通过 dbms_repair.skip_corrupt_blocks 跳过该块, 该segment header block上都没有extent 信息(如extent map\HWM), 如果重构起来工作量还是不小的, 当前最合理的办法是可以借助像 ORACLE DUL 这样的工具,scan extent , 然后 unload table xx object xxx, 直接从数据文件抽取数据生成txt 或dump 文件 ,导入成功后,删除原对象(corrupted segment)。

附上8103 的error stack

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFF96B270C8 ? 000000001 ?
                                                   7FFF96B2B5C8 ? 000000000 ?
ksedst1()+98         call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFF96B270C8 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksedst()+34          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFF96B270C8 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbkedDefDump()+2736  call     ksedst()             000000000 ? 000000001 ?
                                                   7FFF96B270C8 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksedmp()+36          call     dbkedDefDump()       000000003 ? 000000000 ?
                                                   7FFF96B270C8 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbkdaKsdActDriver()  call     ksedmp()             000000003 ? 000000000 ?
+1947                                              7FFF96B270C8 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgdaExecuteAction(  call     dbkdaKsdActDriver()  7F318B86D6F0 ? 7FFF96B2DC70 ?
)+1020                                             7FFF96B270C8 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgdaRunAction()+81  call     dbgdaExecuteAction(  7F318B86D6F0 ? 009138928 ?
0                             )                    0020C0003 ? 7FFF96B2DC70 ?
                                                   000000001 ? 000000000 ?
dbgdRunActions()+59  call     dbgdaRunAction()     7F318B86D6F0 ? 000000005 ?
                                                   0020C0003 ? 7FFF96B2DC70 ?
                                                   000000001 ? 000000000 ?
dbgdProcessEventAct  call     dbgdRunActions()     7F318B86D6F0 ? 000000005 ?
ions()+615                                         0020C0003 ? 7FFF96B2DC70 ?
                                                   000000001 ? 000000000 ?
dbgdChkEventKgErr()  call     dbgdProcessEventAct  7F318B86D6F0 ? 00A99D540 ?
+1749                         ions()               7F318B397430 ? 7FFF96B2DC70 ?
                                                   000000001 ? 000000000 ?
dbkdChkEventRdbmsEr  call     dbgdChkEventKgErr()  7F318B86D6F0 ? 00A99D540 ?
r()+20                                             000001FA7 ? 7FFF96B2DC70 ?
                                                   000000001 ? 000000000 ?
__PGOSF32_ksfpec()+  call     dbkdChkEventRdbmsEr  000001FA7 ? 00A99D540 ?
61                            r()                  000001FA7 ? 7FFF96B2DC70 ?
                                                   000000001 ? 000000000 ?
dbgePostErrorKGE()+  call     __PGOSF32_ksfpec()   000001FA7 ? 000001FA7 ?
1004                                               000001FA7 ? 7FFF96B2DC70 ?
                                                   000000001 ? 000000000 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000000001 ? 7F318B8725B0 ?
63                                                 000001FA7 ? 000000000 ?
                                                   000000001 ? 000000000 ?
kgesev()+273         call     dbkePostKGE_kgsf()   00A99D540 ? 7F318B8725B0 ?
                                                   000001FA7 ? 000000000 ?
                                                   000000001 ? 000000000 ?
ksesec0()+162        call     kgesev()             00A99D540 ? 7F318B8725B0 ?
                                                   000001FA7 ? 000000000 ?
                                                   7FFF96B2E9B0 ? 000000000 ?
kcbzib()+4267        call     ksesec0()            00A99D540 ? 00A99D6F0 ?
                                                   000000000 ? 00A99D6F0 ?
                                                   00000000A ?
                                                   455220504D554F44 ?
kcbgtcr()+16928      call     kcbzib()             00000FFFF ? 7FFF96B2F6D0 ?
                                                   000001FA7 ? 000000000 ?
                                                   000000003 ? 7FFF00000284 ?
ktecgsc()+1123       call     kcbgtcr()            7FFF96B2F6D0 ? 000000000 ?
                                                   000000284 ? 000000000 ?
                                                   000000000 ? 7FFF00000284 ?
ktecgetsh()+157      call     ktecgsc()            7FFF96B2F6D0 ? 000000003 ?


「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论