问题描述
节前有个网友向我咨询他们的生产库遇到了ora-8103, 环境是Single instance 11.2.0.1 on linux, 今天整理一下问题的过程。
问题是在对schema exp备份时发现的提示有几张表exp-00056 ,ora-08103 错误并跳过。
专家解答
先看一下MOS NOTE 造成的原因
ORA-8103 is caused by an invalid block type. The block header has an invalid block type or the block type inside the block is not expected; e.g. a data block (Type=6) was expected but the actual block information is not a data block (Type!=6). ORA-8103 is also caused by an unexpected data_object_id where it is changing for the involved objects while the affected SQL statement is executed. These two causes might be due to an expected behavior or other problems. Details are: Tables are being dropped/truncated while a SQL statement for those tables is still in execution. Hardware, IO subsystem or OS problems may cause block corruptions overwriting the Block Type in the block header causing the error ORA-8103. Overlapped Extents means that two or more segments incorrectly use the same block. This is a LMT Bitmap Extent inconsistency or Data Dictionary inconsistency. If using ASM: overlapped ASM extents is when 2 different files and 2 different extents use the same Allocation Unit (AU) in the same ASM disk. Oracle bugs..
本案例的解决思路:
at first Flush the buffer cache.only buffer cache corrupt exclude.
run dbv to verify datafile which the affected object in. not found any corrupted block.
analyze table
validate structure; –>ora-8103 move table to other tablespace –>ora-8103 CTAS –>ora-8103 dbms_space_admin.assm_segment_verify –>ora-8103
然后远程查看了表上存在id 主键,尝试用MOS Note 提供的 script 利用rowid 遇到ora-8103 的情况时跳过该记录的方法取出正常数据来拯救数据, 这个方法是优点是可以一次跳过多条触发8103的记录。 开始临时表创建在同一schema中居然触发ora-600 ktssinseg6,
create table anbob.tt_new as select * from anbob.tt where 1=2; ORA-00600: 内部错误代码, 参数: [ktssinseg3], [6], [20984642] ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- skdstdst()+36 call kgdsdst() 000000000 ? 000000000 ? 7FFF5DF26B88 ? 000000001 ? 7FFF5DF2B088 ? 000000000 ? ksedst1()+98 call skdstdst() 000000000 ? 000000000 ? 7FFF5DF26B88 ? 000000001 ? 000000000 ? 000000000 ? ksedst()+34 call ksedst1() 000000000 ? 000000001 ? 7FFF5DF26B88 ? 000000001 ? 000000000 ? 000000000 ? dbkedDefDump()+2736 call ksedst() 000000000 ? 000000001 ? 7FFF5DF26B88 ? 000000001 ? 000000000 ? 000000000 ? ksedmp()+36 call dbkedDefDump() 000000003 ? 000000002 ? 7FFF5DF26B88 ? 000000001 ? 000000000 ? 000000000 ? ksfdmp()+64 call ksedmp() 000000003 ? 000000002 ? 7FFF5DF26B88 ? 000000001 ? 000000000 ? 000000000 ? dbgexPhaseII()+1764 call ksfdmp() 000000003 ? 000000002 ? 7FFF5DF26B88 ? 000000001 ? 000000000 ? 000000000 ? dbgexProcessError() call dbgexPhaseII() 7FFA664C56F0 ? 7FFA662AF598 ? +2279 7FFF5DF32E78 ? 000000001 ? 000000000 ? 000000000 ? dbgeExecuteForError call dbgexProcessError() 7FFA664C56F0 ? 7FFA662AF598 ? ()+83 000000001 ? 000000000 ? 7FFF00000000 ? 000000000 ? dbgePostErrorKGE()+ call dbgeExecuteForError 7FFA664C56F0 ? 7FFA662AF598 ? 1615 () 000000001 ? 000000001 ? 000000000 ? 000000000 ? dbkePostKGE_kgsf()+ call dbgePostErrorKGE() 000000000 ? 7FFA662B7AE8 ? 63 000000258 ? 7FFA662AF598 ? 000000000 ? 000000000 ? kgeadse()+383 call dbkePostKGE_kgsf() 00A99D540 ? 7FFA662B7AE8 ? 000000258 ? 7FFA662AF598 ? 000000000 ? 000000000 ? kgerinv_internal()+ call kgeadse() 00A99D540 ? 7FFA662B7AE8 ?
TIP:
为了尝试解决这个ora-600,然后修改了脚本创建表到system schema中(表不大,字典管理表空间) 结果还是ora-8103, 难道不是data block, 难道是segment header block? 下面可以使用event 10236 或10200 和8103 errorstack event 尝试定位block.
sqlplus sys/oracle as sysdba alter session set max_dump_file_size=unlimited; alter session set db_file_multiblock_read_count=1;alter session set events 'immediate trace name trace_buffer_on level 1048576';alter session set events '10200 trace name context forever, level 1'; alter session set events '8103 trace name errorstack level 3'; alter session set events '10236 trace name context forever, level 1'; alter session set tracefile_identifier='ORA8103'; select /*+no_index(t)*/ count(*) from anbob.xx t; oradebug setmypid; oradebug tracefile_name;
Tip:
上面是MOS中提供的脚本,但是在11g中未发现10200的trace信息,老熊有写过10g好像是要去掉trace_buffer_on, 但是通过操作数据库不方便,从当前的10236 trace中就可以发现一些信息了。
kcbzib: dump suspect buffer buffer tsn: 6 rdba: 0x0140351a (5/13594) scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001 frmt: 0x02 chkval: 0x0000 type: 0x00=unknown ... BH (0x1bbfc0e98) file#: 5 rdba: 0x0140351a (5/13594) class: 4 ba: 0x1bb8f2000 set: 17 pool 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 46,19 dbwrid: 0 obj: 134603 objn: 134603 tsn: 6 afn: 5 hint: f hash: [0x2119af850,0x2119af850] lru: [0x1bbfc10b0,0x21d893fa0] ckptq: [NULL] fileq: [NULL] objq: [0x20853aa80,0x20853aa80] use: [0x212b17f00,0x212b17f00] wait: [NULL] st: READING md: EXCL tch: 0 flags: only_sequential_access cr pin refcnt: 0 sh pin refcnt: 0 Using State Objects ... The buffer with tsn: 0 rdba: 0x00400321 has already been dumped The buffer with tsn: 6 rdba: 0x0140351a was pinned, but could not be dumped
Note:
提示可疑的数据块是tsn: 6 rdba: 0x0140351a (5/13594),block class 是4 应该是Segment header,但是 block type 是0 unknown,
从dba_segments 视图也可以确认file 5 ,13594 是该对象(134603) 的header block. 下面来dump 这个block.
alter system dump datafile 5 block 13594;
— trace file—
Start dump data blocks tsn: 6 file#:5 minblk 13594 maxblk 13594 *** 2014-09-28 18:58:08.055 Block dump from cache: Dump of buffer cache at level 4 for tsn=6, rdba=20985114 Block dump from disk: buffer tsn: 6 rdba: 0x0000351a (0/13594) scn: 0x0000.00000000 seq: 0x01 flg: 0x05 tail: 0x00000001 frmt: 0x02 chkval: 0x921a type: 0x00=unknown Hex dump of corrupt header 4 = CORRUPT Dump of memory from 0x00007F58133AE200 to 0x00007F58133AE214 7F58133AE200 0000A200 0000351A 00000000 05010000 [.....5..........] 7F58133AE210 0000921A [....] Hex dump of block: st=4, typ_found=0 Dump of memory from 0x00007F58133AE200 to 0x00007F58133B0200 7F58133AE200 0000A200 0000351A 00000000 05010000 [.....5..........] 7F58133AE210 0000921A 00000000 00000000 00000000 [................] 7F58133AE220 00000000 00000000 00000000 00000000 [................] Repeat 508 times 7F58133B01F0 00000000 00000000 00000000 00000001 [................] End dump data blocks tsn: 6 file#: 5 minblk 13594 maxblk 13594
Note:
当然到这里可以确认错误的block 是 SEGMENT HEADER BLOCK, 不可以简单通过rowid 跳过, 也不能像修改data block corruption 那样通过bbed 修改块为 corrupted 然后通过 dbms_repair.skip_corrupt_blocks 跳过该块, 该segment header block上都没有extent 信息(如extent map\HWM), 如果重构起来工作量还是不小的, 当前最合理的办法是可以借助像 ORACLE DUL 这样的工具,scan extent , 然后 unload table xx object xxx, 直接从数据文件抽取数据生成txt 或dump 文件 ,导入成功后,删除原对象(corrupted segment)。
附上8103 的error stack
----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- skdstdst()+36 call kgdsdst() 000000000 ? 000000000 ? 7FFF96B270C8 ? 000000001 ? 7FFF96B2B5C8 ? 000000000 ? ksedst1()+98 call skdstdst() 000000000 ? 000000000 ? 7FFF96B270C8 ? 000000001 ? 000000000 ? 000000000 ? ksedst()+34 call ksedst1() 000000000 ? 000000001 ? 7FFF96B270C8 ? 000000001 ? 000000000 ? 000000000 ? dbkedDefDump()+2736 call ksedst() 000000000 ? 000000001 ? 7FFF96B270C8 ? 000000001 ? 000000000 ? 000000000 ? ksedmp()+36 call dbkedDefDump() 000000003 ? 000000000 ? 7FFF96B270C8 ? 000000001 ? 000000000 ? 000000000 ? dbkdaKsdActDriver() call ksedmp() 000000003 ? 000000000 ? +1947 7FFF96B270C8 ? 000000001 ? 000000000 ? 000000000 ? dbgdaExecuteAction( call dbkdaKsdActDriver() 7F318B86D6F0 ? 7FFF96B2DC70 ? )+1020 7FFF96B270C8 ? 000000001 ? 000000000 ? 000000000 ? dbgdaRunAction()+81 call dbgdaExecuteAction( 7F318B86D6F0 ? 009138928 ? 0 ) 0020C0003 ? 7FFF96B2DC70 ? 000000001 ? 000000000 ? dbgdRunActions()+59 call dbgdaRunAction() 7F318B86D6F0 ? 000000005 ? 0020C0003 ? 7FFF96B2DC70 ? 000000001 ? 000000000 ? dbgdProcessEventAct call dbgdRunActions() 7F318B86D6F0 ? 000000005 ? ions()+615 0020C0003 ? 7FFF96B2DC70 ? 000000001 ? 000000000 ? dbgdChkEventKgErr() call dbgdProcessEventAct 7F318B86D6F0 ? 00A99D540 ? +1749 ions() 7F318B397430 ? 7FFF96B2DC70 ? 000000001 ? 000000000 ? dbkdChkEventRdbmsEr call dbgdChkEventKgErr() 7F318B86D6F0 ? 00A99D540 ? r()+20 000001FA7 ? 7FFF96B2DC70 ? 000000001 ? 000000000 ? __PGOSF32_ksfpec()+ call dbkdChkEventRdbmsEr 000001FA7 ? 00A99D540 ? 61 r() 000001FA7 ? 7FFF96B2DC70 ? 000000001 ? 000000000 ? dbgePostErrorKGE()+ call __PGOSF32_ksfpec() 000001FA7 ? 000001FA7 ? 1004 000001FA7 ? 7FFF96B2DC70 ? 000000001 ? 000000000 ? dbkePostKGE_kgsf()+ call dbgePostErrorKGE() 000000001 ? 7F318B8725B0 ? 63 000001FA7 ? 000000000 ? 000000001 ? 000000000 ? kgesev()+273 call dbkePostKGE_kgsf() 00A99D540 ? 7F318B8725B0 ? 000001FA7 ? 000000000 ? 000000001 ? 000000000 ? ksesec0()+162 call kgesev() 00A99D540 ? 7F318B8725B0 ? 000001FA7 ? 000000000 ? 7FFF96B2E9B0 ? 000000000 ? kcbzib()+4267 call ksesec0() 00A99D540 ? 00A99D6F0 ? 000000000 ? 00A99D6F0 ? 00000000A ? 455220504D554F44 ? kcbgtcr()+16928 call kcbzib() 00000FFFF ? 7FFF96B2F6D0 ? 000001FA7 ? 000000000 ? 000000003 ? 7FFF00000284 ? ktecgsc()+1123 call kcbgtcr() 7FFF96B2F6D0 ? 000000000 ? 000000284 ? 000000000 ? 000000000 ? 7FFF00000284 ? ktecgetsh()+157 call ktecgsc() 7FFF96B2F6D0 ? 000000003 ?