问题描述
客户在比较繁忙的时刻执行了索引的REBUILD操作,导致大量会话被锁,最终对ONLINE REBUILD操作执行了取消操作,引发了一系列更严重的ORA-600错误。
登录数据库后,发现两个节点上ORACLE_BASE所在目录都已经100%占用。数据库无法正常通过/ AS SYSDBA方式登录。
查询告警日志发现大量的ORA-600和ORA-7445错误:
Tue May 08 21:20:26 EAT 2012 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_1555.trc: ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] [] Tue May 08 21:20:27 EAT 2012 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_3891.trc: ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], [] Tue May 08 21:20:27 EAT 2012 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_26190.trc: ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] [] Tue May 08 21:20:27 EAT 2012 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_2873.trc: ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] [] Tue May 08 21:20:27 EAT 2012 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_4518.trc: ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], [] Tue May 08 21:20:27 EAT 2012 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_22469.trc: ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], [] Tue May 08 21:20:28 EAT 2012 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_26440.trc: ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] [] Tue May 08 21:20:29 EAT 2012 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_762.trc: ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] [] Tue May 08 21:20:29 EAT 2012 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_26106.trc: ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] [] Tue May 08 21:20:30 EAT 2012 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_1597.trc: ORA-07445: 出现异常错误: 核心转储 [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] [] Tue May 08 21:20:30 EAT 2012 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_856.trc: ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
专家解答
检查对应的TRACE文件,发现导致错误语句执行的是TABLE_A表的插入:
ksedmp: internal OR fatal error ORA-00600: 内部错误代码, 参数: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], [] CURRENT SQL statement FOR this SESSION: INSERT INTO TABLE_A (O_ID, P_ID, S_ID, F_ITEM, F_NAME, T_MON, D_MON, I_MON, P_STATE, P_TIME, R1, R2) VALUES (:B10 , SUBSTR(:B10 , LENGTH(:B10 ) - 1, 2), :B9 , :B8 , :B7 , :B6 , :B5 , :B4 , :B3 , NULL, :B2 , :B1 ) ----- PL/SQL Call Stack ----- object line object handle NUMBER name c00000203c2cc550 119 package body U1.P_O_I c00000203b788200 288 PROCEDURE U1.U_B_O_I c00000203719b8d0 1 anonymous block ----- Call Stack Trace ----- calling CALL entry argument VALUES IN hex location TYPE point (? means dubious VALUE) -------------------- -------- -------------------- ---------------------------- ksedst()+64 CALL ksedst1() 000000000 ? 000000001 ? ksedmp()+2176 CALL ksedst() 000000000 ? C000000000000D20 ? 4000000004037940 ? 000000000 ? 000000000 ? 000000000 ? ksfdmp()+112 CALL ksedmp() 000000003 ? 9FFFFFFFFFFF1230 ? 60000000000BA290 ? 9FFFFFFFFFFF1800 ? C000000000000999 ? 400000000407F9B0 ? kgerinv()+304 CALL ksfdmp() 9FFFFFFFFFFF1D90 ? 000000003 ? 9FFFFFFFFFFF1810 ? 60000000000BA290 ? C000000000000612 ? 40000000098C38B0 ? kgeasnmierr()+144 CALL kgerinv() 60000000000318D0 ? 4000000001AD98A0 ? 6000000000032988 ? 4000000001AD98A0 ? 9FFFFFFFFFFF1DD0 ? $cold_kghstack_unde CALL kgeasnmierr() 60000000000318D0 ? rflow_internal()+36 9FFFFFFFBF3B1168 ? 8 9FFFFFFFBF3B1178 ? 6000000000032D00 ? 000000002 ? 60000000003002F0 ? 000000001 ? 000000006 ? kghstack_free()+208 CALL $cold_kghstack_unde 60000000000318D0 ? rflow_internal() 60000000003002F0 ? 60000000000BA290 ? C000000000000B1D ? 4000000002F7A510 ? 00002C87B ? 6000000000031A10 ? ksmfrs()+48 CALL kghstack_free() 60000000000318D0 ? 60000000003002F0 ? rpiswu2()+1312 CALL ksmfrs() 60000000003002F0 ? C000000000001026 ? 4000000002F78960 ? 000000000 ? 000000000 ? 000000000 ? rpidrv()+2352 CALL rpiswu2() 9FFFFFFFFFFF2AF0 ? 4000000002F7AE60 ? 00002F833 ? 60000000000A7D20 ? 9FFFFFFFFFFF1E20 ? C000000000001ABD ? 4000000000F14558 ? 60000000000C6CF0 ? psddr0()+864 CALL rpidrv() 000000018 ? 000000066 ? 9FFFFFFFFFFF3700 ? 000000038 ? 9FFFFFFFFFFF2B20 ? 60000000000BA290 ? psdnal()+736 CALL psddr0() 000000018 ? 000000066 ? 9FFFFFFFFFFF3700 ? 000000030 ? pevm_EXECC()+832 CALL psdnal() 9FFFFFFFFFFF54D0 ? C00000203489A9F8 ? C0000000000011AA ? 9FFFFFFFBE832840 ? C00000203C2CC550 ? 400000000313C770 ? 000026035 ? pfrinstr_EXECC()+16 CALL pevm_EXECC() 9FFFFFFFBE83D1D0 ? 0 9FFFFFFFBE8328B0 ? 000000020 ? pfrrun_no_tool()+19 CALL pfrinstr_EXECC() 9FFFFFFFBE83D1D0 ? 2 C000001DA198D61C ? 9FFFFFFFBE83D238 ? pfrrun()+1376 CALL pfrrun_no_tool() 9FFFFFFFBE83D1D0 ? 000002001 ? 9FFFFFFFBE83D238 ? 60000000000BA290 ? C000000000000A1C ? 4000000003163040 ? 9FFFFFFFBE83D620 ? 9FFFFFFFBE83D298 ? plsql_run()+1328 CALL pfrrun() 9FFFFFFFFFFF3788 ? 9FFFFFFFFFFF3770 ? 60000000000BA290 ? 9FFFFFFFFFFF4370 ? 9FFFFFFFFFFF4370 ? C000000000000E23 ? 4000000002C34D70 ? peicnt()+560 CALL plsql_run() 9FFFFFFFFFFF4380 ? 000000001 ? 000000000 ? 9FFFFFFFFFFF4380 ? 60000000000BA290 ? 9FFFFFFFFFFF4E90 ? kkxexe()+1008 CALL peicnt() 9FFFFFFFFFFF54D0 ? 9FFFFFFFBE83D1D0 ? 9FFFFFFFFFFF4EA0 ? 60000000000BA290 ? 9FFFFFFFFFFF5420 ? C000000000000A1C ? 000000020 ? 9FFFFFFFFFFF4F00 ? opiexe()+8016 CALL kkxexe() 0000051F0 ?
这个SQL语句是正常的业务操作,而这种简单的INSERT都会导致ORA-600和ORA-7445错误,说明插入的表或索引本身存在问题。
检查表和索引的状态,发现索引部分分区状态异常:
CREATE INDEX "U1"."IDX_F_2" ON "U1"."TABLE_A" ("S_ID") PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING STORAGE( BUFFER_POOL DEFAULT) LOCAL (PARTITION "P1" PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "TBS7" , PARTITION "P2" PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "TBS8" , PARTITION "P3" PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "TBS9" , PARTITION "P4" PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "TBS10" , PARTITION "P5" PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "TBS1" , PARTITION "P6" PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "TBS2" , PARTITION "P7" PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "TBS3" , PARTITION "P8" PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "TBS4" , PARTITION "P9" PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "TBS5" , PARTITION "P10" PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "TBS6" ) ALTER INDEX "U1"."IDX_F_2" MODIFY PARTITION "P6" UNUSABLE ALTER INDEX "U1"."IDX_F_2" MODIFY PARTITION "P7" UNUSABLE
和客户沟通后,确认今天问题发生之前有人对索引进行维护操作,操作语句包括:
ALTER INDEX u1.IDX_F_2 rebuild partition p6 parallel ALTER INDEX U1.IDX_T_2 rebuild partition P9 nologging parallel online ALTER INDEX U1.IDX_T_1 rebuild partition P3 nologging parallel online ALTER INDEX U1.IDX_F_2 rebuild partition P1 nologging parallel online ALTER INDEX U1.IDX_T_4 rebuild partition P1 nologging parallel online ALTER INDEX U1.IDX_F_4 rebuild partition P5 nologging parallel online
在对索引进行维护之后,数据库中出现了大量的enq: TM – contention等待,TM锁出现的原和是索引IDX_F_2的PARTITION P6重建语句没有添加ONLINE参数有直接关系。随后大量会话被阻塞,而整个索引重建操作被人为中止,其中包括正在运行的ONLINE REBUILD操作,而ONLINE REBUILD操作被中止是十分危险的,很容易导致bug的产生,而当前就导致了ORA-600和ORA-7445的产生。
由于客户着急解决问题,因此对数据库进行了重启。重启后,ORA-600错误信息不再出现。但是前台应用报错,索引失效。
SELECT 'ALTER INDEX U1.' || INDEX_NAME || ' REBUILD PARTITION ' || PARTITION_NAME || ';' FROM DBA_IND_PARTITIONS WHERE INDEX_OWNER = 'U1' AND STATUS != 'USABLE';
获取所有失效的索引重建语句,将索引重新编译后,数据库恢复正常。
在Oracle的MOS上没有找到与之最相关的bug信息,反而是找到了一个相关性很高的补丁信息,而对应的bug信息并没有公开:补丁程序13737888: ONLINE DDL:ORA-600[KGHSTACK_UNDERFLOW_INTERNAL_3], [0X2B7F4E1E7678], [KEYBUF]。