Problem description
Both errors are caused by the same bug.
The database environment is 11.2.0.2 RAC on Solaris SPARC. The alert log shows the following errors:
2012-01-29 06:15:10.168000 +08:00
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc (incident=384590):
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
Incident details in: /app/diag/rdbms/orcl/orcl1/incident/incdir_384590/orcl1_lms3_81_i384590.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2012-01-29 06:15:11.923000 +08:00
Dumping diagnostic data in directory=[cdmp_20120129061511], requested by (instance=1, osid=81 (LMS3)), summary=[incident=384590].
Sweep [inc][384590]: completed
Sweep [inc2][384590]: completed
2012-01-29 06:15:17.289000 +08:00
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc:
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
LMS3 (ospid: 81): terminating the instance due to error 484
2012-01-29 06:15:20.910000 +08:00
ORA-1092 : opitsk aborting process
2012-01-29 06:15:22.384000 +08:00
.
.
.
2012-04-17 04:26:44.373000 +08:00
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc (incident=432578):
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /app/diag/rdbms/orcl/orcl1/incident/incdir_432578/orcl1_lms1_8678_i432578.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2012-04-17 04:26:45.864000 +08:00
Dumping diagnostic data in directory=[cdmp_20120417042645], requested by (instance=1, osid=8678 (LMS1)), summary=[incident=432578].
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
2012-04-17 04:26:47.359000 +08:00
Sweep [inc][432578]: completed
Sweep [inc2][432578]: completed
2012-04-17 04:26:53.095000 +08:00
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 8678): terminating the instance due to error 484
2012-04-17 04:26:56.593000 +08:00
ORA-1092 : opitsk aborting process
2012-04-17 04:26:58.088000 +08:00
Instance terminated by LMS1, pid = 8678
Expert answer
As the alert log shows, both the kjbrref:pkey error and the kjbmprlst:shadow error crashed the instance directly, so both are very serious problems. Both also occurred in LMSn processes.
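The observation above (two different ORA-600 first arguments, each followed by an LMSn process terminating the instance) can be checked mechanically against alert-log text. The following is a minimal sketch, not an Oracle tool: the function name `summarize_alert_log` and both regular expressions are assumptions written to match only the message shapes quoted in this post.

```python
import re

# Hypothetical patterns, written for the two message shapes quoted in this post.
# IGNORECASE tolerates keyword-case differences in pasted log text.
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")
TERMINATE_RE = re.compile(
    r"(\w+) \(ospid: (\d+)\): terminating the instance", re.IGNORECASE
)


def summarize_alert_log(text):
    """Return (sorted distinct first ORA-600 arguments, [(process, ospid), ...])."""
    first_args = sorted({m.group(1) for m in ORA600_RE.finditer(text)})
    terminators = [(m.group(1), int(m.group(2))) for m in TERMINATE_RE.finditer(text)]
    return first_args, terminators


# Four lines copied from the alert log excerpt in this post.
sample = """\
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
LMS3 (ospid: 81): terminating the instance due to error 484
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 8678): terminating the instance due to error 484
"""

first_args, terminators = summarize_alert_log(sample)
print(first_args)   # distinct first arguments of the ORA-600 errors
print(terminators)  # background processes that terminated the instance
```

On the sample lines this reports the two first arguments, kjbmprlst:shadow and kjbrref:pkey, and the terminating processes LMS3 (ospid 81) and LMS1 (ospid 8678). The function takes the whole log as one string, so it behaves the same whether or not the pasted log has kept its line breaks.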
*** 2012-01-29 06:15:10.194
*** SESSION ID:(1009.1) 2012-01-29 06:15:10.194
*** CLIENT ID:() 2012-01-29 06:15:10.194
*** SERVICE NAME:(SYS$BACKGROUND) 2012-01-29 06:15:10.194
*** MODULE NAME:() 2012-01-29 06:15:10.194
*** ACTION NAME:() 2012-01-29 06:15:10.194

Dump continued from file: /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []

========= Dump for incident 384590 (ORA 600 [kjbrref:pkey]) ========
----- Beginning of Customized Incident Dump(s) -----
GCS RESOURCE 0xb92d0cfa0 hashq [0xbb35eddc8,0xc0f9b1f60] name[0x511ed.ca] pkey 136931.0
  grant 0xb94a7e8f8 cvt 0x0 send 0x0@1,0 write 0x0,0@65536
  flag 0x2 mdrole 0x1 mode 1 scan 0.0 role LOCAL
  disk: 0x0000.00000000 write: 0x0000.00000000 cnt 0x0 hist 0x0
  xid 0x0000.000.00000000 sid 3 pkwait 0s rmacks 0
  refpcnt 0 weak: 0x0000.00000000
  pkey 136931.0
  hv 91 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 12, dom 0]
  kjga st 0x4, step 0.35.0, cinc 18, rmno 6345, flags 0x20
  lb 16384, hb 32767, myb 16957, drmb 16957, apifrz 1
  GCS SHADOW 0xb94a7e8f8,626 resp[0xb92d0cfa0,0x511ed.ca] pkey 136931.0
    grant 1 cvt 0 mdrole 0x1 st 0x100 lst 0x40 GRANTQ rl LOCAL
    master 1 owner 2 sid 3 remote[0x68fde3ef0,11] hist 0x10c30086180431f
    history 0x1f.0x6.0x1.0xc.0x6.0x1.0xc.0x6.0x1.0x0.
    cflag 0x0 sender 0 flags 0x0 replay# 0 abast 0x0.x0.1 dbmap 0x0
    disk: 0x0000.00000000 write request: 0x0000.00000000
    pi scn: 0x0000.00000000 sq[0xb92d0cfd0,0xb92d0cfd0]
    msgseq 0x1 updseq 0x0 reqids[11,0,0] infop 0x0 lockseq x67d9
  GCS SHADOW END
GCS RESOURCE END
----- End of Customized Incident Dump(s) -----

*** 2012-01-29 06:15:10.261
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling                         call      entry                   argument values in hex
location                        type      point                   (? means dubious value)
------------------------------- --------- ----------------------- ----------------------------
ksedst1()+96                    CALL      skdstdst()              FFFFFFFF7FFF4C00 ? 100670460 ? 000000000 ? 00000000A ? 000000001 ? 10BD552E0 ?
ksedst()+60                     CALL      ksedst1()               000000000 ? 000000001 ? 00010C1D1 ? 00010C000 ? 10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032             CALL      ksedst()                000000000 ? 10B21A000 ? 10B21AA90 ? 10C1D2000 ? 00010B000 ? 00010C1D2 ?
dbgexPhaseII()+1800             PTR_CALL  dbkedDefDump()          000000003 ? 000000002 ? 10A6ABAA8 ? 0000014B0 ? 10C1C9000 ? 000000003 ?
dbgexExplicitEndInc()+728       CALL      dbgexPhaseII()          10C373D30 ? FFFFFFFF7A634920 ? FFFFFFFF7FFF8FDC ? 0018E0001 ? 10A6A2D98 ? 000001C00 ?
dbgeEndDDEInvocationImpl()+704  CALL      dbgexExplicitEndInc()   10A6A2C50 ? FFFFFFFF7A634920 ? FFFFFFFF7FFF8F28 ? FFFFFFFF7FFFC620 ? 000000000 ? FFFFFFFFFE4E26A0 ?
kjbrref()+1496                  CALL      dbgeEndDDEInvocation()  10C373D30 ? 001B1D800 ? FFFFFFFFFEC0AF31 ? FFFFFFFF7FFFC620 ? 000002868 ? 0018E0001 ?
kjblreplay()+7380               CALL      kjbrref()               000002868 ? 10C1CA3E0 ? 000021768 ? A681AFA10 ? B92D0CFA0 ? C0F96F920 ?
kjbldrmrpst()+4864              CALL      kjblreplay()            000000000 ? 000000001 ? 10C1CA0A0 ? BDA03C9B8 ? 000000000 ? 10C1E8890 ?
kjmprcfgsync()+1424             CALL      kjbldrmrpst()           A681AFA10 ? 000000001 ?
The other trace file:
*** 2012-04-17 04:26:44.389
*** SESSION ID:(673.1) 2012-04-17 04:26:44.389
*** CLIENT ID:() 2012-04-17 04:26:44.389
*** SERVICE NAME:(SYS$BACKGROUND) 2012-04-17 04:26:44.389
*** MODULE NAME:() 2012-04-17 04:26:44.389
*** ACTION NAME:() 2012-04-17 04:26:44.389

Dump continued from file: /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 432578 (ORA 600 [kjbmprlst:shadow]) ========
----- Beginning of Customized Incident Dump(s) -----
FUSION MSG 0xffffffff79c40b80,39 from 2 spnum 14 ver[38,11161] ln 144 sq[2,8]
  REPLAY 1 [0x103699.c7, 151132.0] c[0x7e7bd3240,55] [0x494e,x38]
  grant 2 convert 0 role x0
  pi [0x0.0x0] flags 0x0 state 0x100
  disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
  msgRM# 11161 bkt# 18131 drmbkt# 18131
  pkey 151132.0 undo 0 stat 5 masters[32768, 2->32768] reminc 38 RM# 11152
  flg x0 type x0 afftime x8517cf38
  nreplays by lms 0 = 4046
  nreplays by lms 1 = 4105
  nreplays by lms 2 = 4176
  nreplays by lms 3 = 4214
  nreplays by lms 4 = 4158
  nreplays by lms 5 = 4162
  hv 125 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 36, dom 0]
  kjga st 0x4, step 0.36.0, cinc 38, rmno 11161, flags 0x20
  lb 16384, hb 32767, myb 18131, drmb 18131, apifrz 1
FUSION MSG DUMP END
GCS RESOURCE 0xbb93a40e8 hashq [0xba8f40298,0xc27d16700] name[0x103699.c7] pkey 151008.0
  grant 0xb99d64f38 cvt 0x0 send 0x0@1,0 write 0x0,0@65536
  flag 0x2 mdrole 0x1 mode 1 scan 0.0 role LOCAL
  disk: 0x0000.00000000 write: 0x0000.00000000 cnt 0x0 hist 0x0
  xid 0x0000.000.00000000 sid 1 pkwait 0s rmacks 0
  refpcnt 0 weak: 0x0000.00000000
  pkey 151008.0
  hv 125 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 36, dom 0]
  kjga st 0x4, step 0.36.0, cinc 38, rmno 11161, flags 0x20
  lb 16384, hb 32767, myb 18131, drmb 18131, apifrz 1
  GCS SHADOW 0xb99d64f38,42 resp[0xbb93a40e8,0x103699.c7] pkey 151008.0
    grant 1 cvt 0 mdrole 0x1 st 0x100 lst 0x40 GRANTQ rl LOCAL
    master 1 owner 2 sid 1 remote[0x85fed2220,13] hist 0xb93e302087234c9f
    history 0x1f.0x19.0xd.0x39.0x8.0x4.0xc.0x1f.0x39.0x1.
    cflag 0x0 sender 0 flags 0x0 replay# 0 abast 0x0.x0.1 dbmap 0x0
    disk: 0x0000.00000000 write request: 0x0000.00000000
    pi scn: 0x0000.00000000 sq[0xbb93a4118,0xbb93a4118]
    msgseq 0x1 updseq 0x0 reqids[13,0,0] infop 0x0 lockseq xf0d1
  GCS SHADOW END
GCS RESOURCE END
----- End of Customized Incident Dump(s) -----

*** 2012-04-17 04:26:44.478
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling                         call      entry                   argument values in hex
location                        type      point                   (? means dubious value)
------------------------------- --------- ----------------------- ----------------------------
ksedst1()+96                    CALL      skdstdst()              FFFFFFFF7FFF4D20 ? 100670460 ? 000000000 ? 00000000A ? 000000001 ? 10BD552E0 ?
ksedst()+60                     CALL      ksedst1()               000000000 ? 000000001 ? 00010C1D1 ? 00010C000 ? 10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032             CALL      ksedst()                000000000 ? 10B21A000 ? 10B21AA90 ? 10C1D2000 ? 00010B000 ? 00010C1D2 ?
dbgexPhaseII()+1800             PTR_CALL  dbkedDefDump()          000000003 ? 000000002 ? 10A6ABAA8 ? 0000014B0 ? 10C1C9000 ? 000000003 ?
dbgexExplicitEndInc()+728       CALL      dbgexPhaseII()          10C373D30 ? FFFFFFFF7A634920 ? FFFFFFFF7FFF90FC ? 0018E0001 ? 10A6A2D98 ? 000001C00 ?
dbgeEndDDEInvocationImpl()+704  CALL      dbgexExplicitEndInc()   10A6A2C50 ? FFFFFFFF7A634920 ? FFFFFFFF7FFF9048 ? FFFFFFFF7FFFC740 ? 000000000 ? FFFFFFFFFE4E26A0 ?
kjbmprlst()+13504               CALL      dbgeEndDDEInvocation()  10C373D30 ? 001B1D800 ? FFFFFFFFFEC0AF31 ? FFFFFFFF7FFFC740 ? 0013F5000 ? 0018E0001 ?
kjmxmpm()+796                   PTR_CALL  kjbmprlst()             101782000 ? 00010C1CA ? 10C1EA000 ? 10C1CA000 ? 10A6A3000 ? 10A6A3000 ?
kjmpbmsg()+4584                 CALL      kjmxmpm()               00010A400 ? 000000000 ? 0852DA2C5 ? 00010C000 ? 10A7EE000 ? BE22AF0C0 ?
kjmsm()+11308                   CALL      kjmpbmsg()              00010A400 ? 00000009C ? 00010C000 ? 10A7EE000 ? 000000001 ? 000000027 ?
ksbrdp()+1236                   PTR_CALL  kjmsm()                 000001888 ? 25916872D1 ? 000002000 ? 000000000 ? 00000024B ? 000001000 ?
opirip()+1008                   CALL      ksbrdp()                10BB56000 ? BD8C0B680 ? 000000001 ? 000001400 ? 00010B800 ? 10AC212D8 ?
opidrv()+780                    CALL      opirip()                10A6A3000 ? 380013D50 ? 000380002 ? 3800055C0 ? 380002000 ? 00010C000 ?
sou2o()+92                      CALL      opidrv()                000000032 ? 000000004 ? FFFFFFFF7FFFF780 ? 0001EA190 ? FFFFFFFF7AF42F10 ? FFFFFFFF7FFFFBB8 ?
opimai_real()+516               CALL      sou2o()                 FFFFFFFF7FFFF758 ?
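The two trace files are very similar: even the names of the first several functions on the call stack are identical.
A search of MOS confirms this as Bug 12834027, ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] with RAC read mostly locking. The bug is fixed in 11.2.0.3.1, the latest PSU at the time of writing. Besides applying the patch, you can consider disabling read-mostly object locking by setting the hidden parameter "_gc_read_mostly_locking"=FALSE. Disabling DRM also avoids the error.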
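The claim about matching stack functions can be made concrete by comparing the "calling location" column of the two call stacks from the top down. This is a minimal sketch: the helper name `common_top_frames` is hypothetical, and the frame lists are typed in by hand from the two traces in this post, with names that the trace wrapped across two lines (dbgexExplicitEndInc, dbgeEndDDEInvocationImpl) rejoined.

```python
from itertools import takewhile


def common_top_frames(stack_a, stack_b):
    """Return the leading frames that are identical in both stacks."""
    pairs = takewhile(lambda pair: pair[0] == pair[1], zip(stack_a, stack_b))
    return [frame for frame, _ in pairs]


# Frame names from the LMS3 trace (kjbrref:pkey), top of stack first.
lms3_stack = [
    "ksedst1", "ksedst", "dbkedDefDump", "dbgexPhaseII",
    "dbgexExplicitEndInc", "dbgeEndDDEInvocationImpl",
    "kjbrref", "kjblreplay", "kjbldrmrpst", "kjmprcfgsync",
]

# Frame names from the LMS1 trace (kjbmprlst:shadow), top of stack first.
lms1_stack = [
    "ksedst1", "ksedst", "dbkedDefDump", "dbgexPhaseII",
    "dbgexExplicitEndInc", "dbgeEndDDEInvocationImpl",
    "kjbmprlst", "kjmxmpm", "kjmpbmsg", "kjmsm", "ksbrdp",
    "opirip", "opidrv", "sou2o", "opimai_real",
]

shared = common_top_frames(lms3_stack, lms1_stack)
print(shared)
# First frame at which the two stacks differ.
print(lms3_stack[len(shared)], lms1_stack[len(shared)])
```

For these two lists the shared prefix is the first six frames, ksedst1 through dbgeEndDDEInvocationImpl, and the stacks first differ at kjbrref in one trace and kjbmprlst in the other, the two functions named in the ORA-600 arguments.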
Last modified: 2019-04-18 11:19:41