暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PMON crash the instance of RAC

原创 Roger 2015-02-11
1291
这是某个网友的问题,其一套rac的某个节点被重启了,通过分析日志我们可以看到其中一个rac节点crash了,如下:
Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc  (incident=584434):
ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/cspora/cspora2/incident/incdir_584434/cspora2_pmon_57410148_i584434.trc
Thu Feb 05 10:24:04 2015
Trace dumping is performing id=[cdmp_20150205102404]
Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc (incident=584435):
ORA-00600: internal error code, arguments: [kjuscv], [6], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/cspora/cspora2/incident/incdir_584435/cspora2_pmon_57410148_i584435.trc
Thu Feb 05 10:24:06 2015
Sweep Incident[584434]: completed
Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc:
ORA-00600: internal error code, arguments: [kjuscv], [6], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []
PMON (ospid: 57410148): terminating the instance due to error 472
Thu Feb 05 10:24:18 2015
Termination issued to instance processes. Waiting for the processes to exit
Instance termination failed to kill one or more processes
Instance terminated by PMON, pid = 57410148
Thu Feb 05 10:24:58 2015
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 11.1.0.7.0.
Using parameter settings in server-side spfile /dev/rlv_spfile
.......
.......
ORA-1154 signalled during: alter database open...
Thu Feb 05 10:31:11 2015
Reconfiguration started (old inc 102, new inc 104)
List of nodes:
1
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Thu Feb 05 10:31:12 2015
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Thu Feb 05 10:31:12 2015

从上述alert log可以看出,数据库实例是被pmon进程强行终止的,之所以pmon进程会强行终止instance,
很明显之前出现了ORA-00600错误。很明显上述ORA-00600错误是pmon进程出现的。我们来分析pmon进程的trace内容:
*** ACTION NAME:() 2015-02-05 10:24:01.185

Dump continued from file: /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc
ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 584434 (ORA 600 [kjucvl:!busy]) ========
----- Beginning of Customized Incident Dump(s) -----
----------enqueue 7000001cbbcdaf0------------------------
lock version : 311947
Owner node : 1
grant_level : KJUSEREX
req_level : KJUSEREX
bast_level : KJUSERNL
notify_func : 0
resp : 70000018f124ac0
procp : 7000001cb0e3960
pid : 46924312
proc version : 922
oprocp : 0
opid : 0
group lock owner : 0
xid : 0000-0000-00000000
dd_time : 0.0 secs
dd_count : 0
timeout : 0.0 secs
On_timer_q? : N
On_dd_q? : N
lock_state : CONVERTING
Open Options : KJUSERPROCESS_OWNED
Convert options : KJUSERGETVALUE KJUSERNOQUEUE KJUSEREXPRESS KJUSERNODEADLOCKWAIT KJUSERNODEADLOCKBLOCK
History : 0x955252b5
Msg_Seq : 0x0
res_seq : 41
valblk : 0x514e5844970d5d33d156c8d80004ac1e QNXD]3V
----------resource 70000018f124ac0----------------------
resname : [0x970d5d33][0x0],[DX]
Local node : 1
dir_node : 0
master_node : 0
hv idx : 18
hv last r.inc : 98
current inc : 98
hv status : 0
hv master : 0
open options :
Held mode : KJUSEREX
Cvt mode : KJUSERNL
Next Cvt mode : KJUSERNL
msg_seq : 1
res_seq : 41
grant_bits : KJUSERNL KJUSEREX
grant mode : KJUSERNL KJUSERCR KJUSERCW KJUSERPR KJUSERPW KJUSEREX
count : 1 0 0 0 0 1
val_state : KJUSERVS_VALUE
valblk : 0x514e5844970d5d33d156c8d80004ac1e QNXD]3V
access_node : 1
vbreq_state : 0
state : x8
resp : 70000018f124ac0
On Scan_q? : N
Total accesses: 5475
Imm. accesses: 5057
Granted_locks : 2
Cvting_locks : 0
value_block: 51 4e 58 44 97 0d 5d 33 d1 56 c8 d8 00 04 ac 1e
GRANTED_Q :
lp 7000001cbbcdaf0 gl KJUSEREX rp 70000018f124ac0 [0x970d5d33][0x0],[DX]
master 0 pid 46924312 bast 0 rseq 41 mseq 0 history 0x955252b5
open opt KJUSERPROCESS_OWNED
CONVERT_Q:
----- End of Customized Incident Dump(s) -----

*** 2015-02-05 10:24:01.286
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+002c bl 105e6a628
ksedst1()+0064 bl 101f9c14c
ksedst()+0028 bl ksedst1() FFFFFFFFFFF4880 ? 000000000 ?
dbkedDefDump()+0874 bl ksedst() 110EA2180 ?
ksedmp()+0048 bl dbkedDefDump() 3FFFF51D0 ? 110000A90 ?
ksfdmp()+0058 bl ksedmp() 000000020 ?
dbgexPhaseII()+0130 bl _ptrgl()
dbgexExplicitEndInc bl dbgexPhaseII() 000000000 ? 000000000 ?
()+0210 000000000 ?
dbgeEndDDEInvocatio bl dbgexExplicitEndInc 110378410 ? 11037F370 ?
nImpl()+0224 ()
dbgeEndDDEInvocatio bl dbgeEndDDEInvocatio 110378410 ? 11037F370 ?
n()+0038 nImpl()
kjucvl()+0590 bl 101f9d824
kjuscv()+0c90 bl kjucvl() 000000002 ? FFFFFFFFFFF9700 ?
FFFFFFFFFFF9570 ?
24244228CB0E3960 ?
102196C80 ? FFFFFFFFFFF9220 ?
000000000 ? 110323C28 ?
ksicon_v()+06cc bl kjuscv() 7000001CBBCDAF0 ?
500000000000000 ?
79FFFF9D00 ? 0105358E0 ?
7000001BDF32FB4 ? 000000000 ?
000000000 ? 000000000 ?
k2qsod()+0250 bl ksicon_v() 1071DED20 ? 000000000 ?
000000000 ? 084242240 ?
70000017FEF01A0 ? 000000046 ?
70000017FEF01C0 ?
kssxdl()+076c bl _ptrgl()
kssdel()+0048 bl kssxdl() 7000001BDF32F38 ? 382242228 ?
kssdch()+0d30 bl kssdel() FFFFFFFFFFFA550 ?
4224822800000003 ?
ksuxds()+08b8 bl kssdch() 3CD1FEAE8 ? 7000001CD3ACCA0 ?
kssxdl()+076c bl _ptrgl()
kssdel()+0048 bl kssxdl() 7000001CD3ACCA0 ? 3CE4C7B78 ?
kssdch()+0d30 bl kssdel() FFFFFFFFFFFF5D0 ?
FFFFFFFFFFFF5E0 ?
ksudlp()+0180 bl kssdch() 000000000 ? 000000000 ?
kssxdl()+076c bl _ptrgl()
kssdel()+0048 bl kssxdl() 7000001CF9B5E48 ? 300000000 ?
ksuxdl()+029c bl kssdel() 033339530 ? 093339530 ?
ksuxda()+02ac bl ksuxdl() 7000001CF9B5E48 ? 0FFFFC1D0 ?
ksucln()+0938 bl ksuxda()
ksbrdp()+075c bl _ptrgl()
opirip()+0444 bl 105e65300
opidrv()+0418 bl opirip() 732F6373706F7261 ?
410323C28 ? FFFFFFFFFFFF420 ?
sou2o()+0090 bl opidrv() 3204C0756C ? 4A0071E60 ?
FFFFFFFFFFFF420 ?
opimai_real()+0148 bl 101f9be30
main()+0090 bl opimai_real() 000000000 ? 000000000 ?
__start()+0070 bl main() 000000000 ? 000000000 ?



从上述信息来看,出现了DX锁,对于DX锁,这是一种分布式事务锁。通过分析call stack,我们可以定位到
该问题其实是一个Oracle bug:11868640。

通过对比Bug 11868640 - ORA-600 [kjucvl:!busy] possible in PMON in RAC (文档 ID 11868640.8) ,
我们可以发现pmon trace中的call stack与上述bug描述完全一致。实际上该问题,可以通过安装patch或psu来解决。

很简单的一个问题,供大家参考!
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论