BUG 10008092 caused instance crash

Original post by Roger, 2011-12-05

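This morning a colleague emailed me about a two-node RAC in which one of the nodes had been restarted. The alert log shows the following: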



###### Node 1, 02:23:55 2011 ######

Sat Dec 3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_pmon_12765.trc:
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:55 2011
ORA-469 encountered when generating server alert SMG-3503
Sat Dec 3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_j000_8539.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:56 2011


###### Node 1 crash ######

Sat Dec 3 02:23:57 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_smon_12876.trc:
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:58 2011
Shutting down instance (abort)
License high water mark = 55
Sat Dec 3 02:24:00 2011



From the above, the checkpoint process (CKPT) ran into a problem, and that caused the instance to crash.
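As a quick way to confirm which error dominates an excerpt like the one above, the ORA codes and the referenced trace files can be tallied with a few lines of Python. This is only a minimal sketch: the `summarize` helper and the two regular expressions are my own illustration, not an Oracle tool, and the log text is simply the excerpt pasted from this post.

```python
import re
from collections import Counter

# Excerpt copied from the node 1 alert log quoted in this post.
ALERT_LOG = """\
Sat Dec 3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_pmon_12765.trc:
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:55 2011
ORA-469 encountered when generating server alert SMG-3503
Sat Dec 3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_j000_8539.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:57 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_smon_12876.trc:
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:58 2011
Shutting down instance (abort)
"""

# Hypothetical patterns: an ORA code at the start of a line
# (covers both "ORA-00469:" and "ORA-469 encountered"), and a trace file path.
ORA_RE = re.compile(r"^ORA-(\d+)\b")
TRC_RE = re.compile(r"^Errors in file (\S+\.trc):")


def summarize(log_text):
    """Return (Counter of numeric ORA codes, list of trace file paths)."""
    codes = Counter()
    traces = []
    for line in log_text.splitlines():
        ora = ORA_RE.match(line)
        if ora:
            codes[int(ora.group(1))] += 1
        trc = TRC_RE.match(line)
        if trc:
            traces.append(trc.group(1))
    return codes, traces


codes, traces = summarize(ALERT_LOG)
for code, count in codes.most_common():
    print(f"ORA-{code:05d}: {count} occurrence(s)")
for path in traces:
    print("trace:", path)
```

On this excerpt ORA-00469 is by far the most frequent code, and the first trace file listed is the PMON trace, which is the one examined next.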



###### Node 1 PMON process trace: ######

*** 2011-12-03 02:23:55.731
Background process CKPT found dead
Oracle pid = 24
OS pid (from detached process) = 12869
OS pid (from process state) = 12869
dtp = c000000040016e40, proc = c0000004950057c8
Dump of memory from 0xC000000040016E40 to 0xC000000040016E88
C000000040016E40 00000076 00000000 C0000004 950057C8 [...v..........W.]
C000000040016E50 00000000 00000000 00000000 434B5054 [............CKPT]
C000000040016E60 00020000 00000000 00003245 00000000 [..........2E....]
....................
....................
....................
....................
Repeat 13 times
C000000495005CF0 6F726163 6C650000 00000000 00000000 [oracle..........]
C000000495005D00 00000000 00000000 00000000 00000000 [................]
C000000495005D10 00000000 00000006 6A6C6372 6D310000 [........jlcrm1..]
C000000495005D20 00000000 00000000 00000000 00000000 [................]
Repeat 2 times
C000000495005D50 00000000 00000000 00000000 00000006 [................]
C000000495005D60 554E4B4E 4F574E00 00000000 00000000 [UNKNOWN.........]
C000000495005D70 00000000 00000000 00000000 00000000 [................]
C000000495005D80 00000000 00000008 31323836 39000000 [........12869...]
C000000495005D90 00000000 00000000 00000000 00000000 [................]
C000000495005DA0 00000000 00000005 6F726163 6C65406A [........oracle@j]
C000000495005DB0 6C63726D 31202843 4B505429 00000000 [lcrm1 (CKPT)....]
C000000495005DC0 00000000 00000000 00000000 00000000 [................]
....................
....................
....................
....................
C000000495005FA0 00000000 00000000 00000000 00001308 [................]
C000000495005FB0 00000006 00000000 [........]

error 469 detected in background process
ORA-00469: CKPT process terminated with error
*** 2011-12-03 02:24:07.798
ksuitm: waiting up to [5] seconds before killing DIAG
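The bracketed column on the right of each dump line in the trace above is just the ASCII rendering of the hex words on the left, which is how the process name (CKPT), the OS user (oracle), the host (jlcrm1) and the OS pid (12869) can be read from it. A small sketch that reproduces that column is shown here; `decode_dump_words` is a hypothetical helper of mine, and it assumes the words are printed in memory byte order, as they appear to be in this trace.

```python
def decode_dump_words(words):
    """Render space-separated 32-bit hex words as ASCII, '.' for non-printables."""
    raw = bytes.fromhex("".join(words.split()))
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Hex words copied from three dump lines of the PMON trace in this post.
for words in (
    "00000000 00000000 00000000 434B5054",
    "6F726163 6C650000 00000000 00000000",
    "00000000 00000005 6F726163 6C65406A",
):
    print(words, "[" + decode_dump_words(words) + "]")
```

The decoded strings match the bracketed text in the trace, so the dumped process state does belong to the CKPT process of this instance.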



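My colleague confirmed that no DIAG trace, and not even a CKPT trace, was generated. This is very similar to the description of bug 10008092: both the version and the diagnostic analysis match closely. The rough sequence of events is:

CKPT process dies (possibly hung) --> PMON cleanup --> CKPT is a protected background process, so PMON crashes the instance

With that in mind, the following information from the alert log is easy to explain: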



*** SESSION ID:(1089.34028) 2011-12-03 02:23:56.172
kgefec: fatal error 0
*** 2011-12-03 02:23:56.172
ksedmp: internal or fatal error
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LCK0' unexpectedly terminated with error 469
ORA-00469: CKPT process terminated with error
ORA-00469: CKPT process terminated with error
Current SQL statement for this session:
TRUNCATE TABLE DINF.TEMP1_IN_PDT_CM_USER
----- PL/SQL Call Stack -----
object line object
handle number name
c00000006e89ab60 200 procedure DINF.P_IN_PDT_CM_USER
c00000009e142850 1 anonymous block
----- Call Stack Trace -----


Why? Because TRUNCATE TABLE has to trigger an object checkpoint.
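To see how the pieces line up, the timestamps quoted in this post can be merged into a single timeline. The sketch below is only an illustration: the event list is hand-copied from the excerpts above, the `timeline` helper is hypothetical, and the alert log entry is padded with `.000` because the alert log only records whole seconds.

```python
from datetime import datetime

# (timestamp, source, event), hand-copied from the excerpts in this post.
# The alert log has one-second resolution, so ".000" is padding, not real data.
EVENTS = [
    ("2011-12-03 02:23:58.000", "alert log", "Shutting down instance (abort)"),
    ("2011-12-03 02:23:55.731", "PMON trace", "Background process CKPT found dead"),
    ("2011-12-03 02:23:56.172", "session trace", "ORA-00603 while running TRUNCATE TABLE"),
    ("2011-12-03 02:24:07.798", "PMON trace", "ksuitm: waiting before killing DIAG"),
]


def timeline(events):
    """Return the events sorted by their parsed timestamp."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    return sorted(events, key=lambda e: datetime.strptime(e[0], fmt))


for ts, source, event in timeline(EVENTS):
    print(f"{ts}  {source:<13}  {event}")
```

Sorted this way, PMON finds CKPT dead first, the session running the TRUNCATE fails less than half a second later, and the instance abort follows, which is consistent with the sequence described above.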

The bug in question is:


Bug 10008092: INSTANCE CRASH WITH ORA-00469: CKPT PROCESS TERMINATED WITH ERROR

