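Problem Description
This morning a colleague sent an email reporting that one node of a two-node RAC had been restarted. The alert log shows the following: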
###### Node 1 02:23:55 2011 ######
Sat Dec 3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_pmon_12765.trc:
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:55 2011
ORA-469 encountered when generating server alert SMG-3503
Sat Dec 3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_j000_8539.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:56 2011

###### Node 1 crash ######
Sat Dec 3 02:23:57 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_smon_12876.trc:
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:58 2011
Shutting down instance (abort)
License high water mark = 55
Sat Dec 3 02:24:00 2011
Expert Answer
The log above shows that a failure of the checkpoint process (CKPT) caused the instance to crash.
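To make the ordering of events in the alert log above easier to see, the following is a minimal Python sketch that groups each ORA- error under the timestamp line that precedes it. The names `ALERT_LOG`, `parse_timestamp`, and `ora_events` are illustrative and not part of any Oracle tool. It assumes the 10g-style alert log layout shown above (a timestamp line followed by message lines) and an English locale for the day and month names.

```python
import re
from datetime import datetime

# Excerpt copied from the alert log shown above.
ALERT_LOG = """\
Sat Dec 3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_pmon_12765.trc:
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:55 2011
ORA-469 encountered when generating server alert SMG-3503
Sat Dec 3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_j000_8539.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:57 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_smon_12876.trc:
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:58 2011
Shutting down instance (abort)
"""

# Assumed timestamp layout, e.g. "Sat Dec 3 02:23:55 2011".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
ORA_CODE = re.compile(r"\bORA-(\d+)")


def parse_timestamp(line):
    """Return a datetime if the line is a timestamp line, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def ora_events(text):
    """Yield (timestamp, ORA code as int) for every ORA- error in the text."""
    current = None
    for line in text.splitlines():
        ts = parse_timestamp(line)
        if ts is not None:
            current = ts  # later message lines belong to this timestamp
            continue
        for code in ORA_CODE.findall(line):
            yield current, int(code)


events = list(ora_events(ALERT_LOG))
first_469 = next(ts for ts, code in events if code == 469)
print("first ORA-00469 at", first_469)
```

In this excerpt the first ORA-00469 is reported by PMON at 02:23:55, three seconds before the "Shutting down instance (abort)" entry, which is consistent with CKPT failing first and the instance being aborted afterwards.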
###### Node 1 pmon process trace: ######
*** 2011-12-03 02:23:55.731
Background process CKPT found dead
Oracle pid = 24
OS pid (from detached process) = 12869
OS pid (from process state) = 12869
dtp = c000000040016e40, proc = c0000004950057c8
Dump of memory from 0xC000000040016E40 to 0xC000000040016E88
C000000040016E40 00000076 00000000 C0000004 950057C8 [...v..........W.]
C000000040016E50 00000000 00000000 00000000 434B5054 [............CKPT]
C000000040016E60 00020000 00000000 00003245 00000000 [..........2E....]
.................... .................... .................... ....................
Repeat 13 times
C000000495005CF0 6F726163 6C650000 00000000 00000000 [oracle..........]
C000000495005D00 00000000 00000000 00000000 00000000 [................]
C000000495005D10 00000000 00000006 6A6C6372 6D310000 [........jlcrm1..]
C000000495005D20 00000000 00000000 00000000 00000000 [................]
Repeat 2 times
C000000495005D50 00000000 00000000 00000000 00000006 [................]
C000000495005D60 554E4B4E 4F574E00 00000000 00000000 [UNKNOWN.........]
C000000495005D70 00000000 00000000 00000000 00000000 [................]
C000000495005D80 00000000 00000008 31323836 39000000 [........12869...]
C000000495005D90 00000000 00000000 00000000 00000000 [................]
C000000495005DA0 00000000 00000005 6F726163 6C65406A [........oracle@j]
C000000495005DB0 6C63726D 31202843 4B505429 00000000 [lcrm1 (CKPT)....]
C000000495005DC0 00000000 00000000 00000000 00000000 [................]
.................... .................... .................... ....................
C000000495005FA0 00000000 00000000 00000000 00001308 [................]
C000000495005FB0 00000006 00000000 [........]
error 469 detected in background process
ORA-00469: CKPT process terminated with error
*** 2011-12-03 02:24:07.798
ksuitm: waiting up to [5] seconds before killing DIAG
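The bracketed column in the memory dump is an ASCII rendering of the hex words to its left, and that is how the dead process can be recognized as CKPT with OS pid 12869. As a small illustration, the Python sketch below decodes a few rows copied from the trace. The helper name `decode_dump_words` is hypothetical. It assumes the words are printed in memory byte order, as they appear to be in this dump.

```python
def decode_dump_words(words):
    """Decode space-separated hex words into ASCII, with '.' for non-printable bytes."""
    raw = bytes.fromhex("".join(words.split()))
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Rows copied from the pmon trace (addresses omitted).
rows = [
    "00000000 00000000 00000000 434B5054",
    "00000000 00000008 31323836 39000000",
    "00000000 00000005 6F726163 6C65406A",
    "6C63726D 31202843 4B505429 00000000",
]

for row in rows:
    print(row, "[" + decode_dump_words(row) + "]")
```

Joining the last two rows gives `oracle@jlcrm1 (CKPT)`, the program name recorded in the process state, and the second row holds the OS pid `12869` that PMON reports as dead.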
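My colleague confirmed that no DIAG trace, and not even a CKPT trace, was generated. This closely matches the description of bug 10008092, including the version and the diagnostic analysis. The sequence of events is roughly as follows: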
CKPT process dies (possibly a hang) --> PMON cleanup --> background process protection: PMON crashes the instance
With that in mind, the following information in the alert log is easy to explain:
*** SESSION ID:(1089.34028) 2011-12-03 02:23:56.172
kgefec: fatal error 0
*** 2011-12-03 02:23:56.172
ksedmp: internal or fatal error
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LCK0' unexpectedly terminated with error 469
ORA-00469: CKPT process terminated with error
ORA-00469: CKPT process terminated with error
Current SQL statement for this session:
TRUNCATE TABLE DINF.TEMP1_IN_PDT_CM_USER
----- PL/SQL Call Stack -----
object line object
handle number name
c00000006e89ab60 200 procedure DINF.P_IN_PDT_CM_USER
c00000009e142850 1 anonymous block
----- Call Stack Trace -----
Why? Because TRUNCATE TABLE triggers an object checkpoint, so a session running it depends on CKPT and fails when CKPT is gone. The bug in question is: <span style="color: #0000ff;"> Bug 10008092: INSTANCE CRASH WITH ORA-00469: CKPT PROCESS TERMINATED WITH ERROR </span>