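Last week, while shutting down the database during a data center move, I ran into another ORA-600 problem. Here are my notes.
Environment
os version: CentOS 5
db version: Oracle 10.2.0.4 single-instance
physical Data Guard (remote site)
What happened at the time
1. 23:30 The network admin disconnected the external network. Note that from this point on the remote Data Guard standby was already unreachable.
2. 00:22 Issued shutdown immediate
3. 00:37 ORA-600 [3708] appeared for the first time
4. Around 00:39 killed the Oracle processes at the OS level
5. Then removed the shared memory segments with ipcrm
6. Ran startup again, confirmed the database could be opened, then ran shutdown immediate again; the ORA-600 did not recur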
SQL> shutdown immediate
ORA-00600: internal error code, arguments: [3708], [910], [], [], [], [], [], []
cause:
Basically, this error is raised because LGWR timed out.
During shutdown, the Oracle routine kcttsc() sends a message to LGWR to change the state of a redo thread and waits for confirmation. If the reply never comes, the request
times out after about 15 minutes, which is the second argument of the ORA-600 (910 seconds, i.e. roughly 15 minutes). In the situation we faced, LGWR held the RT enqueue and was waiting on 'direct path read'
on file 197. Because of some OS or network issue, the read took more than 15 minutes and LGWR timed out.
This is bug 6512622.
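As a rough cross-check of the 15-minute figure, the gap between the CLOSE and the first LGWR trace can be computed from the alert log timestamps quoted in this post. This is only a minimal Python sketch: the variable names are mine, the timestamps are copied by hand, and the two log entries are just proxies for when the wait started and ended, so only approximate agreement with 910 seconds should be expected.

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"

# Timestamps copied by hand from the alert log in this post.
# "ALTER DATABASE CLOSE NORMAL" entry:
close_issued = datetime.strptime("Sat Sep 1 00:22:20 2012", FMT)
# First "Errors in file ... icme_lgwr_29039.trc" entry:
lgwr_trace = datetime.strptime("Sat Sep 1 00:37:35 2012", FMT)

elapsed = int((lgwr_trace - close_issued).total_seconds())
timeout_arg = 910  # second argument of ORA-600 [3708]

print(elapsed)                # seconds between the two alert log entries
print(elapsed - timeout_arg)  # difference from the reported timeout
```

The two entries are 915 seconds apart, within a few seconds of the 910 reported in the error.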
solution:
1. Apply the 10.2.0.5 patchset where the bug is fixed
or
2. Apply one-off Patch 6512622, if available on My Oracle Support for your platform and Oracle version.
or
3. Upgrade to 11.1.0.6 where the bug is fixed.
I think the problem can also be avoided by improving I/O performance, or by shutting down the database before the network is disconnected.
alert log file
#############################
Fri Aug 31 23:49:11 2012
Errors in file /oracle/admin/icme/bdump/icme_lns1_18491.trc:
ORA-03113: end-of-file on communication channel
Fri Aug 31 23:49:11 2012
LGWR: I/O error 3113 archiving log 2 to 'sdicme'
Sat Sep 1 00:22:11 2012
Starting background process EMN0
EMN0 started with pid=363, OS id=31070
Sat Sep 1 00:22:11 2012
Shutting down instance: further logons disabled
Sat Sep 1 00:22:11 2012
Stopping background process CJQ0
Sat Sep 1 00:22:11 2012
Stopping background process MMNL
Sat Sep 1 00:22:11 2012
Stopping background process MMON
Sat Sep 1 00:22:12 2012
Shutting down instance (immediate)
License high water mark = 1220
Sat Sep 1 00:22:12 2012
Stopping Job queue slave processes, flags = 7
Sat Sep 1 00:22:12 2012
Job queue slave processes stopped
All dispatchers and shared servers shutdown
Sat Sep 1 00:22:20 2012
ALTER DATABASE CLOSE NORMAL
Sat Sep 1 00:22:20 2012
SMON: disabling tx recovery
SMON: disabling cache recovery
Sat Sep 1 00:22:22 2012
Shutting down archive processes
Archiving is disabled
Sat Sep 1 00:22:27 2012
ARCH shutting down
ARC9: Archival stopped
Sat Sep 1 00:22:32 2012
ARCH shutting down
ARC8: Archival stopped
Sat Sep 1 00:22:37 2012
ARCH shutting down
ARC7: Archival stopped
Sat Sep 1 00:22:42 2012
ARCH shutting down
ARC6: Archival stopped
Sat Sep 1 00:22:47 2012
ARCH shutting down
ARC5: Archival stopped
Sat Sep 1 00:22:52 2012
ARCH shutting down
ARC4: Archival stopped
Sat Sep 1 00:22:57 2012
ARCH shutting down
ARC3: Archival stopped
Sat Sep 1 00:23:07 2012
ARCH shutting down
ARC1: Archival stopped
Sat Sep 1 00:23:12 2012
ARCH shutting down
ARC0: Archival stopped
Sat Sep 1 00:37:35 2012
Errors in file /oracle/admin/icme/bdump/icme_lgwr_29039.trc:
Sat Sep 1 00:37:39 2012
Errors in file /oracle/admin/icme/udump/icme_ora_31068.trc:
ORA-00600: internal error code, arguments: [3708], [910], [], [], [], [], [], []
Sat Sep 1 00:37:43 2012
CLOSE: Error 600 during database close
Sat Sep 1 00:37:43 2012
SMON: enabling cache recovery
SMON: enabling tx recovery
Sat Sep 1 00:37:43 2012
ORA-600 signalled during: ALTER DATABASE CLOSE NORMAL...
Sat Sep 1 00:38:52 2012
Thread 1 closed at log sequence 33821
Successful close of redo thread 1
Sat Sep 1 00:40:04 2012
Errors in file /oracle/admin/icme/bdump/icme_pmon_29029.trc:
ORA-00476: RECO process terminated with error
Sat Sep 1 00:40:04 2012
PMON: terminating instance due to error 476
Instance terminated by PMON, pid = 29029
trace file contents
##################################################
*** 2012-09-01 00:37:39.683
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [3708], [910], [], [], [], [], [], []
Current SQL statement for this session:
ALTER DATABASE CLOSE NORMAL
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31 call ksedst1() 000000000 ? 000000001 ?
7FBFFF2500 ? 7FBFFF2560 ?
7FBFFF24A0 ? 000000000 ?
ksedmp()+610 call ksedst() 000000000 ? 000000001 ?
7FBFFF2500 ? 7FBFFF2560 ?
7FBFFF24A0 ? 000000000 ?
ksfdmp()+21 call ksedmp() 000000003 ? 000000001 ?
7FBFFF2500 ? 7FBFFF2560 ?
7FBFFF24A0 ? 000000000 ?
kgeriv()+176 call ksfdmp() 000000003 ? 000000001 ?
7FBFFF2500 ? 7FBFFF2560 ?
7FBFFF24A0 ? 000000000 ?
kgesiv()+119 call kgeriv() 0066876E0 ? 006730B90 ?
000000000 ? 000000000 ?
7FBFFF24A0 ? 000000000 ?
ksesic1()+215 call kgesiv() 0066876E0 ? 006730B90 ?
000000E7C ? 000000001 ?
7FBFFF3280 ? 000000000 ?
kcttsc()+695 call ksesic1() 000000E7C ? 000000000 ?
00000038E ? 000000001 ?
000000000 ? 7FBFFF2F00 ?
kcfcld()+145 call kcttsc() 000000003 ? 000000000 ?
000000003 ? 000000001 ?
000000000 ?
FFFFFFFF000000BD ?
dbsclose()+498 call kcfcld() 000000003 ? 000000000 ?
000000003 ? 000000001 ?
000000000 ?
FFFFFFFF000000BD ?
adbdrv()+63033 call dbsclose() 000000000 ? 000000000 ?
000000003 ? 000000001 ?
000000000 ?
FFFFFFFF000000BD ?
opiexe()+13505 call adbdrv() 000000000 ? 000000000 ?
2393BC048 ? 000000001 ?
000000000 ?
FFFFFFFF000000BD ?
opiosq0()+3316 call opiexe() 000000004 ? 000000000 ?
7FBFFFB0F8 ? 000000003 ?
000000000 ?
FFFFFFFF000000BD ?
kpooprx()+315 call opiosq0() 000000003 ? 00000000E ?
7FBFFFB268 ? 0000000A4 ?
000000000 ?
FFFFFFFF000000BD ?
kpoal8()+799 call kpooprx() 7FBFFFE414 ? 7FBFFFC440 ?
00000001B ? 000000001 ?
000000000 ?
FFFFFFFF000000BD ?
opiodr()+984 call kpoal8() 00000005E ? 000000017 ?
7FBFFFE410 ? 000000001 ?
000000001 ?
FFFFFFFF000000BD ?
ttcpip()+1012 call opiodr() 00000005E ? 000000017 ?
7FBFFFE410 ? 000000000 ?
0059B1290 ?
...
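The error arguments can also be read from the call stack: the kcttsc() frame passes 000000E7C and 00000038E to ksesic1(). A small Python check follows, assuming (as seems to be the case) that these two hex values are the ORA-600 arguments; the variable names are mine.

```python
# Hex values copied from the kcttsc() -> ksesic1() frame of the trace above.
frame_args = ["000000E7C", "00000038E"]

# Interpret each value as a base-16 integer.
decoded = [int(value, 16) for value in frame_args]

print(decoded)  # [3708, 910]
```

These match the [3708] and [910] arguments shown in the error message.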
check archivelog info at that time
SQL> select * from (
select dest_id,sequence#,first_time,next_time,creator,standby_dest from v$archived_log where sequence#>33818 order by 2
) where rownum<10;

   DEST_ID  SEQUENCE# FIRST_TIME          NEXT_TIME           CREATOR STA
---------- ---------- ------------------- ------------------- ------- ---
         2      33819 2012-08-31 11:04:47 2012-08-31 15:41:31 LGWR    YES
         1      33819 2012-08-31 11:04:47 2012-08-31 15:41:31 ARCH    NO
         1      33820 2012-08-31 15:41:31 2012-08-31 22:00:06 ARCH    NO
         2      33820 2012-08-31 15:41:31 2012-08-31 22:00:06 LGWR    YES
         1      33821 2012-08-31 22:00:06 2012-09-01 00:46:31 ARCH    NO
         1      33822 2012-09-01 00:46:31 2012-09-01 05:47:58 ARCH    NO
         1      33823 2012-09-01 05:47:58 2012-09-01 06:19:14 ARCH    NO
         1      33824 2012-09-01 06:19:14 2012-09-01 06:19:23 ARCH    NO
         1      33825 2012-09-01 06:19:23 2012-09-01 06:19:39 ARCH    NO

9 rows selected.
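To make the output easier to read, the rows can be grouped by destination to see which sequences reached the standby. A minimal Python sketch, with the rows typed in by hand from the query output and names of my own choosing; it only covers the nine rows displayed, not the full contents of v$archived_log.

```python
# (dest_id, sequence#, standby_dest) copied from the query output above.
rows = [
    (2, 33819, "YES"), (1, 33819, "NO"),
    (1, 33820, "NO"), (2, 33820, "YES"),
    (1, 33821, "NO"), (1, 33822, "NO"), (1, 33823, "NO"),
    (1, 33824, "NO"), (1, 33825, "NO"),
]

# Sequences archived locally versus those shipped to the standby destination.
local = {seq for dest, seq, standby in rows if standby == "NO"}
shipped = {seq for dest, seq, standby in rows if standby == "YES"}

last_shipped = max(shipped)
missing_on_standby = sorted(local - shipped)

print(last_shipped)        # 33820
print(missing_on_standby)  # [33821, 33822, 33823, 33824, 33825]
```

Within the displayed rows, sequence 33820 is the last one recorded for the standby destination. Sequence 33821, the log that was current when the network was cut, appears only for the local destination.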