ORA-600 [3708] [910] internal error issue

Original post by Anbob, 2012-09-03

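Last week, while shutting down the database for a data center move, I ran into another ORA-600 problem. Recording it here.
Environment
OS version: CentOS 5
DB version: Oracle 10.2.0.4, single instance
Physical Data Guard standby (remote site)
Timeline of what happened
1. 23:30 The network administrator disconnected the external network. Note that from this point on the remote Data Guard standby was already unreachable.
2. 00:22 Issued shutdown immediate
3. 00:37 ORA-600 [3708] appeared for the first time
4. Around 00:39, killed the Oracle processes from the OS
5. Then removed the shared memory segments with ipcrm
6. Ran startup again, confirmed the database could be opened, then ran shutdown immediate; the ORA-600 did not recur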
SQL> shutdown immediate
ORA-00600: internal error code, arguments: [3708], [910], [], [], [], [], [], []

alert log file
#############################
Fri Aug 31 23:49:11 2012
Errors in file /oracle/admin/icme/bdump/icme_lns1_18491.trc:
ORA-03113: end-of-file on communication channel
Fri Aug 31 23:49:11 2012
LGWR: I/O error 3113 archiving log 2 to 'sdicme'
Sat Sep  1 00:22:11 2012
Starting background process EMN0
EMN0 started with pid=363, OS id=31070
Sat Sep  1 00:22:11 2012
Shutting down instance: further logons disabled
Sat Sep  1 00:22:11 2012
Stopping background process CJQ0
Sat Sep  1 00:22:11 2012
Stopping background process MMNL
Sat Sep  1 00:22:11 2012
Stopping background process MMON
Sat Sep  1 00:22:12 2012
Shutting down instance (immediate)
License high water mark = 1220
Sat Sep  1 00:22:12 2012
Stopping Job queue slave processes, flags = 7
Sat Sep  1 00:22:12 2012
Job queue slave processes stopped
All dispatchers and shared servers shutdown
Sat Sep  1 00:22:20 2012
ALTER DATABASE CLOSE NORMAL
Sat Sep  1 00:22:20 2012
SMON: disabling tx recovery
SMON: disabling cache recovery
Sat Sep  1 00:22:22 2012
Shutting down archive processes
Archiving is disabled
Sat Sep  1 00:22:27 2012
ARCH shutting down
ARC9: Archival stopped
Sat Sep  1 00:22:32 2012
ARCH shutting down
ARC8: Archival stopped
Sat Sep  1 00:22:37 2012
ARCH shutting down
ARC7: Archival stopped
Sat Sep  1 00:22:42 2012
ARCH shutting down
ARC6: Archival stopped
Sat Sep  1 00:22:47 2012
ARCH shutting down
ARC5: Archival stopped
Sat Sep  1 00:22:52 2012
ARCH shutting down
ARC4: Archival stopped
Sat Sep  1 00:22:57 2012
ARCH shutting down
ARC3: Archival stopped
Sat Sep  1 00:23:07 2012
ARCH shutting down
ARC1: Archival stopped
Sat Sep  1 00:23:12 2012
ARCH shutting down
ARC0: Archival stopped
Sat Sep  1 00:37:35 2012
Errors in file /oracle/admin/icme/bdump/icme_lgwr_29039.trc:
Sat Sep  1 00:37:39 2012
Errors in file /oracle/admin/icme/udump/icme_ora_31068.trc:
ORA-00600: internal error code, arguments: [3708], [910], [], [], [], [], [], []
Sat Sep  1 00:37:43 2012
CLOSE: Error 600 during database close
Sat Sep  1 00:37:43 2012
SMON: enabling cache recovery
SMON: enabling tx recovery
Sat Sep  1 00:37:43 2012
ORA-600 signalled during: ALTER DATABASE CLOSE NORMAL...
Sat Sep  1 00:38:52 2012
Thread 1 closed at log sequence 33821
Successful close of redo thread 1
Sat Sep  1 00:40:04 2012
Errors in file /oracle/admin/icme/bdump/icme_pmon_29029.trc:
ORA-00476: RECO process terminated with error
Sat Sep  1 00:40:04 2012
PMON: terminating instance due to error 476
Instance terminated by PMON, pid = 29029
trace file contents
##################################################
*** 2012-09-01 00:37:39.683
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [3708], [910], [], [], [], [], [], []
Current SQL statement for this session:
ALTER DATABASE CLOSE NORMAL
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FBFFF2500 ? 7FBFFF2560 ?
                                                   7FBFFF24A0 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FBFFF2500 ? 7FBFFF2560 ?
                                                   7FBFFF24A0 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FBFFF2500 ? 7FBFFF2560 ?
                                                   7FBFFF24A0 ? 000000000 ?
kgeriv()+176         call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FBFFF2500 ? 7FBFFF2560 ?
                                                   7FBFFF24A0 ? 000000000 ?
kgesiv()+119         call     kgeriv()             0066876E0 ? 006730B90 ?
                                                   000000000 ? 000000000 ?
                                                   7FBFFF24A0 ? 000000000 ?
ksesic1()+215        call     kgesiv()             0066876E0 ? 006730B90 ?
                                                   000000E7C ? 000000001 ?
                                                   7FBFFF3280 ? 000000000 ?
kcttsc()+695         call     ksesic1()            000000E7C ? 000000000 ?
                                                   00000038E ? 000000001 ?
                                                   000000000 ? 7FBFFF2F00 ?
kcfcld()+145         call     kcttsc()             000000003 ? 000000000 ?
                                                   000000003 ? 000000001 ?
                                                   000000000 ?
                                                   FFFFFFFF000000BD ?
dbsclose()+498       call     kcfcld()             000000003 ? 000000000 ?
                                                   000000003 ? 000000001 ?
                                                   000000000 ?
                                                   FFFFFFFF000000BD ?
adbdrv()+63033       call     dbsclose()           000000000 ? 000000000 ?
                                                   000000003 ? 000000001 ?
                                                   000000000 ?
                                                   FFFFFFFF000000BD ?
opiexe()+13505       call     adbdrv()             000000000 ? 000000000 ?
                                                   2393BC048 ? 000000001 ?
                                                   000000000 ?
                                                   FFFFFFFF000000BD ?
opiosq0()+3316       call     opiexe()             000000004 ? 000000000 ?
                                                   7FBFFFB0F8 ? 000000003 ?
                                                   000000000 ?
                                                   FFFFFFFF000000BD ?
kpooprx()+315        call     opiosq0()            000000003 ? 00000000E ?
                                                   7FBFFFB268 ? 0000000A4 ?
                                                   000000000 ?
                                                   FFFFFFFF000000BD ?
kpoal8()+799         call     kpooprx()            7FBFFFE414 ? 7FBFFFC440 ?
                                                   00000001B ? 000000001 ?
                                                   000000000 ?
                                                   FFFFFFFF000000BD ?
opiodr()+984         call     kpoal8()             00000005E ? 000000017 ?
                                                   7FBFFFE410 ? 000000001 ?
                                                   000000001 ?
                                                   FFFFFFFF000000BD ?
ttcpip()+1012        call     opiodr()             00000005E ? 000000017 ?
                                                   7FBFFFE410 ? 000000000 ?
                                                   0059B1290 ?
...
Check the archived log information from that time:
SQL> select * from (
  select dest_id, sequence#, first_time, next_time, creator, standby_dest
  from v$archived_log where sequence# > 33818 order by 2
) where rownum < 10;
   DEST_ID  SEQUENCE# FIRST_TIME          NEXT_TIME           CREATOR STA
---------- ---------- ------------------- ------------------- ------- ---
         2      33819 2012-08-31 11:04:47 2012-08-31 15:41:31 LGWR    YES
         1      33819 2012-08-31 11:04:47 2012-08-31 15:41:31 ARCH    NO
         1      33820 2012-08-31 15:41:31 2012-08-31 22:00:06 ARCH    NO
         2      33820 2012-08-31 15:41:31 2012-08-31 22:00:06 LGWR    YES
         1      33821 2012-08-31 22:00:06 2012-09-01 00:46:31 ARCH    NO
         1      33822 2012-09-01 00:46:31 2012-09-01 05:47:58 ARCH    NO
         1      33823 2012-09-01 05:47:58 2012-09-01 06:19:14 ARCH    NO
         1      33824 2012-09-01 06:19:14 2012-09-01 06:19:23 ARCH    NO
         1      33825 2012-09-01 06:19:23 2012-09-01 06:19:39 ARCH    NO
9 rows selected.
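
Reading the output above: destination 2 (the standby, STANDBY_DEST = YES) has rows only up to sequence 33820, while destination 1 keeps archiving locally. A small Python sketch, with the rows hand-copied from this output and all variable names made up for illustration, makes that explicit. Keep in mind the query stops at 9 rows, so a standby row for 33825 could in principle have been cut off.

```python
# Rows (dest_id, sequence#) hand-copied from the v$archived_log output above.
rows = [
    (2, 33819), (1, 33819),
    (1, 33820), (2, 33820),
    (1, 33821), (1, 33822), (1, 33823), (1, 33824), (1, 33825),
]

# dest_id values as they appear in the output (names are illustrative only).
LOCAL_DEST, STANDBY_DEST = 1, 2

local = {seq for dest, seq in rows if dest == LOCAL_DEST}
standby = {seq for dest, seq in rows if dest == STANDBY_DEST}

# Sequences archived locally that have no standby row in this output.
local_only = sorted(local - standby)
last_on_standby = max(standby)

print("last sequence with a standby row:", last_on_standby)
print("sequences with a local row only :", local_only)
```

Sequence 33821 is the log that was current when the network was cut at 23:30 (FIRST_TIME 22:00:06), and it is the first one with no standby row here.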

cause:
This error is raised because LGWR timed out.
During shutdown, the Oracle routine kcttsc() sends a message to LGWR to change the state of a redo thread and waits for confirmation. If the reply never comes, the wait on LGWR times out after about 15 minutes; that timeout is the second argument of the ORA-600 (910 seconds, i.e. roughly 15 minutes). In our case, LGWR held the RT enqueue and was waiting on 'direct path read' on file 197. Because of some OS or network issue the read took more than 15 minutes, and the wait on LGWR timed out.
This is bug 6512622.
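
As a rough sanity check of the 15-minute figure, the alert log timestamps can be compared with the 910-second argument. The sketch below is only arithmetic on values copied from this post. Taking the "ALTER DATABASE CLOSE NORMAL" entry as the starting point is my assumption, and alert log timestamps only mark when an entry was written, so approximate agreement is the most one can expect.

```python
from datetime import datetime

# Timestamps copied from the alert log excerpt in this post.
close_issued = datetime(2012, 9, 1, 0, 22, 20)   # ALTER DATABASE CLOSE NORMAL
ora600_raised = datetime(2012, 9, 1, 0, 37, 39)  # ORA-00600 [3708] in icme_ora_31068.trc

# Second argument of the ORA-600, read as a timeout in seconds
# (the interpretation given in this post).
timeout_secs = 910

elapsed_secs = int((ora600_raised - close_issued).total_seconds())

print(f"timeout argument : {timeout_secs} s = "
      f"{timeout_secs // 60} min {timeout_secs % 60} s")
print(f"close -> ORA-600 : {elapsed_secs} s = "
      f"{elapsed_secs // 60} min {elapsed_secs % 60} s")
```

The gap comes out at 919 seconds against a 910-second argument, which is consistent with the timeout explanation.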
solution:
1. Apply the 10.2.0.5 patchset where the bug is fixed
or
2. Apply one off Patch 6512622 if available on My Oracle Support for your platform and Oracle Version.
or
3. Upgrade to 11.1.0.6 where the bug is fixed.
I think the problem can also be avoided by improving I/O performance, or by shutting down the database before the network is disconnected.
