About ORA-00600 [4400] [48]

Original post by Roger, 2012-06-13


This morning a colleague asked me to help analyze an ORA-600 error. Opening the trace file in UltraEdit, I found the following:

Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production
ORACLE_HOME = /install1/oracle/bill1/product/9.2.0
System name:	AIX
Node name:	billing1
Release:	3
Version:	5
Machine:	00CB104D4C00
Instance name: bill1
Redo thread mounted by this instance: 1
Oracle process number: 148
Unix process pid: 1363982, image: oracle@billing1 (TNS V1-V3)

*** SESSION ID:(578.13852) 2012-06-04 12:08:44.492
*** 2012-06-04 12:08:44.492
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []

The trace header shows an AIX 5.3 system running a single-instance 9.2.0.8 database.
Next, the call stack:

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedmp+0148          bl       ksedst               1029555CC ?
ksfdmp+0018          bl       01FD46A8             
kgeriv+0118          bl       _ptrgl               
kgeasi+00cc          bl       kgeriv               000000002 ? 1100A2128 ?
                                                   102954750 ? 7000001B2B78F30 ?
                                                   000000000 ?
ktcddt+013c          bl       kgeasi               110006838 ? 1103923A8 ?
                                                   113000001130 ? 200000002 ?
                                                   100000001 ? 000000004 ?
                                                   000000030 ? 7000001B1B74EE8 ?
ktcsod+01f8          bl       ktcddt               110006838 ? 110006978 ?
                                                   000000000 ?
kssdch_stage+02b8    bl       _ptrgl               
kssdch+0014          bl       kssdch_stage         000000000 ? 700000000007D98 ?
                                                   110002F50 ?
ktcbod+030c          bl       kssdch               11000D618 ? 000000008 ?
kssdch_stage+02b8    bl       _ptrgl               
kssdch+0014          bl       kssdch_stage         110002F50 ? 110061758 ?
                                                   000000009 ?
ksuxds+1118          bl       kssdch               1000E82E4 ? 000000000 ?
ksudel+006c          bl       ksuxds               7000001AB716A20 ? 100000001 ?
opilof+03dc          bl       01FD4914             
opiodr+08cc          bl       _ptrgl               
ttcpip+0cc4          bl       _ptrgl               
opitsk+0d60          bl       ttcpip               11000D4C0 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
opiino+0758          bl       opitsk               000000000 ? 000000000 ?
opiodr+08cc          bl       _ptrgl               
opidrv+032c          bl       opiodr               3C00000018 ? 4101FAA40 ?
                                                   FFFFFFFFFFFF8F0 ? 0A0012010 ?
sou2o+0028           bl       opidrv               3C0C000000 ? 4A0059B20 ?
                                                   FFFFFFFFFFFF8F0 ?
main+0138            bl       01FD40E8             
__start+0098         bl       main                 000000000 ? 000000000 ?

Here is how My Oracle Support (MOS) describes this error:
PURPOSE:            
  This article discusses the internal error "ORA-600 [4400]", what 
  it means and possible actions. The information here is only applicable 
  to the versions listed and is provided only for guidance.

ERROR:              
  ORA-600 [4400] [a] [b] [c] [d] [e]

VERSIONS:
  versions 6.0 to 11

DESCRIPTION:

  Internal error 4400 means that we are trying to delete a transaction (for 
  example at logoff time) but the transaction has not yet been marked 
  completed.

  This can happen at the remote site in a distributed transaction if the 
  first part of the first stage of a two phase commit gets an error before 
  it really starts the protocol.

FUNCTIONALITY:      
  TRANSACTION CONTROL

IMPACT:             
  PROCESS FAILURE - but only at logoff so minimal impact
  NON CORRUPTIVE - No underlying data corruption.
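
The note's description boils down to one invariant: a transaction must already be marked completed when it is deleted at logoff. The toy Python model below only illustrates that check. `Session`, `Transaction`, `TxState` and `InternalError` are hypothetical names, not Oracle internals, and the "prepare failed early" case simply mimics the distributed scenario the note mentions.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TxState(Enum):
    ACTIVE = auto()
    PREPARING = auto()   # first stage of a two-phase commit has begun
    COMPLETED = auto()   # transaction has been marked committed/rolled back


class InternalError(Exception):
    """Hypothetical stand-in for an ORA-00600 internal assertion failure."""


@dataclass
class Transaction:
    name: str
    state: TxState = TxState.ACTIVE

    def mark_completed(self) -> None:
        self.state = TxState.COMPLETED


@dataclass
class Session:
    transactions: list = field(default_factory=list)

    def logoff(self) -> None:
        # Deleting a transaction at logoff assumes it was already marked
        # completed; this toy raises when that invariant is violated.
        for tx in self.transactions:
            if tx.state is not TxState.COMPLETED:
                raise InternalError(f"[4400] {tx.name} deleted before completion")
        self.transactions.clear()


# A remote-site transaction whose prepare step failed before the protocol
# really started: it never reaches COMPLETED, so logoff trips the check.
remote = Session([Transaction("remote_tx", TxState.PREPARING)])
try:
    remote.logoff()
except InternalError as err:
    print("logoff failed:", err)

# A normally completed transaction is deleted cleanly at logoff.
ok_tx = Transaction("local_tx")
ok_tx.mark_completed()
normal = Session([ok_tx])
normal.logoff()
print("remaining transactions:", len(normal.transactions))
```

As in the note, the failure only happens at logoff and touches no data. In the toy, the session simply fails to tear down.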

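The note ties error 4400 to distributed transactions. I have run into quite a few distributed-transaction problems before and wrote about one of them in an earlier post: ORA-01591: lock held by in-doubt distributed transaction.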

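Comparing call stacks, ours is essentially identical to the one in the following note (the frames match from ksedmp down to ksudel), and the note says the error can be ignored: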

ORA-00600 [4400], [48], [], [], [], [] From a Distributed Transaction [ID 464861.1]

Symptoms
 The following error is reported on 9.2.0.5:

ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []

The call stack is:

ksedmp ksfdmp kgeriv kgeasi ktcddt ktcsod kssdch_stage kssdch ktcbod
kssdch_stage kssdch PGOSF40__ksuxds ksudel kxfprdp opirip opidrv sou2o

Cause
 The error is encountered due to Bug 3840810 which was fixed in version 10.1.0.3. 

The error is encountered when there is a dblink between 8i and 9i/10g databases. This error is only raised 
in the log-off of the local session while trying to delete a transaction but the transaction has not yet 
been marked completed. This lack of information is caused by the bug and if there is no process failure due
to this error, it can be ignored since there is no SQL statement/session affected.

This bug has been fixed by architectural changes in 10g and unfortunately is not backportable to 9.2.

If this is an one time occurrence then it can be safely ignored.
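
You can also check whether a trace matches the note with a script instead of by eye. The Python sketch below pulls the ORA-00600 arguments and the function names out of the call stack, then counts how many frames agree from the top. The regular expressions assume the AIX trace layout shown above. Treating `PGOSF40__` as a platform symbol prefix is an assumption, and the helper names are hypothetical.

```python
import re

# Error arguments and call stack as quoted in MOS note 464861.1.
NOTE_ARGS = ["4400", "48"]
NOTE_STACK = (
    "ksedmp ksfdmp kgeriv kgeasi ktcddt ktcsod kssdch_stage kssdch ktcbod "
    "kssdch_stage kssdch PGOSF40__ksuxds ksudel kxfprdp opirip opidrv sou2o"
).split()

ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: (.*)")
# A stack row starts with "function+hexoffset"; argument-only continuation
# rows start with whitespace and therefore do not match.
FRAME_RE = re.compile(r"^([A-Za-z_][\w$]*)\+[0-9a-fA-F]+\s")


def normalize(frame):
    # Assumption: "PGOSF<n>__" is a platform-specific symbol prefix, so
    # PGOSF40__ksuxds is treated as the same function as ksuxds.
    return re.sub(r"^PGOSF\d+__", "", frame)


def ora600_args(trace):
    """Return the ORA-00600 arguments, dropping trailing empty ones."""
    m = ORA600_RE.search(trace)
    if m is None:
        return []
    args = re.findall(r"\[([^\]]*)\]", m.group(1))
    while args and args[-1] == "":
        args.pop()
    return args


def stack_frames(trace):
    """Return normalized function names from the call stack, top first."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append(normalize(m.group(1)))
    return frames


def common_top_frames(ours, theirs):
    """Count identical frames from the top of both stacks."""
    n = 0
    for a, b in zip(ours, theirs):
        if a != b:
            break
        n += 1
    return n


# Abridged excerpt of the trace above (argument continuation rows omitted).
SAMPLE = """\
ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []
ksedmp+0148          bl       ksedst               1029555CC ?
ksfdmp+0018          bl       01FD46A8
kgeriv+0118          bl       _ptrgl
kgeasi+00cc          bl       kgeriv               000000002 ? 1100A2128 ?
ktcddt+013c          bl       kgeasi               110006838 ? 1103923A8 ?
ktcsod+01f8          bl       ktcddt               110006838 ? 110006978 ?
kssdch_stage+02b8    bl       _ptrgl
kssdch+0014          bl       kssdch_stage         000000000 ? 700000000007D98 ?
ktcbod+030c          bl       kssdch               11000D618 ? 000000008 ?
kssdch_stage+02b8    bl       _ptrgl
kssdch+0014          bl       kssdch_stage         110002F50 ? 110061758 ?
ksuxds+1118          bl       kssdch               1000E82E4 ? 000000000 ?
ksudel+006c          bl       ksuxds               7000001AB716A20 ? 100000001 ?
opilof+03dc          bl       01FD4914
"""

ours = stack_frames(SAMPLE)
theirs = [normalize(f) for f in NOTE_STACK]
print("arguments match:", ora600_args(SAMPLE) == NOTE_ARGS)   # True
n = common_top_frames(ours, theirs)
print("matching top frames:", n, ours[n - 1])                 # 13 ksudel
print("first difference:", ours[n], "vs", theirs[n])          # opilof vs kxfprdp
```

On this excerpt, the arguments match and the first 13 frames (ksedmp through ksudel) are identical. The stacks only diverge below ksudel: ours continues through opilof, while the note's goes through kxfprdp.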

The trace also contains the following buffer header and block dump:

BH (0x700000134fdd900) file#: 405 rdba: 0x65403498 (405/13464) class 1 ba: 0x700000134764000
  set: 84 dbwrid: 3 obj: 343488 objn: 343488
  hash: [700000070fead00,700000163feec00] lru: [7000000e8fe6d68,70000013ffe8468]
  LRU flags: hot_buffer
  ckptq: [7000000aefe93d8,700000088fe2dd8] fileq: [7000001b1174040,7000000eefc73e8]
  st: XCURRENT md: NULL rsop: 0x0 tch: 5
  flags: buffer_dirty gotten_in_current_mode block_written_once
          redo_since_read
  LRBA: [0x3e7cf.396e5.0] HSCN: [0x0bac.2f283f13] HSUB: [1] RRBA: [0x0.0.0]
  buffer tsn: 32 rdba: 0x65403498 (405/13464)
  scn: 0x0bac.2f283f13 seq: 0x01 flg: 0x02 tail: 0x3f130601
  frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump:  0x65403498

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0040.026.00167771  0x7203e25c.0f91.02  C---    0  scn 0x0bac.2f26d569
0x02   0x008c.04b.002ae01f  0x7242e607.0089.07  C---    0  scn 0x0bac.2f26d709
0x03   0x0043.04f.00184d78  0x720020a7.e5ff.2f  C---    0  scn 0x0bac.2f28047f
.......
0x30   0x0005.041.001d254f  0x228560f5.c941.0e  --U-    1  fsc 0x0075.2f282d74
........
0x3f   0x008e.012.0029e870  0x22801e5b.0444.06  C---    0  scn 0x0bac.2f26d4de
0x40   0x0005.000.001d261c  0x228560f5.c941.0a  --U-    1  fsc 0x007b.2f282d25
.........
0x61   0x0020.004.000bc447  0x7203ce74.e107.12  C---    0  scn 0x0bac.2f26d66a
0x62   0x0049.04c.001e453a  0x2a42eb96.ffe3.06  C---    0  scn 0x0bac.2f26d647

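LRBA (low RBA) marks the point where recovery of this buffer would start. It is a checkpoint concept; see my earlier post, Oracle Checkpoint Explained, for details.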
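
Several fields in this buffer header are packed hex values. The minimal Python sketch below shows how they break down, assuming the classic 9i smallfile layout: an rdba carries a 10-bit relative file number and a 22-bit block number, an RBA is written as log sequence.block.byte offset, and an SCN is written as wrap.base. The helper names are hypothetical.

```python
def decode_rdba(rdba):
    """Relative DBA: high 10 bits = relative file#, low 22 bits = block#."""
    return rdba >> 22, rdba & 0x3FFFFF


def decode_rba(text):
    """RBA 'seq.block.offset' (hex) -> (log sequence#, redo block#, byte offset)."""
    seq, block, offset = (int(part, 16) for part in text.split("."))
    return seq, block, offset


def decode_scn(text):
    """SCN 'wrap.base' (hex) -> single integer wrap * 2**32 + base."""
    wrap, base = (int(part, 16) for part in text.split("."))
    return (wrap << 32) + base


print(decode_rdba(0x65403498))          # (405, 13464), as printed in the trace
print(decode_rba("0x3e7cf.396e5.0"))    # LRBA: (255951, 235237, 0)
print(decode_scn("0x0bac.2f283f13"))    # HSCN as a single number
```

The rdba result reproduces the "(405/13464)" that Oracle itself prints next to the address, which is a quick sanity check on the bit layout.
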
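As the dump shows, every ITL entry is flagged either C (committed and cleaned out) or U (committed, with an upper-bound commit SCN), so all of these transactions have committed. That confirms the error really had no impact.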
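
The ITL check can also be scripted rather than done by eye. The minimal Python sketch below assumes the 9i ITL dump layout shown above: slot, Xid, Uba, a four-character flag string, Lck, then an scn/fsc value. It reports any entry that carries neither C nor U as possibly active, and it decodes a Xid as undo segment number, slot and sequence. The helper names are hypothetical.

```python
import re

# One ITL row: slot, Xid, Uba, 4-char flag string, lock count, scn/fsc value.
ITL_RE = re.compile(
    r"^\s*(0x[0-9a-f]+)\s+(0x[0-9a-f.]+)\s+(0x[0-9a-f.]+)\s+"
    r"([-A-Z]{4})\s+(\d+)\s+(scn|fsc)\s+(\S+)",
    re.IGNORECASE,
)


def decode_xid(xid):
    """Xid 'usn.slot.sqn' (hex) -> (undo segment#, slot, sequence)."""
    usn, slot, sqn = (int(p, 16) for p in xid.split("."))
    return usn, slot, sqn


def parse_itl(dump):
    rows = []
    for line in dump.splitlines():
        m = ITL_RE.match(line)
        if m:
            slot, xid, _uba, flags, lck, _kind, _value = m.groups()
            rows.append({"slot": slot, "xid": xid, "flags": flags, "lck": int(lck)})
    return rows


def possibly_active(rows):
    # C = committed and cleaned out; U = committed, upper-bound commit SCN.
    # An entry with neither flag may still belong to an open transaction.
    return [r["slot"] for r in rows if "C" not in r["flags"] and "U" not in r["flags"]]


DUMP = """\
0x01   0x0040.026.00167771  0x7203e25c.0f91.02  C---    0  scn 0x0bac.2f26d569
0x30   0x0005.041.001d254f  0x228560f5.c941.0e  --U-    1  fsc 0x0075.2f282d74
0x40   0x0005.000.001d261c  0x228560f5.c941.0a  --U-    1  fsc 0x007b.2f282d25
0x62   0x0049.04c.001e453a  0x2a42eb96.ffe3.06  C---    0  scn 0x0bac.2f26d647
"""

rows = parse_itl(DUMP)
print(len(rows), "ITL rows parsed; possibly active:", possibly_active(rows))
print("Xid of slot 0x30:", decode_xid(rows[1]["xid"]))
```

This excerpt reports nothing as possibly active. The U entries still show Lck 1. That is consistent with delayed block cleanout, which leaves the lock count in place until the block is next cleaned out. It does not by itself indicate an open transaction.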
