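Problem Description
Basic information: a 3-node RAC on Oracle 11.2.0.4, running on AIX 6.1.
The database instance on node 2 hit an ORA-600 error and crashed. The other two nodes kept running normally, and the cluster did not restart during the incident. The only impact was the restart of the node 2 instance.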
Fri Apr 05 17:28:38 2019
Errors in file /u01/app/oracle/diag/rdbms/db102/db1022/trace/db1022_lmon_3474368.trc (incident=160089):
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x1108E6388], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/db102/db1022/incident/incdir_160089/db1022_lmon_3474368_i160089.trc
Fri Apr 05 17:28:42 2019
Dumping diagnostic data in directory=[cdmp_20190405172842], requested by (instance=2, osid=3474368 (LMON)), summary=[incident=160089].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/db102/db1022/trace/db1022_lmon_3474368.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x1108E6388], [], [], [], [], [], [], [], [], [], []
LMON (ospid: 3474368): terminating the instance due to error 481
System state dump requested by (instance=2, osid=3474368 (LMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/db102/db1022/trace/db1022_diag_5637018_20190405172846.trc
Instance terminated by LMON, pid = 3474368
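When matching an alert log like the one above against MOS notes, the two useful fields are the first ORA-600 argument (the MOS notes below are titled by it) and the incident trace file that holds the call stack. The following is a minimal sketch that pulls both out of alert log text. `summarize_ora600` is a hypothetical helper written for illustration, not an Oracle tool, and it assumes the alert log line formats exactly as quoted here; ADRCI remains the supported way to inspect and package incidents.

```python
import re

# ORA-600 line; group 1 captures the first argument, e.g. kghstack_underflow_internal_2
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")
# "Errors in file <trace> (incident=<n>):" line; captures the trace path and incident number
INCIDENT_RE = re.compile(r"Errors in file (\S+) \(incident=(\d+)\)")


def summarize_ora600(alert_text):
    """Return the distinct ORA-600 first arguments and (trace file, incident) pairs."""
    first_args, incidents = [], []
    for line in alert_text.splitlines():
        m = ORA600_RE.search(line)
        if m and m.group(1) not in first_args:
            first_args.append(m.group(1))
        m = INCIDENT_RE.search(line)
        if m:
            incidents.append((m.group(1), int(m.group(2))))
    return {"first_args": first_args, "incidents": incidents}


# Excerpt of the alert log lines quoted above
SAMPLE = """\
Errors in file /u01/app/oracle/diag/rdbms/db102/db1022/trace/db1022_lmon_3474368.trc (incident=160089):
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x1108E6388], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/db102/db1022/trace/db1022_lmon_3474368.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x1108E6388], [], [], [], [], [], [], [], [], [], []
"""

if __name__ == "__main__":
    print(summarize_ora600(SAMPLE))
```

On this excerpt the sketch reports a single first argument, `kghstack_underflow_internal_2`, and one incident (160089) whose trace file is the one to open for the call stack. The second "Errors in file" line carries no incident number, so it is deliberately not counted.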
During the analysis I also found two MOS notes:
LMON or LMS Process Crashes Instance With ORA-600 [kghstack_underflow_internal_2] (Doc ID 2003278.1)
Bug 18687067 : ORA-600 [KGHSTACK_UNDERFLOW_INTERNAL_2]
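Lao Gai (老盖) also wrote a blog post describing this as a bug specific to 11.2.0.4 on the AIX platform, caused by code generated by the AIX C compiler. The call stack in his post matches the MOS notes above, but neither matches my case: the call stack in the data collected from the problem node does not correspond to theirs.
Is it possible to determine the cause of this failure?
Expert Answer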
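Analysis of the call stack in db1022_lmon_3474368_i160089.trc shows that it is consistent with Bug 18687067 : ORA-600 [KGHSTACK_UNDERFLOW_INTERNAL_2]. This is in fact the AIX C compiler code-generation bug mentioned in Lao Gai's blog. The customer environment currently has no patches applied, so we have recommended installing the latest PSU.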