Analysis of an Instance Hang Caused by ORA-600 on a Database

数据老匠 2016-07-14

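1. Problem Description

Between 15:10 and 15:26 on February 23, 2011, instance xxdb1 ran extremely slowly and was close to hung. After inspection, 新炬 determined that the hang was caused by memory corruption in the database SGA, which raised ORA-600 [1114] errors and left a large number of sessions waiting for memory allocation. After discussion, it was decided to forcibly terminate the xxdb1 instance and switch its service over to the xxdb2 instance. The problem has also been logged on Metalink for follow-up.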

2. Resolution Process

Checking the alert log of the xxdb1 instance revealed the following errors:

Wed Feb 23 15:10:19 2011
Errors in file /oracle/products/admin/xxdb/udump/xxdb1_ora_3944818.trc:
ORA-00600: internal error code, arguments: [1114], [], [], [], [], [], [], []
Wed Feb 23 15:10:21 2011
Trace dumping is performing id=[cdmp_20110223151021]
Wed Feb 23 15:10:21 2011
Errors in file /oracle/products/admin/xxdb/udump/xxdb1_ora_3453106.trc:
ORA-00600: internal error code, arguments: [1114], [], [], [], [], [], [], []
………………
Wed Feb 23 15:26:52 2011
Errors in file /oracle/products/admin/xxdb/udump/xxdb1_ora_2815280.trc:
ORA-00600: internal error code, arguments: [1114], [], [], [], [], [], [], []
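
The alert log pattern above (a timestamp line followed by the error lines that belong to it) can be summarized programmatically. The following is a minimal sketch, not part of the original analysis: the function name `ora600_events` is hypothetical, the sample contains only the log lines shown above (the elided portion is not reconstructed), and timestamp parsing assumes an English/C locale.

```python
import re
from datetime import datetime

# Timestamp lines in the alert log look like "Wed Feb 23 15:10:19 2011".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
# Capture the first argument of an ORA-00600 error, e.g. "1114".
ORA_600 = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")


def ora600_events(lines):
    """Yield (timestamp, first_argument) for every ORA-00600 line.

    Each error is attributed to the most recent timestamp line above it.
    """
    current = None
    for line in lines:
        line = line.strip()
        try:
            current = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line
        match = ORA_600.search(line)
        if match and current is not None:
            yield current, match.group(1)


# Only the lines visible in the excerpt above; the elided part is omitted.
sample = """\
Wed Feb 23 15:10:19 2011
Errors in file /oracle/products/admin/xxdb/udump/xxdb1_ora_3944818.trc:
ORA-00600: internal error code, arguments: [1114], [], [], [], [], [], [], []
Wed Feb 23 15:10:21 2011
Trace dumping is performing id=[cdmp_20110223151021]
Wed Feb 23 15:10:21 2011
Errors in file /oracle/products/admin/xxdb/udump/xxdb1_ora_3453106.trc:
ORA-00600: internal error code, arguments: [1114], [], [], [], [], [], [], []
Wed Feb 23 15:26:52 2011
Errors in file /oracle/products/admin/xxdb/udump/xxdb1_ora_2815280.trc:
ORA-00600: internal error code, arguments: [1114], [], [], [], [], [], [], []
"""

events = list(ora600_events(sample.splitlines()))
for when, arg in events:
    print(when.isoformat(sep=" "), f"ORA-600 [{arg}]")
```

Run against the full alert log, a listing like this makes it easy to confirm that the errors span the same window as the observed slowdown.
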
The trace file xxdb1_ora_3453106.trc contains the following information:
*** 2011-02-23 15:10:21.126
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [1114], [], [], [], [], [], [], []
Current SQL statement for this session:
--------------------- Binary Stack Dump ---------------------
O/S info: user: cics, term: , ospid: 3060300, machine: S12_C_YZ_CICS_2010
program: cicsas@S12_C_YZ_CICS_2010 (TNS V1-V3)
application name: cicsas@S12_C_YZ_CICS_2010 (TNS V1-V3), hash value=2015415943
last wait for 'latch: cache buffer handles' blocking sess=0x0 seq=2172 wait_time=207324 seconds since wait started=11
address=700000010013bd0, number=7b, tries=24
Dumping Session Wait History
for 'latch: cache buffer handles' count=1 wait_time=207324
address=700000010013bd0, number=7b, tries=24
for 'latch: cache buffer handles' count=1 wait_time=292993
address=700000010013bd0, number=7b, tries=23
for 'latch: cache buffer handles' count=1 wait_time=292991
address=700000010013bd0, number=7b, tries=22
for 'latch: cache buffer handles' count=1 wait_time=292991
address=700000010013bd0, number=7b, tries=21
for 'latch: cache buffer handles' count=1 wait_time=292990
address=700000010013bd0, number=7b, tries=20
for 'latch: cache buffer handles' count=1 wait_time=292988
address=700000010013bd0, number=7b, tries=1f
for 'latch: cache buffer handles' count=1 wait_time=292989
address=700000010013bd0, number=7b, tries=1e
for 'latch: cache buffer handles' count=1 wait_time=292995
address=700000010013bd0, number=7b, tries=1d
for 'latch: cache buffer handles' count=1 wait_time=292990
address=700000010013bd0, number=7b, tries=1c
for 'latch: cache buffer handles' count=1 wait_time=292989
address=700000010013bd0, number=7b, tries=1b

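The wait event shown in the trace matches what was observed in the database: the session was waiting on the cache buffer handles latch.

Further research on Oracle Metalink showed that this problem is caused by memory freelist corruption in the SGA: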
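
The "Dumping Session Wait History" section of the trace is regular enough to parse. The sketch below is an illustration only, with a hypothetical function name `parse_wait_history`; it pairs each `for '<event>'` line with the argument line that follows it. The `number` and `tries` values are treated as hexadecimal, since values such as `1f` and `1e` appear in the trace. The sample reuses three entries from the trace above.

```python
import re

WAIT = re.compile(r"for '(?P<event>[^']+)' count=(?P<count>\d+) wait_time=(?P<wait>\d+)")
ARGS = re.compile(
    r"address=(?P<address>[0-9a-f]+), number=(?P<number>[0-9a-f]+), tries=(?P<tries>[0-9a-f]+)"
)


def parse_wait_history(lines):
    """Return one dict per wait-history entry found in the trace lines."""
    waits = []
    pending = None
    for line in lines:
        line = line.strip()
        m = WAIT.match(line)  # match() anchors at line start, so "last wait for ..." is skipped
        if m:
            pending = {"event": m["event"], "wait_time": int(m["wait"])}
            continue
        m = ARGS.match(line)
        if m and pending is not None:
            # address, number and tries are printed in hexadecimal
            pending["address"] = m["address"]
            pending["latch_number"] = int(m["number"], 16)
            pending["tries"] = int(m["tries"], 16)
            waits.append(pending)
            pending = None
    return waits


sample = """\
Dumping Session Wait History
for 'latch: cache buffer handles' count=1 wait_time=207324
address=700000010013bd0, number=7b, tries=24
for 'latch: cache buffer handles' count=1 wait_time=292993
address=700000010013bd0, number=7b, tries=23
for 'latch: cache buffer handles' count=1 wait_time=292991
address=700000010013bd0, number=7b, tries=22
"""

history = parse_wait_history(sample.splitlines())
total_wait = sum(w["wait_time"] for w in history)  # in the unit reported by the trace
for w in history:
    print(w["event"], w["tries"], w["wait_time"])
```

Seeing the same latch address with a steadily increasing `tries` count across consecutive entries is consistent with the session spinning on a single latch rather than making progress.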

The exception occurs when the session freelist is corrupted. We dump the state object and raise ora-600 [1114].
This could be due to a corrupt cursor or state object.
This is an in memory (SGA) corruption.

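In this situation, the only remedy is to restart the database instance.

Therefore, after obtaining approval, we decided to shut down the database instance. Because the instance could not be stopped normally, we terminated it by manually killing the pmon background process, which brings the whole instance down. At the same time, the service that had been running on xxdb1 was switched over to xxdb2.

The xxdb1 instance was manually restarted at 15:39:58 and is now running normally.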
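
Before a background process is killed by hand, it is worth confirming the PID programmatically rather than by eye. The sketch below is hypothetical and was not part of the original procedure: it only locates the pmon process for a given SID in `ps -ef` style output, relying on the standard `ora_pmon_<SID>` process naming. The `ps` sample and its PIDs are invented for illustration. Terminating the process and relocating the service are deliberately left to the operator.

```python
def find_pmon_pid(ps_output, sid):
    """Return the PID of ora_pmon_<sid> from `ps -ef` style output, or None."""
    target = f"ora_pmon_{sid}"
    for line in ps_output.splitlines():
        fields = line.split()
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD.
        # STIME may contain a space, so read the command from the end.
        if len(fields) >= 8 and fields[-1] == target:
            return int(fields[1])
    return None


# Illustrative output only; these PIDs are made up.
sample_ps = """\
     UID     PID    PPID   C    STIME    TTY  TIME CMD
  oracle 1000001       1   0   Feb 20      -  0:42 ora_pmon_xxdb1
  oracle 1000002       1   0   Feb 20      -  0:10 ora_smon_xxdb1
  oracle 1000003       1   0 09:15:02      -  0:01 ora_pmon_other
"""

pid = find_pmon_pid(sample_ps, "xxdb1")
print(pid)
```

Matching on the exact process name avoids accidentally selecting the pmon process of another instance on the same host.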

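3. Follow-up Actions

1. The service originally running on xxdb1 will be switched back to the xxdb1 instance on the evening of February 23, 2011.

2. The problem has been logged on Metalink and will be diagnosed and analyzed by Oracle support engineers. We will continue to follow up on that analysis.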
