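Requesting support from Enmotech engineers: at a hospital customer, the HIS server's CPU suddenly hit 100% last night and the machine went down. This has happened before!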

pgone

2020-02-04

oracle

So far only the alert.log has been collected.

10 answers

李真旭

Could you provide the relevant logs: an AWR report plus the alert log? Please compress them before sending.

zhenxu.li@enmotech.com

杨廷琨

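For a CPU 100% problem, the alert log alone is generally not enough. The last AWR report taken before the problem occurred may help. Operating-system-level monitoring would also make the diagnosis easier, although extra monitoring is rarely deployed in Windows environments.

You can upload the alert log first and we will take a look.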
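
To find the last AWR snapshot before the problem, a query along the following lines may help. This is only a sketch: the cutoff timestamp is an assumed placeholder and should be replaced with the actual failure time.

```sql
-- List the five most recent AWR snapshots that ended before the incident.
-- ASSUMPTION: '2020-02-04 00:10' is a placeholder cutoff, not a confirmed failure time.
select *
from (
    select snap_id, instance_number, begin_interval_time, end_interval_time
    from dba_hist_snapshot
    where end_interval_time <= to_timestamp('2020-02-04 00:10', 'yyyy-mm-dd hh24:mi')
    order by end_interval_time desc
)
where rownum <= 5;
```

With the two snapshot IDs that bracket the period of interest, the report itself can be generated from SQL*Plus with the standard script `@?/rdbms/admin/awrrpt.sql`.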

pgone

Attachment uploaded: alert_orcl.rar

weizhao.zhang (anbob)

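1. Check the operating system logs.
2. Are there any scheduled jobs?
3. Was the server restarted by holding down the power button? Did anyone check which process was consuming the CPU at the time?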
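
For point 2, the database-side scheduled jobs can be listed roughly as follows. This is a sketch that covers only jobs defined inside Oracle; Windows scheduled tasks have to be checked separately at the OS level.

```sql
-- Legacy DBMS_JOB jobs: what they run and when they last ran / will run next.
select job, log_user, what, last_date, next_date, interval, broken, failures
from dba_jobs
order by next_date;

-- DBMS_SCHEDULER jobs (available from 10g onward).
select owner, job_name, enabled, state, last_start_date, next_run_date, repeat_interval
from dba_scheduler_jobs
order by next_run_date;
```

A job whose last or next run time falls close to the time of the failure would be a candidate for closer inspection.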

外包DBA

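I see deadlocks in the log, as well as: Errors in file d:\oracle\product\10.2.0\admin\orcl\udump\orcl_ora_6644.trc:
ORA-00600: internal error code, arguments: [723], [39304], [39304], [memory leak], [], [], [], []
ORA-03135: connection lost contact
My initial guess is that the deadlocks caused it, but the trace file still needs to be analyzed in detail.
That said, Windows + 10g really is a curious combination...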

杨廷琨

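The database reported some connection errors, which are probably related to the CPU being at 100% at the time. Beyond that, however, none of Oracle's core background processes reported errors, and there is no record of the database shutting down. For a non-RAC environment, this looks more like the operating system itself restarted.
Please confirm whether the system rebooted at the time of the failure.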
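
From the database side, the instance startup time gives a first hint. The sketch below shows when the instance was last started; it does not by itself prove an OS reboot, so it should be cross-checked against the Windows event log.

```sql
-- Current instance: when was it last started?
select instance_name, host_name, status,
       to_char(startup_time, 'yyyy-mm-dd hh24:mi:ss') as startup_time
from v$instance;

-- Startup history as recorded in AWR, most recent first.
select instance_number, instance_name, host_name,
       to_char(startup_time, 'yyyy-mm-dd hh24:mi:ss') as startup_time
from dba_hist_database_instance
order by startup_time desc;
```

A startup time shortly after the failure, with no matching shutdown messages in the alert log, would be consistent with an abrupt restart.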

weizhao.zhang (anbob)

128 GB memory, 20 CPUs
The problem appeared at around 2020/02/04 00:08.

Collect the DASH (dba_hist_active_sess_history) data:

-- Copy the ASH history for the period of interest into a work table.
create table dash0204 tablespace users
as select /*+ parallel(t 16) */ * from dba_hist_active_sess_history t
where sample_time between
to_date('2020-02-02 00:00','yyyy-mm-dd hh24:mi') and to_date('2020-02-04 01:00','yyyy-mm-dd hh24:mi');

-- SQL*Plus formatting: print a blank line between 10-minute buckets.
break on etime skip 1

-- Top 10 event/program combinations per 10-minute bucket.
-- Each dba_hist_active_sess_history row is counted as roughly 10 seconds of DB time.
select * from (
    select etime, nvl(event,'on cpu') events, program, dbtime, cnt, first_time, end_time,
        round(100*ratio_to_report(dbtime) over (partition by etime), 2) pct,
        row_number() over (partition by etime order by dbtime desc) rn
    from (
        select substr(to_char(sample_time,'yyyymmdd hh24:mi'),1,13)||'0' etime, event, program,
            count(*)*10 dbtime, count(*) cnt,
            to_char(min(sample_time),'hh24:mi:ss') first_time, to_char(max(sample_time),'hh24:mi:ss') end_time
        from dash0204
        -- Optional filters (uncomment and adjust as needed):
        -- where sample_time between to_date('2015-4-1 16:00','yyyy-mm-dd hh24:mi') and to_date('2015-4-1 17:00','yyyy-mm-dd hh24:mi')
        -- where instance_number = 2
        group by substr(to_char(sample_time,'yyyymmdd hh24:mi'),1,13), event, program
    )
) where rn <= 10
order by etime, rn;
