ora04301导致数据库无法连接 - 墨天轮问答

分享

2019-03-21

ora04301导致数据库无法连接

数据库出现客户端无法连接的情况，查看数据库日志出现 ORA-04031：

Errors in file /u01/oracle/diag/rdbms/noap/noap/trace/noap_ora_5399.trc  (incident=193547):
ORA-04031: unable to allocate 3000 bytes of shared memory ("shared pool","unknown object","sga heap(2,0)","call")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Mar 21 16:10:57 2019
Errors in file /u01/oracle/diag/rdbms/noap/noap/trace/noap_ora_5409.trc  (incident=193548):
ORA-04031: unable to allocate 3000 bytes of shared memory ("shared pool","unknown object","sga heap(2,0)","call")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Mar 21 16:11:03 2019
Errors in file /u01/oracle/diag/rdbms/noap/noap/trace/noap_ora_5413.trc  (incident=193549):
ORA-04031: unable to allocate 3000 bytes of shared memory ("shared pool","unknown object","sga heap(2,0)","call")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Mar 21 16:11:13 2019
Errors in file /u01/oracle/diag/rdbms/noap/noap/trace/noap_j000_5420.trc:
ORA-04031: unable to allocate 3000 bytes of shared memory ("shared pool","unknown object","sga heap(2,0)","call")
Thu Mar 21 16:11:14 2019
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process 
Errors in file /u01/oracle/diag/rdbms/noap/noap/trace/noap_cjq0_53290.trc:
Thu Mar 21 16:11:45 2019
Process m000 died, see its trace file
Thu Mar 21 16:12:22 2019
Errors in file /u01/oracle/diag/rdbms/noap/noap/trace/noap_j000_5553.trc:
ORA-04031: unable to allocate 3000 bytes of shared memory ("shared pool","unknown object","sga heap(2,0)","call")
Thu Mar 21 16:12:22 2019
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process 
Errors in file /u01/oracle/diag/rdbms/noap/noap/trace/noap_cjq0_53290.trc:
Thu Mar 21 16:12:53 2019
Errors in file /u01/oracle/diag/rdbms/noap/noap/trace/noap_j001_4974.trc:
ORA-12012: ??ˉ?А?? 228 3RA-04031: Ξ·¨·??000 ????2?ě′栨"shared pool","unknown object","sga heap(2,0)","call")
ORA-06512: ? "NOAP.DS_DESKTOP_DATA_N", line 1412
ORA-04031: Ξ·¨·??000 ????2?ě′栨"shared pool","unknown object","sga heap(2,0)","call

查看sharepool情况

SELECT free_space,
       avg_free_size,
       used_space,
       avg_used_size,
       request_failures,
       last_failure_size
  FROM v$shared_pool_reserved;


FREE_SPACE AVG_FREE_SIZE USED_SPACE AVG_USED_SIZE REQUEST_FAILURES LAST_FAILURE_SIZE
---------- ------------- ---------- ------------- ---------------- -----------------
 588533240    3181260.76    2056760    11117.6216            99516              4160

查看share pool参数

SQL> show parameter shared_pool

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
shared_pool_reserved_size            big integer 1280M
shared_pool_size                     big integer 25G

子池情况
SELECT   subpool, NAME, SUM (BYTES), ROUND (SUM (BYTES) / 1048576, 2) mb 
    FROM (SELECT    'shared pool (' || DECODE (TO_CHAR (ksmdsidx), '0', '0 - Unused', ksmdsidx) 
                 || '):' subpool, ksmssnam NAME, ksmsslen BYTES 
            FROM x$ksmss WHERE ksmsslen > 0 AND LOWER (ksmssnam) LIKE LOWER ('%free memory%')) 
GROUP BY subpool, NAME ORDER BY subpool ASC, SUM (BYTES) DESC;

SUBPOOL                                                 NAME                       SUM(BYTES)         MB
------------------------------------------------------- -------------------------- ---------- ----------
shared pool (1):                                        free memory                 939602640     896.07
shared pool (2):                                        free memory                 717533504     684.29
shared pool (3):                                        free memory                 720730272     687.34
shared pool (4):                                        free memory                 735436552     701.37
shared pool (5):                                        free memory                 895821304     854.32
shared pool (6):                                        free memory                 844425648     805.31
shared pool (7):                                        free memory                 771154504     735.43

查看解析高的sql，有很多分区表lock，怀疑是分区的添加和删除操作

select substr(a.SQL_TEXT, 1, 50), count(*) 
  from v$sql a 
 where last_load_time like '2019-03-21%'  and first_load_time like '2019-03-21%' 
 group by substr(a.SQL_TEXT, 1, 50) 
 having count(*)>10 
 order by count(*);

SUBSTR(A.SQL_TEXT,1,50)                                                                                COUNT(*)
---------------------------------------------------------------------------------------------------- ----------
LOCK TABLE "TBL_RESULT_1526726750_3" PARTITION ("P                                                           11
LOCK TABLE "TBL_RESULT_1542455300_3" PARTITION ("P                                                           11
LOCK TABLE "TBL_RESULT_1526726710_3" PARTITION ("P                                                           11
LOCK TABLE "TBL_RESULT_1526726806_3" PARTITION ("P                                                           11
LOCK TABLE "TBL_RESULT_1526726871_3" PARTITION ("P                                                           11
LOCK TABLE "TBL_RESULT_1526726724_3" PARTITION ("P                                                           11
LOCK TABLE "TBL_RESULT_1526726694_3" PARTITION ("P                                                           11
LOCK TABLE "TBL_RESULT_1526726748_3" PARTITION ("P                                                           11

感觉sharedpool应该还够，但是还是报了ora04301，感觉是分区相关的bug

收藏

分享

10条回答

默认

最新

查下硬解析的历史趋势：

select *
  from (select pre_snap_id,
               snap_id,
               to_char(end_interval_time,'yyyymmdd hh24:mi:ss'),
               round((value - pre_value) /
                     (EXTRACT(DAY FROM
                              b.END_INTERVAL_TIME - b.begin_INTERVAL_TIME) *
                     86400 + EXTRACT(HOUR FROM b.END_INTERVAL_TIME -
                                      b.begin_INTERVAL_TIME) * 3600 +
                     EXTRACT(MINUTE FROM
                              b.END_INTERVAL_TIME - b.begin_INTERVAL_TIME) * 60 +
                     EXTRACT(SECOND FROM
                              b.END_INTERVAL_TIME - b.begin_INTERVAL_TIME)),
                     2) hard_parse
          from (select a.snap_id,
                       begin_INTERVAL_TIME,
                       end_interval_time,
                       lag(a.snap_id) over(order by a.snap_id) pre_snap_id,
                       value,
                       lag(value) over(order by a.snap_id) pre_value
                  from DBA_HIST_SYSSTAT a, dba_hist_snapshot b
                 where stat_name = 'parse count (hard)'
                   and a.dbid = b.dbid
                   and a.snap_id = b.snap_id
                   and a.instance_number = b.instance_number
                      --and a.dbid=280689037
                   and a.instance_number = 1) b
         where pre_snap_id is not null
         order by 1)

另外上传下最近的AWR和alert中提到的trc文件（一个即可）

评论

有用 0

上传附件：noap_j000_59418_i173772.trc

评论

有用 0

AWR分析结果: 1553160492037_aomen20181021.html

评论

有用 0

现场远程环境现在连不上，看看之前的trc和awr是否有用处

评论

有用 0

有人在做删除分区的操作，如果业务需要用到这张表，肯定会有问题。

========= Dump for incident 173772 (ORA 4031) ========

*** 2018-10-15 01:03:33.472
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=7gggrkymugq5n) -----
alter table DS_MR_GROUP_ERIC_CB_L_1 drop partition PART_2018101200
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x2cd1f09188       524  package body LTE.PKG_AUTO_MAN_TAB_PART_V2
0x2cd1f09188       579  package body LTE.PKG_AUTO_MAN_TAB_PART_V2
0x2c6245e370         1  anonymous block

评论

有用 0

看了下AWR 几个问题：

1.首先出问题的是sga heap(2,0) 是2 subpool的0 duration 你看的2 子池有684M 不代表 0 还有

2.你shared pool 中的kglsim heap kglsim object batch 是有问题的占用了2G左右

3.导致你问题的是 SGA: allocation forcing component growth 而该等待意思就是在动态的去调整内存

4.看你trace 有drop 分区，那么有可能就是经常性的drop 分区导致了 kglsim heap kglsim object batch 异常增长可以通过

AWR 趋势去查询这两个指标增长情况

建议：

观察是否是分区操作导致的内存异常增长如果是那么调整操作

或者

直接关闭 ASMM 或者设置 _enabled_shared_pool_duration=fase

评论

有用 0

AWR分析结果: 1553221109603_aomen2019032115.html

评论

有用 0

上传了一个故障前最新的awr报告

评论

有用 0

感谢，我们这边自己分析也是因为分区的操作导致这个异常，准备先把asmm关闭，手动分配内存看看。

评论

有用 0

问题已关闭: 问题已经得到解决

评论

有用 0

回答交流

提交

问题信息

请登录之后查看

附件列表

请登录之后查看

邀请回答

暂无人订阅该标签，敬请期待~~

墨值悬赏