频发：故障排除之又见 ORA-4031丨云和恩墨技术通讯

数据和云 2019-11-27

2034

亲爱的读者朋友：

为了及时共享行业案例，通知共性问题，达成共享和提前预防，我们整理和编辑了《云和恩墨技术通讯》，通过对过去一段时间的知识回顾，故障归纳，以期提供有价值的信息供大家参考。同时，我们也希望能够将热点事件、新的产品特性及其他有价值的信息聚集起来，为您提供具有前瞻性的支持信息，保持对于当前最新的数据库新闻和事件的了解，其中包括重要数据库产品发布、警报、更新、新版本、补丁等。

本期目录：

新闻：2019年11月数据库流行度排行
经验：Oracle RAC跨节点访问数据块，节点长事务加剧gc等待
经验：IBM MQ通道连接数达到最大故障分析
问题：子游标过多导致数据库HANG
频发：再谈Library Cache Lock
频发：故障排除之又见ORA-4031
警示：强制关闭OGG进程触发bug致abended
公告：首届墨天轮年度十大突出贡献人物评选活动

云和恩墨技术通讯集锦：https://www.modb.pro/doc/topic/5927（复制链接浏览器中打开或者点击文章左下方的“阅读原文”，即可查看）

部分精选-频发：故障排除之又见 ORA-4031 作者：候静远

当遇到ORA-4031错误时，你会不会内心一紧。Oracle进程在向SGA申请内存时，如果申请失败，则会抛出这个错误，大部分情况下是在向SGA中的 shared pool申请内存时失败。严重情况下，可能导致数据库出现异常崩溃。本文分享客户近期碰到的一起由于ORA-4031问题导致数据库异常宕机的案例，供大家参考。

问题描述

2019年9月4日凌晨3点左右，接到监控系统告警：数据库出现异常，无法连接。登陆到数据库1节点查看后台alert日志发现有大量ORA-04031报错，2节点有少量报错。为了尽快恢复业务，尝试直接重启1节点数据库，重启完成之后恢复正常。

问题分析

1. 节点后台对应alert日志：

Wed Sep 04 03:57:50 2019
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(2,0)","kglsim object batch")
Wed Sep 04 03:58:10 2019
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(1,0)","kglsim object batch")
Wed Sep 04 03:58:26 2019
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(7,0)","kglsim object batch")
Wed Sep 04 03:58:42 2019
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(6,0)","kglsim object batch")
Wed Sep 04 03:58:57 2019
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.mon_mods$ set ins...","sga heap(5,0)","kglsim object batch")
Wed Sep 04 03:59:08 2019
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_xxx0_42548.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select count(*) from sys.job...","sga heap(3,0)","kglsim object batch")
Wed Sep 04 03:59:10 2019
License high water mark = 97
USER (ospid: 28750): terminating the instance

统计1节点每个子池及duration出现04031的次数，sga heap(n,0)-n代表第几个子池，0代表是第几个duration：

根据alert日志可以看出，所有的ora-4031都发生在shared pool子池的第0个duration上。

Summary of resize operations history:
shared pool            start   3.19 GB  now   3.19 GB  0 grows   0 shrinks
large pool             start   0.50 GB  now   0.50 GB  0 grows   0 shrinks
java pool              start   0.50 GB  now   0.50 GB  0 grows   0 shrinks
SGA Target             start  32.00 GB  now  32.00 GB  0 grows   0 shrinks
DEFAULT buffer cache   start  27.59 GB  now  27.59 GB  0 grows   0 shrinks
PGA Target             start  11.00 GB  now  11.00 GB  0 grows   0 shrinks

发现shared pool并没有进行resize。

==============================================
TOP 20 MEMORY USES ACROSS SGA HEAP 1 - 7
----------------------------------------------
"KGLH0                     "       1103 MB 19%
"SQLA                      "       1081 MB 18%
"free memory               "        835 MB 14%
"gcs resources             "        794 MB 14%
"gcs shadows               "        550 MB  9%
"db_block_hash_buckets     "        178 MB  3%
"ASH buffers               "        160 MB  3%
"KGLHD                     "        157 MB  3%
"Checkpoint queue          "        156 MB  3%
"kglsim object batch       "         90 MB  2%
"kglsim heap               "         56 MB  1%
"ges resource              "         53 MB  1%
"ges enqueues              "         43 MB  1%
"KGLDA                     "         41 MB  1%
"dbwriter coalesce buffer  "         40 MB  1%
"dirty object counts array "         40 MB  1%
"object queue              "         35 MB  1%
"gcs res hash bucket       "         32 MB  1%
"dbktb: trace buffer       "         31 MB  1%
"FileOpenBlock             "         30 MB  1%
TOTALS ---------------------------------------
Total free memory                   830 MB
Total memory alloc.                5026 MB
Grand total                        5856 MB
==============================================

2. 节点后台对应的alert日志：

Wed Sep 04 03:23:18 2019
Emon ping encountered error 12801
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_35378.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")
Wed Sep 04 03:23:23 2019
Emon ping encountered error 12801
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_35453.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(1,0)","kglsim object batch")
Wed Sep 04 03:23:29 2019
Emon ping encountered error 12801
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_35725.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(5,0)","kglsim object batch")
Wed Sep 04 03:23:34 2019
Emon ping encountered error 12801
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_35778.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(2,0)","kglsim object batch")
Wed Sep 04 03:23:39 2019
Emon ping encountered error 12801
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_36069.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(6,0)","kglsim object batch")
Wed Sep 04 03:23:45 2019
Emon ping encountered error 12801
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_36151.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(3,0)","kglsim object batch")
Wed Sep 04 03:23:50 2019
Emon ping encountered error 12801
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_36242.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(7,0)","kglsim object batch")
Wed Sep 04 03:23:55 2019
Emon ping encountered error 12801
Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_36305.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")

统计2节点每个子池及duration出现04031的次数：

根据alert日志可以看出，所有的ora-4031同样都发生在shared pool子池的第0个duration上，导致4031的根本原因是因为shared pool子池的第0个duration内存不足。

通过设置sga_target的ASMM管理后，共享池(shared_pool)和流池(streams pool)每个子池都是4个duration。它们分别是：instance，session，cursor，execution，只有第四个duration，也就是execution是可以resize的，而当第0个duration内存不足的时候不能resize，就直接会报错ora-4031。

问题解决

通过禁用duration，必须设置参数"_enable_shared_pool_durations=fales"，并重启数据库。

alter system set "_enable_shared_pool_durations"=false scope=spfile;

通过该参数设置后，把它们四个duration都合并到一个池中，不会再出现一个duration的内存被耗尽，而另外一个duration仍具有空闲内存，对于共享池和流池都是这样；设置sga_target之后，所有池都会通过buffer cache来传输granules（颗粒）整数倍大小的内存，如果shrink，则返回buffer cache，没有从一个pool到另外一个pool的直接传输，所有的内存resize都会以buffer cache作为源和目标。

设置该参数的唯一负面影响是SGA resize的时候，不能从shared pool中取内存到其他的pool。

DBASK，DBA的即时问答平台

扩展阅读