暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

频发:故障排除之又见 ORA-4031丨云和恩墨技术通讯

数据和云 2019-11-27
1703


亲爱的读者朋友:


为了及时共享行业案例,通知共性问题,达成共享和提前预防,我们整理和编辑了《云和恩墨技术通讯》,通过对过去一段时间的知识回顾,故障归纳,以期提供有价值的信息供大家参考。同时,我们也希望能够将热点事件、新的产品特性及其他有价值的信息聚集起来,为您提供具有前瞻性的支持信息,保持对于当前最新的数据库新闻和事件的了解,其中包括重要数据库产品发布、警报、更新、新版本、补丁等。


本期目录:


  • 新闻:2019年11月数据库流行度排行

  • 经验:Oracle RAC跨节点访问数据块,节点长事务加剧gc等待

  • 经验:IBM MQ通道连接数达到最大故障分析

  • 问题:子游标过多导致数据库HANG

  • 频发:再谈Library Cache Lock

  • 频发:故障排除之又见ORA-4031

  • 警示:强制关闭OGG进程触发bug致abended

  • 公告:首届墨天轮年度十大突出贡献人物评选活动


云和恩墨技术通讯集锦:https://www.modb.pro/doc/topic/5927(复制链接 浏览器中打开或者点击文章左下方的“阅读原文”,即可查看)


部分精选-频发:故障排除之又见 ORA-4031  作者:候静远



当遇到ORA-4031错误时,你会不会内心一紧。Oracle进程在向SGA申请内存时,如果申请失败,则会抛出这个错误,大部分情况下是在向SGA中的 shared pool申请内存时失败。严重情况下,可能导致数据库出现异常崩溃。本文分享客户近期碰到的一起由于ORA-4031问题导致数据库异常宕机的案例,供大家参考。


问题描述


2019年9月4日凌晨3点左右,接到监控系统告警:数据库出现异常,无法连接。登陆到数据库1节点查看后台alert日志发现有大量ORA-04031报错,2节点有少量报错。为了尽快恢复业务,尝试直接重启1节点数据库,重启完成之后恢复正常。


问题分析


1. 节点后台对应alert日志:


    Wed Sep 04 03:57:50 2019
    Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
    ORA-00604: error occurred at recursive SQL level 1
    ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(2,0)","kglsim object batch")
    Wed Sep 04 03:58:10 2019
    Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
    ORA-00604: error occurred at recursive SQL level 1
    ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(1,0)","kglsim object batch")
    Wed Sep 04 03:58:26 2019
    Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
    ORA-00604: error occurred at recursive SQL level 1
    ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(7,0)","kglsim object batch")
    Wed Sep 04 03:58:42 2019
    Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
    ORA-00604: error occurred at recursive SQL level 1
    ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(6,0)","kglsim object batch")
    Wed Sep 04 03:58:57 2019
    Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
    ORA-00604: error occurred at recursive SQL level 1
    ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.mon_mods$ set ins...","sga heap(5,0)","kglsim object batch")
    Wed Sep 04 03:59:08 2019
    Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_xxx0_42548.trc:
    ORA-00604: error occurred at recursive SQL level 1
    ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select count(*) from sys.job...","sga heap(3,0)","kglsim object batch")
    Wed Sep 04 03:59:10 2019
    License high water mark = 97
    USER (ospid: 28750): terminating the instance


    统计1节点每个子池及duration出现04031的次数,sga heap(n,0)-n代表第几个子池,0代表是第几个duration:



    根据alert日志可以看出,所有的ora-4031都发生在shared pool子池的第0个duration上。


      Summary of resize operations history:
      shared pool start 3.19 GB now 3.19 GB 0 grows 0 shrinks
      large pool start 0.50 GB now 0.50 GB 0 grows 0 shrinks
      java pool start 0.50 GB now 0.50 GB 0 grows 0 shrinks
      SGA Target start 32.00 GB now 32.00 GB 0 grows 0 shrinks
      DEFAULT buffer cache start 27.59 GB now 27.59 GB 0 grows 0 shrinks
      PGA Target start 11.00 GB now 11.00 GB 0 grows 0 shrinks


      发现shared pool并没有进行resize。


        ==============================================
        TOP 20 MEMORY USES ACROSS SGA HEAP 1 - 7
        ----------------------------------------------
        "KGLH0 " 1103 MB 19%
        "SQLA " 1081 MB 18%
        "free memory " 835 MB 14%
        "gcs resources " 794 MB 14%
        "gcs shadows " 550 MB 9%
        "db_block_hash_buckets " 178 MB 3%
        "ASH buffers " 160 MB 3%
        "KGLHD " 157 MB 3%
        "Checkpoint queue " 156 MB 3%
        "kglsim object batch " 90 MB 2%
        "kglsim heap " 56 MB 1%
        "ges resource " 53 MB 1%
        "ges enqueues " 43 MB 1%
        "KGLDA " 41 MB 1%
        "dbwriter coalesce buffer " 40 MB 1%
        "dirty object counts array " 40 MB 1%
        "object queue " 35 MB 1%
        "gcs res hash bucket " 32 MB 1%
        "dbktb: trace buffer " 31 MB 1%
        "FileOpenBlock " 30 MB 1%
        TOTALS ---------------------------------------
        Total free memory 830 MB
        Total memory alloc. 5026 MB
        Grand total 5856 MB
        ==============================================


        2. 节点后台对应的alert日志:


          Wed Sep 04 03:23:18 2019
          Emon ping encountered error 12801
          Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_35378.trc:
          ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
          ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")
          Wed Sep 04 03:23:23 2019
          Emon ping encountered error 12801
          Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_35453.trc:
          ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
          ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(1,0)","kglsim object batch")
          Wed Sep 04 03:23:29 2019
          Emon ping encountered error 12801
          Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_35725.trc:
          ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
          ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(5,0)","kglsim object batch")
          Wed Sep 04 03:23:34 2019
          Emon ping encountered error 12801
          Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_35778.trc:
          ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
          ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(2,0)","kglsim object batch")
          Wed Sep 04 03:23:39 2019
          Emon ping encountered error 12801
          Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_36069.trc:
          ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
          ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(6,0)","kglsim object batch")
          Wed Sep 04 03:23:45 2019
          Emon ping encountered error 12801
          Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_36151.trc:
          ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
          ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(3,0)","kglsim object batch")
          Wed Sep 04 03:23:50 2019
          Emon ping encountered error 12801
          Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_36242.trc:
          ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
          ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(7,0)","kglsim object batch")
          Wed Sep 04 03:23:55 2019
          Emon ping encountered error 12801
          Errors in file u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_36305.trc:
          ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
          ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")


          统计2节点每个子池及duration出现04031的次数:



          根据alert日志可以看出,所有的ora-4031同样都发生在shared pool子池的第0个duration上,导致4031的根本原因是因为shared pool子池的第0个duration内存不足。


          通过设置sga_target的ASMM管理后,共享池(shared_pool)和流池(streams pool)每个子池都是4个duration。它们分别是:instance,session,cursor,execution,只有第四个duration,也就是execution是可以resize的,而当第0个duration内存不足的时候不能resize,就直接会报错ora-4031。


          问题解决


          通过禁用duration,必须设置参数"_enable_shared_pool_durations=fales",并重启数据库。

          alter system set "_enable_shared_pool_durations"=false scope=spfile;

          通过该参数设置后,把它们四个duration都合并到一个池中,不会再出现一个duration的内存被耗尽,而另外一个duration仍具有空闲内存,对于共享池和流池都是这样;设置sga_target之后,所有池都会通过buffer cache来传输granules(颗粒)整数倍大小的内存,如果shrink,则返回buffer cache,没有从一个pool到另外一个pool的直接传输,所有的内存resize都会以buffer cache作为源和目标。

          设置该参数的唯一负面影响是SGA resize的时候,不能从shared pool中取内存到其他的pool。


          DBASK,DBA的即时问答平台


          扩展阅读



          1. 《云和恩墨技术通讯》(9月刊)

          2. 《云和恩墨技术通讯》(8月刊)

          3. 《云和恩墨技术通讯》(7月刊)

          4. 《云和恩墨技术通讯》(6月刊)

          5. 《云和恩墨技术通讯》(5月刊)

          6. 《云和恩墨技术通讯》(4月刊)

          7. 《云和恩墨技术通讯》(3月刊)


          数据和云
          ID:OraNews
          如有收获,请划至底部,点击“在看”,谢谢!

          公司简介  | 招聘 | DTCC | 数据技术嘉年华 | 免费课程 | 入驻华为严选商城

            

          zCloud | SQM | Bethune X zData一体机 | MyData一体机 | ZDBM 备份一体机

          Oracle技术架构 | 免费课程 | 数据库排行榜 | DBASK问题集萃 | 技术通讯 

          升级迁移 | 性能优化 | 智能整合 安全保障 |  架构设计 | SQL审核 | 分布式架构 | 高可用容灾 | 运维代维

          云和恩墨大讲堂 | 一个分享交流的地方

          长按,识别二维码,加入万人交流社群


          请备注:云和恩墨大讲堂

          最后修改时间:2019-11-28 09:42:53
          文章转载自数据和云,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

          评论