暂无图片
暂无图片
3
暂无图片
暂无图片
暂无图片

ora4030问题分析

原创 王德福 2021-05-13
5332

环境:
DB:Oracle 11.2.0.4.0 两节点RAC
OS:CentOS release 6.10

问题现象:
ORA-04030: 在尝试分配 824504 字节 (pga heap,kco buffer) 时进程内存不足
ORA-04030: 在尝试分配 432 字节 (kxs-heap-c,kprbalo temp memory) 时进程内存不足

参考:https://www.cnblogs.com/feiyun8616/p/12884511.html

​ https://www.modb.pro/db/4587

1.数据的信息采集和分析

通过alert日志获取trace

1620898499201

首先,了解ORA-04030错误的原因。ORA-04030错误引起的原因大概有以下几种情况。

(1)是否有足够的可用内存?

1620898628742

通过OS监控内存发现,内存在22:00到23:00有消耗,但是剩余还是非常多的。

(2)是否设置了Oracle的限制?

SQL> show parameter pga

NAME TYPE VALUE


pga_aggregate_target big integer 8G

select * from v$pgastat;
NAME VALUE UNIT


aggregate PGA target parameter 8589934592 bytes
aggregate PGA auto target 7188784128 bytes
global memory bound 858992640 bytes
total PGA inuse 602388480 bytes
total PGA allocated 698624000 bytes
maximum PGA allocated 5113769984 bytes ##4876M
total freeable PGA memory 34799616 bytes ##33M
process count 110
max processes count 256
PGA memory freed back to OS 1099796578304 bytes
total PGA used for auto workareas 0 bytes
maximum PGA used for auto workareas 871562240 bytes
total PGA used for manual workareas 0 bytes
maximum PGA used for manual workareas 542720 bytes
over allocation count 0
bytes processed 2000503783424 bytes
extra bytes read/written 20207895552 bytes
cache hit percentage 99 percent
recompute count (total) 673948

19 rows selected.

SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ FROM SYS.x$ksppi x, SYS.x$ksppcv y WHERE x.indx = y.indx AND x.ksppinm LIKE ‘%_pga_max_size%’;

NAME VALUE DESCRIB


_pga_max_size 1717985280 Maximum size of the PGA memory for one process

1638M

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
FROM SYS.x$ksppi x, SYS.x$ksppcv y
WHERE x.indx = y.indx
AND x.ksppinm LIKE ‘%_smm_max_size%’;

NAME VALUE DESCRIB


_smm_max_size_static 838860 static maximum work area
size in auto mode (serial
)

_smm_max_size 838860 maximum work area size in
auto mode (serial)

819M

从上面的内容可以看到PGA设置的大小为8400M,根据单个会话使用PGA的期望尺寸(也可以认为是实际分配的最大尺寸)计算公式是:min(5%pga_aggregate_target,50%_pga_max_size, _smm_max_size),
可以简单计算下min(5%*8192M,50%1638M,819M)=410M(其中_pga_max_size的单位为bytes,_smm_max_size的单位为kb),即单个会话能使用PGA的期望尺寸为410M,那报错的会话是否超过了该限制呢?

查看ORA-04030报错的trace文件如下所示。

*** 2021-05-12 22:43:54.039
*** SESSION ID:(831.21689) 2021-05-12 22:43:54.039
*** CLIENT ID:() 2021-05-12 22:43:54.039
*** SERVICE NAME:(SYS$USERS) 2021-05-12 22:43:54.039
*** MODULE NAME:(DBMS_SCHEDULER) 2021-05-12 22:43:54.039
*** ACTION NAME:(ORA$AT_SQ_SQL_SW_2226) 2021-05-12 22:43:54.039
 
Dump continued from file: /data/app/oracle/diag/rdbms/***DC/***DC1/trace/***DC1_j000_108728.trc
ORA-04030: 在尝试分配 432 字节 (kxs-heap-c,kprbalo temp memory) 时进程内存不足

========= Dump for incident 32665 (ORA 4030) ========
----- Beginning of Customized Incident Dump(s) -----
=======================================
TOP 10 MEMORY USES FOR THIS PROCESS
---------------------------------------

*** 2021-05-12 22:43:54.231
74% 3077 MB, 198779 chunks: "permanent memory          "  SQL
         kxs-heap-c      ds=0x7f693acf9090  dsprt=0xc0d1f20
25% 1030 MB, 66357 chunks: "free memory               "  
         top call heap   ds=0xc0d1f20  dsprt=(nil)
 1%   26 MB, 198772 chunks: "free memory               "  SQL
         kxs-heap-c      ds=0x7f693acf9090  dsprt=0xc0d1f20
 0% 1317 KB,  38 chunks: "free memory               "  
         pga heap        ds=0xc0cc6e0  dsprt=(nil)
 0% 1031 KB, 256 chunks: "kesaiCreateUTSBucket sesh "  
         kesutlHeapAllo  ds=0x7f693ad08f78  dsprt=0xc0d10b8
 0% 1016 KB,  34 chunks: "static frame of inst      "  PL/SQL
         koh-kghu sessi  ds=0x7f69396371b0  dsprt=0x7f693ad623e0
 0%  682 KB,  33 chunks: "permanent memory          "  
         pga heap        ds=0xc0cc6e0  dsprt=(nil)
 0%  635 KB, 444 chunks: "qmxtgCrBufClob            "  
         koh dur heap d  ds=0x7f693a350d58  dsprt=0x7f693ad623e0
 0%  629 KB,  19 chunks: "recursive addr reg file   "  PL/SQL
         koh-kghu sessi  ds=0x7f693a223d08  dsprt=0x7f693ad623e0
 0%  539 KB, 133 chunks: "kxsFrame4kPage            "  
         session heap    ds=0x7f693ad623e0  dsprt=0xc0d2140
 
=======================================
PRIVATE MEMORY SUMMARY FOR THIS PROCESS
---------------------------------------
******************************************************
PRIVATE HEAP SUMMARY DUMP
4151 MB total:
  3119 MB commented, 684 KB permanent
  1031 MB free (0 KB in empty extents),
    4139 MB,   1 heap:    "kxs-heap-c     "            1029 MB free held
------------------------------------------------------
Summary of subheaps at depth 1
3111 MB total:
  8277 KB commented, 3077 MB permanent
    26 MB free (0 KB in empty extents),
 
=========================================
REAL-FREE ALLOCATOR DUMP FOR THIS PROCESS
-----------------------------------------
 
Dump of Real-Free Memory Allocator Heap [0x7f693ad39000]
mag=0xfefe0001 flg=0x5000003 fds=0x8 blksz=65536
blkdstbl=0x7f693ad39010, iniblk=69632 maxblk=524288 numsegs=116
In-use num=66372 siz=58982400, Freeable num=0 siz=0, Free num=1 siz=23724032
 
==========================================
INSTANCE-WIDE PRIVATE MEMORY USAGE SUMMARY
------------------------------------------
 
Dumping Work Area Table (level=1)
=====================================
 
  Global SGA Info
  ---------------
 
    global target:     8192 MB
    auto target:       4014 MB
    max pga:           1638 MB
    pga limit:         3276 MB
    pga limit known:  0
    pga limit errors:     0
 
    pga inuse:         3731 MB
    pga alloc:         4866 MB
    pga freeable:        35 MB
    pga freed:        1037210 MB
    pga to free:          0 %
    broker request:       0
 
    pga auto:             0 MB
    pga manual:           0 MB
 
    pga alloc  (max):  4866 MB
    pga auto   (max):   831 MB
    pga manual (max):     0 MB
 
    # workareas     :     1
    # workareas(max):    31

从以上的trace文件中可以看到,通过ACTION NAME部分可用看出来,这是JOB:AUTO SQL TUNING触发的错误。通过TOP 10 MEMORY USES FOR THIS PROCESS有MEMORY有3077MB,远远超过410MB,确实是系统跑JOB的引起的错误。下面是触发这个错误的SQL。

Begin session detail for pid 83
  sid: 831 ser: 21689 audsid: 7382691 user: 85/DBUSR***DC
    flags: (0x8010041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 83 O/S info: user: oracle, term: UNKNOWN, ospid: 108728
    image: oracle@node1.****.com (J000)
  client details:
    O/S info: user: oracle, term: UNKNOWN, ospid: 108728
    machine: node1.****.com program: oracle@node1.****.com (J000)
    application name: DBMS_SCHEDULER, hash value=2478762354
    action name: ORA$AT_SQ_SQL_SW_2226, hash value=723610744
  current SQL:
  /* SQL Analyze(831,1) */ SELECT count(*)
FROM (
	SELECT sum(decode(t1.djlb, 1, 1, 4, -1) * t1.sqkzkfd) AS sqkzkfd
		, sum(decode(t1.djlb, 1, 1, 4, -1) * t1.ysje) AS ysje
		, sum(decode(t1.djlb, 1, 1, 4, -1) * t1.sjfk) AS sjfk
		, sum(decode(t1.djlb, 1, 1, 4, -1) * t1.zl)
		, sum(decode(t1.djlb, 1, 1, 4, -1) * t1.sswr_sysy)
		, sum(decode(t1.djlb, 1, 1, 4, -1) * t1.fk_sysy)
		, sum(decode(t1.djlb, 1, 1, 4, -1) * t1.hjzje)
		, sum(decode(t1.djlb, 1, 1, 4, -1) * t1.hjzsl)
		, sum(decode(t1.djlb, 1, 1, 4, -1) * t1.hjzke)
		, sum(decode(t1.djlb, 1, 1, 4, -1) * t1.hyzke)
		, sum(decode(t1.djlb, 1, 1, 4, -1) * t1.yhzke)
		, sum(decode(t1.djlb, 1, 1, 4, -1) * t1.lszke)
		, sum(t1.ljjf), sum(t1.bcjf)
		, sum(t1.num1), sum(t1.num2)
		, sum(t1.num3), sum(t1.num4)
		, sum(t1.num5), sum(t1.fqje)
		...

1620902042972

同时辅助调取报错时间点22:00-23:00之间的AWR,从这里也可用看出这个错误是由AUTO SQL TUNING触发的,这个错误的原因是遇到了单个进程最多只能使用PGA只有4G(3.2G)的限制导致,下面是trace中的相关信息。

Dumping Work Area Table (level=1)
=====================================
 
  Global SGA Info
  ---------------
 
    global target:     8192 MB
    auto target:       4014 MB
    max pga:           1638 MB
    pga limit:         3276 MB
    pga limit known:  0
    pga limit errors:     0
 
    pga inuse:         3731 MB
    pga alloc:         4866 MB
    pga freeable:        35 MB
    pga freed:        1037210 MB
    pga to free:          0 %
    broker request:       0
 
    pga auto:             0 MB
    pga manual:           0 MB
 
    pga alloc  (max):  4866 MB
    pga auto   (max):   831 MB
    pga manual (max):     0 MB
 
    # workareas     :     1
    # workareas(max):    31

可用看到,当前的PGA设置是8G,但是单个进程最多只能使用4G(3.2G?),根据MOS文档PLSQL Procedure Causing ORA-04030: (pga heap,control file i/o buffer) And ORA-04030: (koh-kghu sessi,pmuccst: adt/record) or ORA-04030: (koh-kghucall ,pmucalm coll) Errors (文档 ID 1325100.1)的说明,这是因为操作系统默认单个进程最多只能打开65530个内存映射条目限制的。

[root@node1 ~]#  cat /proc/sys/vm/max_map_count
65530

数据库也有和这个相对应的隐含参数_realfree_heap_pagesize_hint,默认是65536。

SQL> col NAME for a30
SQL> col VALUE for a20
SQL> col DESCRIB for a45
SQL> set lines 200
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE,x.ksppdesc describ FROM SYS.x$ksppi x, SYS.x$ksppcv y 
  2  WHERE x.indx = y.indx AND x.ksppinm LIKE '%realfree%';

NAME                           VALUE                DESCRIB
------------------------------ -------------------- ---------------------------------------------
_realfree_heap_max_size        32768                minimum max total heap size, in Kbytes
_realfree_heap_pagesize_hint   65536                hint for real-free page size in bytes
_realfree_heap_mode            0                    mode flags for real-free heap
_use_realfree_heap             TRUE                 use real-free based allocator for PGA memory

_realfree_heap_pagesize_hin隐含参数的意思是realfree当前的分配大小是65536 bytes,也就是64K,也就对应操作系统上每个内存映射条目的内存大小是64K,而操作系统上又限制每个进程最多能打开65530个内存映射条目,因此,每个进程使用PGA就不能超过4G。

2.解决方案

那么对应的就有两种解决方案,一种是调整操作系统单个进程打开内存映射条目的大小,另一种就是在数据库调整对应的分配单元大小。

操作系统调整单个进程打开内存映射条目大小,需要修改sysctl.conf文件,在最下面增加下面这一行即可。

# vi /etc/sysctl.conf
--在最下面增加下面这行
vm.max_map_count=262144

然后通过sysctl –p命令使之生效。这样每个映射条目大小64K,262144个条目就是16G,应该足够用了。

或者在数据库调整realfree的分配单元的大小,但是这个隐含参数是静态参数,需要重启数据库才能生效。

SQL> alter system set "_realfree_heap_pagesize_hint"=262144 scope=spfile;

然后重启数据库,使之生效。

以上两种方法,不管是修改操作系统的限制还是修改数据库的参数,只修改一个就可用,如果两种方法都修改,需要设置合理的值,避免单个进程使用的内存限制过大,万一有个进程出了问题,可能直接就把内存耗尽了。

对本案例来讲,还有一种解决方法,当然只对本案例有效,因为本案例ORA-04030错误是由于AUTO SQL TUNING导致的,而AUTO SQL TUNING对我来讲又没啥用,完全可用通过禁用AUTO SQL TUNING来解决这个问题。可用通过下马的方法关闭AUTO SQL TUNING。

BEGIN
dbms_auto_task_admin.disable(
client_name => 'sql tuning advisor',
operation => NULL,
window_name => NULL);
END;
/

如果需要开启AUTO SQL TUNING,可用通过下面的方法来开启。

BEGIN
dbms_auto_task_admin.enable(
client_name => 'sql tuning advisor',
operation => NULL,
window_name => NULL);
END;
/

针对单个进程只能使用4G的PGA导致的ORA-04030错误的问题,具体还要看是什么原因导致的,本案例特殊,由于是AUTO SQL TUNING触发的,而AUTO SQL TUNING对我这套数据库来讲并没什么用,所以可用通过禁用AUTO SQL TUNING的方式来解决,如果是业务程序的存储过程等PL/SQL导致的这个错误,就不能这样解决了,只能从上面的修改操作系统单个进程打开内存映射条目数或者修改数据库对应的每个映射条目内存分配大小来解决了。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论