
Oracle 11g ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O (Troubleshooting Record)

Original article by 尚雷, 2023-05-13

1. Symptoms

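One recent afternoon my phone kept receiving Zabbix alert SMS messages reporting errors in the ASM log of a production Oracle 11g database hosted in our same-city disaster recovery data center. The alert read: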

[xxx-HG-xxx-DB-Oracle]Oracle ORA ASM Log Error(s) found on PHX-H-DB-Oracle-10.xx.xx.xxx-db1: PROBLEM (Value: >ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O

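The server is part of an 11g RAC cluster. The two servers that originally hosted this database had gone out of hardware warranty; they ran CentOS 6, whereas the newly purchased replacement servers run CentOS 7.9. No error of this kind appeared before the servers were replaced.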

After logging in to the database environment and checking the logs, I found that only the ASM log on node 1 contained the error:

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_122303.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Additional information: 128
Additional information: 1

The error first appeared at 16:00 that afternoon and then recurred frequently, at times triggering an alert SMS every few minutes.

2. Investigation

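Online resources and MOS both indicate that ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O is related to the operating system's AIO settings.
According to the MOS note ORA-27090 - Unable to Reserve Kernel Resources for Asynchronous Disk I/O (Doc ID 579108.1), the error is caused by an aio-max-nr value that is too low.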
CAUSE
The “aio-max-nr” kernel limit is too low.

SOLUTION
The “aio-max-nr” kernel limit should be adjusted according to Oracle recommendations which are available in this document:

The current aio-nr and aio-max-nr values on this server were:
[root@xxxxx ~]# cat /proc/sys/fs/aio-max-nr
1048576
[root@xxxx ~]# cat /proc/sys/fs/aio-nr
1047552

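aio-nr is the running total of the number of events specified on the io_setup system call for all currently active AIO contexts. If aio-nr reaches aio-max-nr, io_setup fails with EAGAIN. On Linux, EAGAIN is errno 11 ("Resource temporarily unavailable"), which is the OS error reported in the trace file shown below. Here aio-nr (1047552) was already very close to aio-max-nr (1048576).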
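To make the remaining headroom explicit, the two counters can be compared directly. The following is a minimal sketch, not part of the original troubleshooting session; the function name `aio_usage` is hypothetical, and the sample values are the ones shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: report AIO usage from the current value (aio-nr)
# and the limit (aio-max-nr), using integer arithmetic only.
aio_usage() {
    used=$1
    max=$2
    free=$((max - used))        # event slots still available
    pct=$((used * 100 / max))   # usage percentage, rounded down
    echo "used=$used max=$max free=$free pct=$pct"
}

# Values taken from the output above.
aio_usage 1047552 1048576
# prints: used=1047552 max=1048576 free=1024 pct=99
```

On a live system the same function could be fed from the kernel counters, for example `aio_usage "$(cat /proc/sys/fs/aio-nr)" "$(cat /proc/sys/fs/aio-max-nr)"`, and run periodically so that the trend is visible before the limit is reached.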

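Examining the trc files named in the error log led to a preliminary conclusion: the errors are tied to the scheduled monitoring of asm_diskgroup (the trace shows the session running select name,state from v$asm_diskgroup).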

[root@xxxx ~]# more /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_171522.trc
Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_171522.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /u01/app/11.2.0/grid
System name:    Linux
Node name:      xxx-xxx-xx-db1
Release:        3.10.0-1160.el7.x86_64
Version:        #1 SMP Mon Oct 19 16:18:59 UTC 2020
Machine:        x86_64
Instance name: +ASM1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 33
Unix process pid: 171522, image: oracle@xxx-xxx-xx-db1 (TNS V1-V3)
 
 
*** 2023-04-23 09:07:33.222
*** SESSION ID:(64.55981) 2023-04-23 09:07:33.222
*** CLIENT ID:() 2023-04-23 09:07:33.222
*** SERVICE NAME:() 2023-04-23 09:07:33.222
*** MODULE NAME:(sqlplus@xxx-xxx-xx-db1 (TNS V1-V3)) 2023-04-23 09:07:33.222
*** ACTION NAME:() 2023-04-23 09:07:33.222
  
WARNING:Could not increase the asynch I/O limit to 256 for kfdParallelIO.
 
*** 2023-04-23 09:07:33.222
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=1, mask=0x0)
----- Error Stack Dump -----
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
Additional information: 386140135
----- Current SQL Statement for this session (sql_id=7v21cmm3d7z26) -----
select name,state from v$asm_diskgroup
 
----- Call Stack Trace -----
calling              call     entry                argument values in hex     
location             type     point                (? means dubious value)    
-------------------- -------- -------------------- ----------------------------
skdstdst()+41        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFEB42E36D0 ? 7FFEB42E37A8 ?
                                                   7FFEB42E8250 ? 000000002 ?
ksedst1()+103        call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFEB42E36D0 ? 7FFEB42E37A8 ?
                                                   7FFEB42E8250 ? 000000002 ?


3. Resolution

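Following the documentation, we raised aio-max-nr to a larger value. Before the adjustment it was set to the previously recommended size, aio-max-nr=1048576.
Before the change: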
[root@xxxx~]# cat /proc/sys/fs/aio-max-nr
1048576
The following command applies the change without restarting the database or the operating system:
sysctl -w fs.aio-max-nr=50000000
To avoid any adverse impact on the system, the change was tested in a test environment beforehand.
After the change:
[root@xxxx~]# cat /proc/sys/fs/aio-max-nr
50000000
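Note that a value set with `sysctl -w` applies only to the running kernel and is lost at the next reboot. The original write-up does not say how the setting was persisted, so the following is only a sketch of one way to do it: the helper name `persist_sysctl` and the file name used in the usage note are assumptions, not something taken from this environment.

```shell
#!/bin/sh
# Hypothetical helper: set "KEY = VALUE" in a sysctl configuration file,
# replacing any earlier line for the same key so the file stays clean.
persist_sysctl() {
    key=$1
    value=$2
    file=$3
    touch "$file"
    # Copy every line except an existing setting of the key.
    # (The dots in the key are treated as regex wildcards, which is
    # acceptable for this illustration.)
    grep -v "^[[:space:]]*${key}[[:space:]]*=" "$file" > "${file}.tmp" || true
    # Append the new setting and swap the file into place.
    printf '%s = %s\n' "$key" "$value" >> "${file}.tmp"
    mv "${file}.tmp" "$file"
}
```

As root, this might be used as `persist_sysctl fs.aio-max-nr 50000000 /etc/sysctl.d/97-oracle-aio.conf` (an assumed file name), followed by `sysctl -p /etc/sysctl.d/97-oracle-aio.conf` to load it. If the key is already defined in /etc/sysctl.conf, that entry should be updated as well, so that the two files do not disagree.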

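Because this is a RAC environment, the aio-max-nr value on node 2 was changed as well, even though node 2 had reported no errors, so that the kernel parameters stay identical across the cluster.
After the change we kept monitoring for several days and the error did not recur.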

Last modified: 2023-05-13 20:13:42