Coolkid
2019-03-19
One ASM disk reports "permission denied", but all disks have identical permissions and the error occurs on only one node

The alert log shows the following errors:

Errors in file /u01/oracle/diag/rdbms/site/site1/trace/site1_ora_12013.trc:
ORA-15025: could not open disk "/dev/mapper/asmdata15"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-00604: error occurred at recursive SQL level 2
ORA-01115: IO error reading block from file  (block # )
ORA-01110: data file 1: '+DATADG/site/datafile/system.268.960656447'
ORA-15081: failed to submit an I/O operation to a disk

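This is a two-node RAC, and only one node's log reports this error. It is infrequent, appearing once every few days, but each time it appears it repeats at least three or four times in a row. The asmdata15 disk has the same permissions as the other disks, yet the error always names this disk (asmdata15). Screenshots of the permissions are attached. So far there is no impact on production. Thanks!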

Attachment: asm磁盘权限1.png (ASM disk permissions)

Attachment: 用户权限.png (user permissions)
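A minimal sketch for double-checking the "same permissions" observation, under two assumptions not stated in the thread: GNU coreutils is available, and /dev/mapper/asmdataNN may be symlinks to /dev/dm-N nodes (common with device-mapper multipath), in which case `ls -l` on the link alone shows lrwxrwxrwx rather than the permissions the kernel actually checks. The function name `disk_perms` is hypothetical.

```shell
#!/bin/sh
# disk_perms (hypothetical helper): for each path given, print
#   <path> <resolved target> <owner>:<group> <octal mode>
# stat -L follows symlinks, so the owner/mode shown belong to the real
# device node that open() is checked against, not to the link itself.
# Assumes GNU coreutils (readlink -f, stat -c).
disk_perms() {
    for dev in "$@"; do
        target=$(readlink -f -- "$dev")
        perms=$(stat -L -c '%U:%G %a' -- "$dev")
        printf '%s %s %s\n' "$dev" "$target" "$perms"
    done
}
```

For example, run `disk_perms /dev/mapper/asmdata* > perms_$(hostname).txt` on each node and diff the two files. The dm-N targets can legitimately differ between nodes, so compare only the path, owner and mode columns (for instance with `cut -d' ' -f1,3,4`).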


12 answers
Coolkid
Uploaded attachment: site1_ora_12013.trc
Moone

Run the cluster verification check on both nodes and upload the results:

./runcluvfy.sh stage -pre crsinst -n <node1>,<node2> -verbose
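When the verbose report is long, a filter like the one below can pull out just the problem lines. It is a hypothetical helper, and its patterns are based only on the messages that appear in this thread ("failed" status rows, PRVF-/PRVG- codes, "is not consistent"), so other failure wordings may slip through.

```shell
#!/bin/sh
# cluvfy_failures (hypothetical helper): read cluvfy output on stdin and
# print only lines that look like problems:
#   - per-node table rows whose status column is "failed"
#   - PRVF-nnnn / PRVG-nnnn messages
#   - "... is not consistent across nodes" summaries
cluvfy_failures() {
    grep -E '[[:space:]]failed[[:space:]]*$|PRV[FG]-[0-9]+|is not consistent'
}
```

For example, `./runcluvfy.sh stage -pre crsinst -n <node1>,<node2> -verbose | tee cluvfy.log | cluvfy_failures` prints the problem lines while keeping the full report in cluvfy.log.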

Coolkid
Uploaded attachment: site1_runcluvfy
Coolkid

The node 1 results are attached. I also ran it on node 2, and the results were the same as on node 1...

Moone

The check results show a DNS resolution timeout on site2:

Output from the site2 check:

Checking the file "/etc/resolv.conf" to make sure only one of domain and search entries is defined
File "/etc/resolv.conf" does not have both domain and search entries defined
Checking if domain entry in file "/etc/resolv.conf" is consistent across the nodes...
domain entry in file "/etc/resolv.conf" is consistent across nodes
Checking if search entry in file "/etc/resolv.conf" is consistent across the nodes...
search entry in file "/etc/resolv.conf" is consistent across nodes
Checking DNS response time for an unreachable node
  Node Name                             Status
  ------------------------------------  ------------------------
  site2                                 failed
  site1                                 passed
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: site2

File "/etc/resolv.conf" is not consistent across nodes


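Suggestions:

1. Confirm that /etc/resolv.conf is configured identically on both nodes.
2. Use nslookup on each node to resolve site1 and site2.
3. If everything checks out, add the following to /etc/resolv.conf:

options timeout:1
options attempts:2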
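For step 1, a small hypothetical helper can summarize the resolv.conf settings that cluvfy checks, so the output from both nodes can be compared side by side. It only reads the file and does not query DNS. Per resolv.conf(5), the two options above can also be written on a single line as `options timeout:1 attempts:2`.

```shell
#!/bin/sh
# resolv_summary (hypothetical helper): summarize one resolv.conf file.
# Reports whether both "domain" and "search" are defined (cluvfy expects at
# most one of them) and lists every token from all "options" lines.
resolv_summary() {
    awk '
        $1 == "domain"  { d = 1 }
        $1 == "search"  { s = 1 }
        $1 == "options" { for (i = 2; i <= NF; i++) opts = opts " " $i }
        END {
            printf "domain and search both defined: %s\n", (d && s) ? "yes" : "no"
            printf "options:%s\n", opts
        }
    ' "${1:-/etc/resolv.conf}"
}
```

Run `resolv_summary` on both nodes and compare; for a byte-for-byte check, `diff` the two files directly.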


Coolkid

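DNS has been configured. Also, our RAC does not use DNS for name resolution; it uses /etc/hosts. Apart from DNS, what else could cause this error? The new check output: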

Checking the file "/etc/resolv.conf" to make sure only one of domain and search entries is defined
File "/etc/resolv.conf" does not have both domain and search entries defined
Checking if domain entry in file "/etc/resolv.conf" is consistent across the nodes...
domain entry in file "/etc/resolv.conf" is consistent across nodes
Checking if search entry in file "/etc/resolv.conf" is consistent across the nodes...
search entry in file "/etc/resolv.conf" is consistent across nodes
Checking DNS response time for an unreachable node
  Node Name                             Status
  ------------------------------------  ------------------------
  site2                                 passed
  site1                                 passed
The DNS response time for an unreachable node is within acceptable limit on all nodes

File "/etc/resolv.conf" is consistent across nodes


Kamus

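This is a rather interesting problem.

Let me first confirm the current symptoms:

Every few days the site1 instance reports an error, and each time it is the same ASM disk that cannot be opened; everything is normal on the site2 instance.


So, do these errors, which show up every few days, occur at varying times with no pattern at all?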


Coolkid

Yes, there really is no pattern. Below are some of the errors and their timestamps, extracted from our zone alert emails:

2019-01-14 16:47:59, database error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-01-16 13:13:20, database error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-01-16 13:18:09, database error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-01-16 13:50:53, database error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-01-16 14:06:33, database error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-03-02 17:43:55, database error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-03-02 17:50:05, database error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-03-19 14:03:30, database error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-03-25 14:16:35, database error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
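To check the timing more thoroughly, every occurrence can be pulled straight from the site1 alert log instead of the alert emails, then compared with whatever else runs on site1 at those times. A minimal sketch, assuming the plain-text alert log in which each group of messages is preceded by a timestamp line, either like `Wed Jan 16 13:13:20 2019` (11g style) or an ISO 8601 line such as `2019-01-16T13:13:20.123456+08:00` (12.2 and later). The function name is hypothetical.

```shell
#!/bin/sh
# ora15025_times (hypothetical helper): print "<timestamp> | <message>" for
# every ORA-15025 line in the given alert log file(s).
# The most recent timestamp line seen is remembered and attached to each
# matching error line.
ora15025_times() {
    awk '
        /^[A-Z][a-z][a-z] [A-Z][a-z][a-z] +[0-9]+ [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9][0-9][0-9]$/ { ts = $0; next }
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:/ { ts = $0; next }
        /ORA-15025/ { print ts " | " $0 }
    ' "$@"
}
```

For example, `ora15025_times /u01/oracle/diag/rdbms/site/site1/trace/alert_site1.log`, assuming the alert log uses the usual alert_<SID>.log name and sits in the same trace directory as the trace file quoted in the question.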


李华

kfk_debug_get_user_groups: uid:2, euid:1001, gid:0, egid:1021

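Please also post the user and group information, plus the permissions on the oracle binary. It looks like the error is raised when some program of yours connects to the database and runs a query.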
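The numeric ids in the trace line above can be mapped to names on the node that produced it. A hypothetical helper, assuming the accounts are defined locally in /etc/passwd and /etc/group; for LDAP/NIS accounts, `getent passwd <id>` and `getent group <id>` would be the more complete lookup.

```shell
#!/bin/sh
# kfk_ids (hypothetical helper): extract uid/euid/gid/egid from a
# kfk_debug_get_user_groups trace line and append the local name for each
# id, e.g. "uid=0(root)". Ids with no local entry are shown as "?".
kfk_ids() {
    printf '%s\n' "$1" | grep -oE 'e?[ug]id:[0-9]+' |
    while IFS=: read -r key id; do
        case "$key" in
            *uid) db=/etc/passwd ;;   # uid, euid -> user names
            *)    db=/etc/group ;;    # gid, egid -> group names
        esac
        name=$(awk -F: -v id="$id" '$3 == id { print $1; exit }' "$db")
        echo "$key=$id(${name:-?})"
    done
}
```

For example, `kfk_ids "kfk_debug_get_user_groups: uid:2, euid:1001, gid:0, egid:1021"`. Running it on both nodes is worthwhile, since the same numeric id can map to different names if the passwd or group files differ between nodes.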

Coolkid

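Hi, the user and group information is the same on both nodes. When checking the oracle binary, though, I did find one difference:

Node 1: -rwsr-s--x. 1 oracle asmadmin 239626689 Nov 21  2017 oracle

Node 2: -rwsr-s--x  1 oracle asmadmin 239626689 Nov 21  2017 oracle

Comparing with our other clusters, the healthy ones all show -rwsr-s--x, without the trailing dot.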
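The only difference above is the character after the mode bits. In GNU ls output, a `.` there means the file has an SELinux security context and no other alternate access method, while `+` means some other combination of alternate access methods (for example a POSIX ACL) applies. A hypothetical helper to decode it; whether this difference is related to the ORA-15025 errors is not established in this thread.

```shell
#!/bin/sh
# mode_suffix (hypothetical helper): explain the character GNU ls appends
# after the mode bits in a string such as "-rwsr-s--x.".
mode_suffix() {
    case "$1" in
        *.) echo "SELinux security context only" ;;
        *+) echo "other alternate access method (e.g. ACL)" ;;
        *)  echo "no alternate access method" ;;
    esac
}
```

Feed it the first column of `ls -l`, e.g. `mode_suffix "$(ls -l "$ORACLE_HOME/bin/oracle" | awk '{print $1}')"` on each node; `ls -Z` then shows the actual security context and `getfacl` lists any ACL entries.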

李华

kfk_debug_get_user_groups: uid:2, euid:1001, gid:0, egid:1021

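This information is very odd. uid 2 should be the daemon user, and I think that is where the problem lies. Where is that query program started, and how does it connect to the database? Also check which user started the listener.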
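The trace line shows a real uid of 2 but an effective uid of 1001, which is consistent with a program running as uid 2 executing the setuid oracle binary (assuming 1001 is the oracle user). Two hypothetical helpers for the checks asked for above, assuming procps `ps`: one lists processes whose real and effective uids differ, the other shows which user runs the listener (`tnslsnr`).

```shell
#!/bin/sh
# uid_mismatch (hypothetical helper): read "ruid euid pid args" lines on
# stdin and print those whose real and effective uid differ.
uid_mismatch() {
    awk '$1 != $2'
}

# listener_owner (hypothetical helper): show the user, pid and command line
# of every running listener. The [t] pattern stops grep matching itself.
listener_owner() {
    ps -eo user=,pid=,args= | grep '[t]nslsnr'
}
```

Use it as `ps -eo ruid=,euid=,pid=,args= | uid_mismatch` on site1, and run `listener_owner` on both nodes. Because the errors are intermittent, the first command is most informative if run periodically (for example from cron) with its output logged alongside a timestamp.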

章芋文
Question closed: the problem has been resolved.