The alert log shows the following errors:
Errors in file /u01/oracle/diag/rdbms/site/site1/trace/site1_ora_12013.trc:
ORA-15025: could not open disk "/dev/mapper/asmdata15"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
ORA-00604: error occurred at recursive SQL level 2
ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 1: '+DATADG/site/datafile/system.268.960656447'
ORA-15081: failed to submit an I/O operation to a disk
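This is a two-node RAC, and only one node's log shows the error. It is infrequent, appearing once every few days, and each occurrence repeats at least three or four times in a row. The permissions on the asmdata15 disk match those of the other disks, yet the error always names this disk (asmdata15). A screenshot of the permissions is attached. So far production is not affected. Thanks!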
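
Since the question turns on whether asmdata15 really has the same ownership and mode as the other disks, a minimal sketch of that comparison follows. The function names `describe` and `find_outliers` are illustrative, not part of any Oracle tooling, and the check only sees the state at the moment it runs, so it cannot rule out a transient change.

```python
import os
import stat
from collections import Counter

def describe(path):
    """Return (uid, gid, permission bits) for a path, following symlinks."""
    st = os.stat(path)  # /dev/mapper entries are often symlinks to /dev/dm-N
    return (st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode))

def find_outliers(paths):
    """Return the paths whose (uid, gid, mode) differs from the most common triple."""
    info = {p: describe(p) for p in paths}
    if not info:
        return {}
    majority, _ = Counter(info.values()).most_common(1)[0]
    return {p: v for p, v in info.items() if v != majority}
```

On the affected node this could be called as `find_outliers(glob.glob("/dev/mapper/asmdata*"))` after `import glob`; an empty result means every device shares the same owner, group, and mode at that instant.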


Run the cluster verification on both nodes and upload the results:
./runcluvfy.sh stage -pre crsinst -n <node1>,<node2> -verbose

The node 1 results are attached. I also ran it on node 2, and the results are the same as on node 1.

The check results show a DNS response timeout error on site2. Output from site2:
Checking the file "/etc/resolv.conf" to make sure only one of domain and search entries is defined
File "/etc/resolv.conf" does not have both domain and search entries defined
Checking if domain entry in file "/etc/resolv.conf" is consistent across the nodes...
domain entry in file "/etc/resolv.conf" is consistent across nodes
Checking if search entry in file "/etc/resolv.conf" is consistent across the nodes...
search entry in file "/etc/resolv.conf" is consistent across nodes
Checking DNS response time for an unreachable node
Node Name Status
------------------------------------ ------------------------
site2 failed
site1 passed
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: site2
File "/etc/resolv.conf" is not consistent across nodes
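Recommendations:
1. Confirm that /etc/resolv.conf is configured identically on both nodes.
2. Run nslookup on each node to resolve site1 and site2.
3. If everything checks out, add the following to /etc/resolv.conf: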
options timeout:1
options attempts:2
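
The first recommendation, comparing /etc/resolv.conf across the two nodes, can be sketched as below. This is a simplified parser written for illustration (`parse_resolv_conf` and `diff_conf` are hypothetical names); it treats anything after `#` or `;` as a comment and is not a full implementation of the resolver's file format.

```python
def parse_resolv_conf(text):
    """Parse resolv.conf text into nameservers, search list, domain, and options."""
    conf = {"nameserver": [], "search": [], "domain": None, "options": {}}
    for raw in text.splitlines():
        # Simplification: drop everything after a '#' or ';' comment marker.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        key, *values = line.split()
        if key == "nameserver":
            conf["nameserver"].extend(values)
        elif key == "search":
            conf["search"] = values  # a later search line replaces an earlier one
        elif key == "domain" and values:
            conf["domain"] = values[0]
        elif key == "options":
            for opt in values:
                name, _, val = opt.partition(":")
                conf["options"][name] = int(val) if val.isdigit() else True
    return conf

def diff_conf(a, b):
    """Return the sorted keys whose parsed values differ between two nodes."""
    return sorted(k for k in a if a[k] != b[k])
```

Feeding it the file contents copied from each node and calling `diff_conf(node1, node2)` returns an empty list when the two files are equivalent after parsing.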
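
DNS has now been configured. Note, though, that our RAC uses the hosts file rather than DNS for name resolution. Apart from the DNS issue, what else could cause this kind of error?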
Checking the file "/etc/resolv.conf" to make sure only one of domain and search entries is defined
File "/etc/resolv.conf" does not have both domain and search entries defined
Checking if domain entry in file "/etc/resolv.conf" is consistent across the nodes...
domain entry in file "/etc/resolv.conf" is consistent across nodes
Checking if search entry in file "/etc/resolv.conf" is consistent across the nodes...
search entry in file "/etc/resolv.conf" is consistent across nodes
Checking DNS response time for an unreachable node
Node Name Status
------------------------------------ ------------------------
site2 passed
site1 passed
The DNS response time for an unreachable node is within acceptable limit on all nodes
File "/etc/resolv.conf" is consistent across nodes
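
This is an interesting problem.
Let me confirm the current symptoms:
On the site1 instance the error appears every few days, and each time it is the same ASM disk that cannot be opened; the site2 instance is fine.
So, do the errors occur at a different time on each occasion, with no pattern at all?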

Yes, there really is no pattern. Here are some of the errors and their timestamps, extracted from the zone alert emails:
2019-01-14 16:47:59, database background error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-01-16 13:13:20, database background error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-01-16 13:18:09, database background error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-01-16 13:50:53, database background error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-01-16 14:06:33, database background error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-03-02 17:43:55, database background error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-03-02 17:50:05, database background error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-03-19 14:03:30, database background error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
2019-03-25 14:16:35, database background error: ORA-15025: could not open disk "/dev/mapper/asmdata15"
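
To test the "no pattern" claim a little more carefully, the timestamps above can be bucketed by hour and weekday, and the gaps between error days computed. This is only a sketch over the nine events listed; `summarize` is an illustrative name.

```python
from collections import Counter
from datetime import datetime

TIMESTAMPS = [
    "2019-01-14 16:47:59", "2019-01-16 13:13:20", "2019-01-16 13:18:09",
    "2019-01-16 13:50:53", "2019-01-16 14:06:33", "2019-03-02 17:43:55",
    "2019-03-02 17:50:05", "2019-03-19 14:03:30", "2019-03-25 14:16:35",
]

def summarize(stamps):
    """Return (events per hour, events per weekday, day gaps between error days)."""
    times = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps)
    by_hour = Counter(t.hour for t in times)
    by_weekday = Counter(t.weekday() for t in times)  # Monday is 0
    days = sorted({t.date() for t in times})
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    return by_hour, by_weekday, gaps

by_hour, by_weekday, gaps = summarize(TIMESTAMPS)
print(sorted(by_hour.items()))  # [(13, 3), (14, 3), (16, 1), (17, 2)]
print(gaps)                     # [2, 45, 17, 6]
```

The gaps between error days are indeed irregular, but every event in this sample falls between 13:00 and 18:00. Nine events are too few to draw a conclusion, though an afternoon-only distribution would be consistent with a client program or user session triggering the error rather than a fault in the disk itself.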
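
kfk_debug_get_user_groups: uid:2, euid:1001, gid:0, egid:1021
Please post the user group information as well, along with the permissions on the oracle binary. It looks like the error is triggered when some program of yours connects and runs a query.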
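
Hello, the user group information is the same on both nodes. While checking the oracle binary, I did find one small difference:
Node 1 permissions: -rwsr-s--x. 1 oracle asmadmin 239626689 Nov 21 2017 oracle
Node 2 permissions: -rwsr-s--x 1 oracle asmadmin 239626689 Nov 21 2017 oracle
On other clusters that work normally, the permissions are all -rwsr-s--x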
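
The two listings differ only in the trailing `.` after the mode string. In GNU `ls -l` output that character marks a file carrying an SELinux security context, so the permission bits themselves are identical. A small sketch that separates the mode string from that suffix and converts the mode to octal (the helper names are illustrative):

```python
def split_mode(ls_line):
    """Split the first field of an `ls -l` line into (mode string, suffix)."""
    field = ls_line.split()[0]
    return field[:10], field[10:]  # suffix is '', '.', or '+' in GNU ls

def to_octal(mode):
    """Convert a mode string such as '-rwsr-s--x' to its numeric permission bits."""
    chars = mode[1:10]
    bits = 0
    for i, c in enumerate(chars):
        if c in "rwxst":  # lower-case s/t mean the execute bit is also set
            bits |= 1 << (8 - i)
    if chars[2] in "sS":
        bits |= 0o4000  # setuid
    if chars[5] in "sS":
        bits |= 0o2000  # setgid
    if chars[8] in "tT":
        bits |= 0o1000  # sticky
    return bits

node1 = split_mode("-rwsr-s--x. 1 oracle asmadmin 239626689 Nov 21 2017 oracle")
node2 = split_mode("-rwsr-s--x 1 oracle asmadmin 239626689 Nov 21 2017 oracle")
print(oct(to_octal(node1[0])), repr(node1[1]), repr(node2[1]))  # 0o6751 '.' ''
```

Both binaries are mode 6751. Whether the SELinux context on node 1 plays any part in the intermittent error is not established by this thread; the sketch only shows that the visible difference is the context marker, not the permission bits.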
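
kfk_debug_get_user_groups: uid:2, euid:1001, gid:0, egid:1021
This information is very odd. uid 2 should be the daemon user, and I think that is where the problem lies. Where is your querying program started, and how does it connect to the database? Also check which user started the listener.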
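
The reasoning above rests on reading the IDs in the trace line and mapping uid 2 to an account name. A minimal sketch of both steps follows, using Python's Unix-only `pwd` module; `parse_ids` and `user_name` are illustrative names, and which account owns uid 2 depends on the distribution, so the lookup should be run on the affected node.

```python
import pwd  # Unix-only standard library module
import re

def parse_ids(line):
    """Extract uid, euid, gid, and egid from a kfk_debug_get_user_groups trace line."""
    return {k: int(v) for k, v in re.findall(r"\b(e?[ug]id):(\d+)", line)}

def user_name(uid):
    """Return the account name for a uid on this host, or None if it has no entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None

ids = parse_ids("kfk_debug_get_user_groups: uid:2, euid:1001, gid:0, egid:1021")
print(ids)  # {'uid': 2, 'euid': 1001, 'gid': 0, 'egid': 1021}
print(user_name(ids["uid"]), user_name(ids["euid"]))
```

The real uid (2) differs from the effective uid (1001), which is what one would expect when the setuid oracle binary is executed by a process running under another account. That euid 1001 is the oracle user is an assumption here; the second `print` shows the actual names on the host where it runs.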