Problem description
While checking an 11g RAC cluster today, I found that only node 2 was up; node 1 reported CRS-0184: Cannot communicate with the CRS daemon.
Check the CRS status on node 2:
[root@node2 ~]# /oracle/app/grid/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    node2
ora....ER.lsnr ora....er.type ONLINE    ONLINE    node2
ora....N1.lsnr ora....er.type ONLINE    ONLINE    node2
ora.asm        ora.asm.type   ONLINE    ONLINE    node2
ora.eons       ora.eons.type  ONLINE    ONLINE    node2
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    node2
ora.node1.vip  ora....t1.type ONLINE    ONLINE    node2
ora....SM2.asm application    ONLINE    ONLINE    node2
ora....E2.lsnr application    ONLINE    ONLINE    node2
ora.node2.gsd  application    OFFLINE   OFFLINE
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  ora....t1.type ONLINE    ONLINE    node2
ora.oc4j       ora.oc4j.type  OFFLINE   OFFLINE
ora.ons        ora.ons.type   ONLINE    ONLINE    node2
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    node2
Expert answer
Check the CRS status on node 1:
[root@node1 ~]# /oracle/app/grid/bin/crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
Check the OCR:
[root@node1 bin]# ./ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=8, opn=kgfolclcpi1, dep=204, loc=kgfokge
AMDU-00204: Disk N0001 is in currently mounted diskgroup DATA
AMDU-00201: Disk N0001: 'ORCLISK1'
] [8]
Check the CRS log on node 1:
[root@node1 ~]# tail -100 /oracle/app/grid/log/node1/crsd/crsd.log
2013-08-16 09:54:06.336: [ OCRASM][549824240]proprasmo: Error in open/create file in dg [DATA]
[ OCRASM][549824240]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=27140, loc=kgfokge
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operat
2013-08-16 09:54:06.365: [ OCRASM][549824240]proprasmo: kgfoCheckMount returned [7]
2013-08-16 09:54:06.365: [ OCRASM][549824240]proprasmo: The ASM instance is down
2013-08-16 09:54:06.366: [ OCRRAW][549824240]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2013-08-16 09:54:06.366: [ OCRRAW][549824240]proprioo: No OCR/OLR devices are usable
2013-08-16 09:54:06.366: [ OCRASM][549824240]proprasmcl: asmhandle is NULL
2013-08-16 09:54:06.366: [ OCRRAW][549824240]proprinit: Could not open raw device
2013-08-16 09:54:06.366: [ OCRASM][549824240]proprasmcl: asmhandle is NULL
2013-08-16 09:54:06.367: [ OCRAPI][549824240]a_init:16!: Backend init unsuccessful : [26]
2013-08-16 09:54:06.367: [ CRSOCR][549824240] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=27140, loc=kgfokge
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operat
] [7]
2013-08-16 09:54:06.367: [ CRSD][549824240][PANIC] CRSD exiting: Could not init OCR, code: 26
2013-08-16 09:54:06.367: [ CRSD][549824240] Done.
[ clsdmt][1114937664]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=node1DBG_CRSD))
2013-08-16 09:54:07.295: [ clsdmt][1114937664]PID for the Process [4251], connkey 1
2013-08-16 09:54:07.295: [ clsdmt][1114937664]Creating PID [4251] file for home /oracle/app/grid host node1 bin crs to /oracle/app/grid/crs/init/
2013-08-16 09:54:07.295: [ clsdmt][1114937664]Writing PID [4251] to the file [/oracle/app/grid/crs/init/node1.pid]
The errors point to ASM, so check the state of ASM and of the disk group:
[root@node1 ~]# oracleasm listdisks
DISK1
DISK2
DISK3
DISK4
[root@node1 bin]# ls -l /dev/sd*
brw-rw---- 1 grid asmadmin 8, 0 Aug 16 10:01 /dev/sda
brw-rw---- 1 grid asmadmin 8, 1 Aug 16 10:02 /dev/sda1
brw-rw---- 1 grid asmadmin 8, 2 Aug 16 10:01 /dev/sda2
brw-rw---- 1 grid asmadmin 8, 16 Aug 16 10:01 /dev/sdb
brw-rw---- 1 grid asmadmin 8, 17 Aug 16 10:02 /dev/sdb1
brw-rw---- 1 grid asmadmin 8, 32 Aug 16 10:01 /dev/sdc
brw-rw---- 1 grid asmadmin 8, 33 Aug 16 10:02 /dev/sdc1
brw-rw---- 1 grid asmadmin 8, 48 Aug 16 10:01 /dev/sdd
brw-rw---- 1 grid asmadmin 8, 49 Aug 16 10:02 /dev/sdd1
brw-rw---- 1 grid asmadmin 8, 64 Aug 16 10:01 /dev/sde
brw-rw---- 1 grid asmadmin 8, 65 Aug 16 10:02 /dev/sde1
[root@node1 bin]# ps -ef |grep smon
grid 4018 1 0 10:31 ? 00:00:00 asm_smon_+ASM1
root 4156 3682 0 10:31 pts/1 00:00:00 grep smon
[grid@node1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Fri Aug 16 11:37:57 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> select name,state,total_mb,free_mb from v$asm_diskgroup;
NAME                           STATE         TOTAL_MB    FREE_MB
------------------------------ ----------- ---------- ----------
DATA                           MOUNTED           8188       7786
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
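Everything looks normal.
Searching MOS (My Oracle Support) for the errors in the CRS log then showed that the permissions on the oracle binary in the bin directory were wrong. The relevant errors are: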
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operat
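The ORA- codes are what get searched for on MOS, so it can help to pull them out of the log on their own. The following is a minimal sketch, not part of the original post: the function name extract_ora_codes is hypothetical, and it assumes a grep that supports -o (GNU or BSD grep).

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print each distinct
# ORA- error code once, in the order it first appears.
extract_ora_codes() {
    # grep -o prints only the matched code; awk drops repeated codes
    # while keeping first-seen order.
    grep -o 'ORA-[0-9][0-9]*' | awk '!seen[$0]++'
}
```

With the log from this post it could be used as: tail -100 /oracle/app/grid/log/node1/crsd/crsd.log | extract_ora_codes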
1. grid/bin/oracle should have the following permissions, owner, and group:
-rwsr-s--x grid oinstall
2. oracle/db1/bin/oracle should have the following permissions, owner, and group:
-rwsr-s--x oracle asmadmin
Checking the oracle binary under my own grid home gives:
[root@node1 ~]# ls -l /oracle/app/grid/bin/oracle
-rwxr-x--x 1 grid asmadmin 184286251 Aug 14 14:43 /oracle/app/grid/bin/oracle
Both the permissions and the group are wrong.
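The same comparison can be scripted so it is easy to repeat on each node. This is only a sketch under assumptions: check_binary is a hypothetical helper, not an Oracle tool, and it relies on GNU coreutils stat (the -c option), as found on Linux. The mode 6751 is the octal form of -rwsr-s--x.

```shell
#!/bin/sh
# Hypothetical helper: check_binary PATH OWNER:GROUP MODE
# Prints OK or MISMATCH; returns 0 on a match, 1 on a mismatch,
# 2 if the file cannot be read by stat.
check_binary() {
    path=$1; want_owner=$2; want_mode=$3
    actual_owner=$(stat -c '%U:%G' "$path") || return 2
    actual_mode=$(stat -c '%a' "$path") || return 2
    if [ "$actual_owner" = "$want_owner" ] && [ "$actual_mode" = "$want_mode" ]; then
        echo "OK $path $actual_owner $actual_mode"
    else
        echo "MISMATCH $path: have $actual_owner $actual_mode, want $want_owner $want_mode"
        return 1
    fi
}
```

With the paths and expected values from this post, the calls would be check_binary /oracle/app/grid/bin/oracle grid:oinstall 6751 and check_binary /oracle/app/oracle/db1/bin/oracle oracle:asmadmin 6751.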
[root@node1 ~]# chmod 6751 /oracle/app/grid/bin/oracle
[root@node1 ~]# chown grid:oinstall /oracle/app/grid/bin/oracle
The oracle binary under the Oracle database home looks like this:
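To see why octal mode 6751 corresponds to the -rwsr-s--x string listed above, the mode can be tried on a scratch file first. A small sketch, assuming GNU coreutils stat and mktemp; show_mode_string is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: apply an octal mode to a throwaway file and
# print the symbolic permission string that ls -l would show.
show_mode_string() {
    tmp=$(mktemp) || return 1
    chmod "$1" "$tmp"
    stat -c '%A' "$tmp"
    rm -f "$tmp"
}

show_mode_string 751    # prints -rwxr-x--x (the broken state seen on node 1)
show_mode_string 6751   # prints -rwsr-s--x (setuid and setgid bits added)
```

The leading 6 sets the setuid (4) and setgid (2) bits, which show up as s in place of x for the owner and group.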
[root@node1 bin]# ls -l /oracle/app/oracle/db1/bin/oracle
-rwsr-s--x 1 oracle oinstall 173515880 Aug 14 17:09 /oracle/app/oracle/db1/bin/oracle
The group is wrong.
[root@node1 bin]# chown oracle:asmadmin /oracle/app/oracle/db1/bin/oracle
After the changes:
[root@node1 bin]# ls -l /oracle/app/oracle/db1/bin/oracle
-rwsr-s--x 1 oracle asmadmin 173515880 Aug 14 17:09 /oracle/app/oracle/db1/bin/oracle
[root@node1 bin]# ls -l /oracle/app/grid/bin/oracle
-rwsr-s--x 1 grid oinstall 184286251 Aug 14 14:43 /oracle/app/grid/bin/oracle
Restart node 1; after that everything is OK.
[root@node1 bin]# ./crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    node1
ora....ER.lsnr ora....er.type ONLINE    ONLINE    node1
ora....N1.lsnr ora....er.type ONLINE    ONLINE    node2
ora.asm        ora.asm.type   ONLINE    ONLINE    node1
ora.eons       ora.eons.type  ONLINE    ONLINE    node1
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    node1
ora....SM1.asm application    ONLINE    ONLINE    node1
ora....E1.lsnr application    ONLINE    ONLINE    node1
ora.node1.gsd  application    OFFLINE   OFFLINE
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  ora....t1.type ONLINE    ONLINE    node1
ora....SM2.asm application    ONLINE    ONLINE    node2
ora....E2.lsnr application    ONLINE    ONLINE    node2
ora.node2.gsd  application    OFFLINE   OFFLINE
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  ora....t1.type ONLINE    ONLINE    node2
ora.oc4j       ora.oc4j.type  OFFLINE   OFFLINE
ora.ons        ora.ons.type   ONLINE    ONLINE    node1
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    node2