1. Problem description
Cluster information for the Oracle 11g RAC could not be retrieved, and the database service was abnormal.
2. Checking the cluster
step 1. Cluster check
Node 1
[grid@xyjj-01 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@xyjj-01 ~]$
CRS-4535 means the Cluster Ready Services daemon (crsd) could not be reached.
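The same check can be scripted so that only the unhealthy components are reported. This is a minimal sketch that assumes the `crsctl check crs` output format shown above; the function name `crs_offline_components` is made up for illustration.

```shell
# Print every CRS-nnnn line of `crsctl check crs` output that does not
# report its component as online. The output is read from stdin, so the
# function can also be run against saved text.
# (Hypothetical helper name; not part of Oracle Clusterware.)
crs_offline_components() {
    grep '^CRS-' | grep -v 'is online$'
}

# Usage on a cluster node, as the grid user:
#   crsctl check crs | crs_offline_components
```

For the session above, this would leave only the CRS-4535 line.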
step 2. Database status check
[grid@xyjj-01 ~]$ srvctl config database
PRCR-1119 : Failed to look up CRS resources of database type
PRCR-1115 : Failed to find entities of type resource that match filters (TYPE == ora.database.type) and contain attributes DB_UNIQUE_NAME,ORACLE_HOME,VERSION
Cannot communicate with crsd
[grid@xyjj-01 ~]$
[grid@xyjj-01 ~]$ srvctl config listener
PRCR-1035 : Failed to look up CRS resource for ora.listener.type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
[grid@xyjj-01 ~]$
Node 1 of the database cluster could not be brought up, and restarting it did not help.
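3. Root cause analysis
Based on the checks above, CRS on node 1 could not start because the gipc process on node 2 was in an abnormal state, which prevented the two nodes from establishing a connection. gipc is a core component of RAC: starting with 11gR2 (11.2.0.2), Oracle made the clusterware itself responsible for managing the private network interfaces and introduced Grid IPC (gipc) for this purpose. It runs in the cluster as the daemon gipcd.bin, and its main functions are: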
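- At cluster startup, discover the private network interfaces and check them; the private network information is read from the gpnp profile.
- Use the discovered private interfaces to find the other nodes in the cluster and establish connections with their private interfaces.
- If the cluster is configured with several private interfaces and one or more of them fails on a node, take the faulty interfaces offline and notify the other nodes.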
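With the role of gipcd.bin established, the reason CRS on node 1 could not start is clear: the private interconnect is set up through this daemon, but the gipc process on node 2 was in an abnormal state, so node 1 could not join the cluster no matter how many times it was restarted.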
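4. Solution
The analysis points to the abnormal gipc process on node 2 as the reason CRS on node 1 could not start. Although gipc handles the private interconnect, restarting the daemon by itself does not bring the cluster down. One option is therefore to kill the gipcd.bin process manually with "kill -9" on its PID; gipcd is then started again automatically, after which node 1 starts successfully. The other option is to restart the cluster services so that the gipcd process is restarted.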
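Because `kill` takes a process ID rather than a process name, the gipcd.bin PID has to be looked up first. The following is a sketch only: the helper name `gipcd_pids` is hypothetical, and it assumes the daemon appears in `ps -eo pid,args` with a command path ending in gipcd.bin.

```shell
# Print the PID of each gipcd.bin process listed in `ps -eo pid,args`
# output read from stdin. The second column is the command path, so the
# awk process running this filter does not match itself.
# (Hypothetical helper name.)
gipcd_pids() {
    awk '$2 ~ "(^|/)gipcd[.]bin$" { print $1 }'
}

# Usage on the node whose gipcd is abnormal (node 2 here), as root:
#   pid=$(ps -eo pid,args | gipcd_pids)
#   [ -n "$pid" ] && kill -9 $pid
#   sleep 5
#   ps -eo pid,args | gipcd_pids   # a different PID means gipcd came back
```

Listing the PID before and after the kill is a simple way to confirm that the daemon was actually respawned.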
Restart the CRS stack:
crsctl stop crs -f
crsctl start crs
Check the log messages written during startup:
2025-02-08 09:48:39.774:
[client(129321)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/xyjj-01/client/crsctl_oracle.log.
2025-02-08 09:48:53.559:
[/u01/app/11.2.0/grid/bin/oraagent.bin(7036)]CRS-5818:Aborted command 'stop' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:0:34974} in /u01/app/11.2.0/grid/log/xyjj-01/agent/ohasd/oraagent_grid/oraagent_grid.log.
2025-02-08 09:48:55.560:
[ohasd(5970)]CRS-2757:Command 'Stop' timed out waiting for response from the resource 'ora.asm'. Details at (:CRSPE00111:) {0:0:34974} in /u01/app/11.2.0/grid/log/xyjj-01/ohasd/ohasd.log.
2025-02-08 09:49:01.391:
[cssd(7126)]CRS-1603:CSSD on node xyjj-01 shutdown by user.
2025-02-08 09:49:03.014:
[ohasd(5970)]CRS-2767:Resource state recovery not attempted for 'ora.cssdmonitor' as its target state is OFFLINE
2025-02-08 09:49:03.014:
[ohasd(5970)]CRS-2769:Unable to failover resource 'ora.cssdmonitor'.
2025-02-08 09:49:03.122:
[cssd(7126)]CRS-1660:The CSS daemon shutdown has completed
2025-02-08 09:49:05.824:
[gpnpd(7057)]CRS-2329:GPNPD on node xyjj-01 shutdown.
2025-02-08 09:49:07.829:
[/u01/app/11.2.0/grid/bin/oraagent.bin(7036)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:14} in /u01/app/11.2.0/grid/log/xyjj-01/agent/ohasd/oraagent_grid/oraagent_grid.log.
2025-02-08 09:49:13.459:
[/u01/app/11.2.0/grid/bin/cssdagent(129522)]CRS-5823:Could not initialize agent framework. Details at (:CRSAGF00120:) in /u01/app/11.2.0/grid/log/xyjj-01/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
Based on the startup log, GIPC error [29] msg [gipcretConnectionRefused] caused ASM initialization to fail:
[root@xyjj-01 ~]# cat /u01/app/11.2.0/grid/log/xyjj-01/client/crsctl_oracle.log
Oracle Database 11g Clusterware Release 11.2.0.4.0 - Production Copyright 1996, 2011 Oracle. All rights reserved.
2025-02-08 00:18:30.689: [ OCRMSG][3303528256]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2025-02-08 00:18:30.689: [ OCRMSG][3303528256]GIPC error [29] msg [gipcretConnectionRefused]
2025-02-08 00:18:30.689: [ OCRMSG][3303528256]prom_connect: error while waiting for connection complete [24]
2025-02-08 00:18:30.723: [ OCRASM][3303528256]proprasmo: kgfoCheckMount return [6]. Cannot proceed with dirty open.
2025-02-08 00:18:30.723: [ OCRASM][3303528256]proprasmo: Error in open/create file in dg [OCR]
[ OCRASM][3303528256]SLOS : SLOS: cat=6, opn=kgfo, dep=0, loc=kgfoCkMt03
2025-02-08 09:43:39.465: [ OCRASM][2584614720]ASM Error Stack :
2025-02-08 09:43:39.478: [ OCRASM][2584614720]proprasmo: kgfoCheckMount returned [6]
2025-02-08 09:43:39.478: [ OCRASM][2584614720]proprasmo: The ASM disk group OCR is not found or not mounted
2025-02-08 09:43:39.479: [ OCRRAW][2584614720]proprioo: Failed to open [+OCR]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2025-02-08 09:43:39.479: [ OCRRAW][2584614720]proprioo: No OCR/OLR devices are usable
2025-02-08 09:43:39.479: [ OCRASM][2584614720]proprasmcl: asmhandle is NULL
2025-02-08 09:43:39.479: [ OCRRAW][2584614720]proprinit: Could not open raw device
2025-02-08 09:43:39.479: [ OCRASM][2584614720]proprasmcl: asmhandle is NULL
2025-02-08 09:43:39.479: [ default][2584614720]a_init:7!: Backend init unsuccessful : [26]
2025-02-08 09:43:39.634: [ OCRMSG][1410344768]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2025-02-08 09:43:39.634: [ OCRMSG][1410344768]GIPC error [29] msg [gipcretConnectionRefused]
2025-02-08 09:43:39.634: [ OCRMSG][1410344768]prom_connect: error while waiting for connection complete [24]
2025-02-08 09:43:39.666: [ OCRASM][1410344768]proprasmo: kgfoCheckMount return [6]. Cannot proceed with dirty open.
2025-02-08 09:43:39.666: [ OCRASM][1410344768]proprasmo: Error in open/create file in dg [OCR]
[ OCRASM][1410344768]SLOS : SLOS: cat=6, opn=kgfo, dep=0, loc=kgfoCkMt03
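A log like the one above is easier to read once the GIPC errors are counted by code and message. This is a rough sketch that assumes the `GIPC error [N] msg [name]` line format seen in crsctl_oracle.log; the function name `gipc_error_summary` is invented for illustration.

```shell
# Count the GIPC errors in a clusterware client log read from stdin.
# Each output line is "<count> <error code> <message name>".
# (Hypothetical helper name.)
gipc_error_summary() {
    sed -n 's/.*GIPC error \[\([0-9]*\)] msg \[\([A-Za-z]*\)].*/\1 \2/p' |
        sort | uniq -c | awk '{ print $1, $2, $3 }'
}

# Usage, with the log path taken from the CRS-1013 message:
#   gipc_error_summary < /u01/app/11.2.0/grid/log/xyjj-01/client/crsctl_oracle.log
```

A single dominant entry such as error 29 (gipcretConnectionRefused) suggests one recurring connection failure rather than several unrelated problems.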
Check for the pmon process; its absence also shows that the ASM instance is not running:
[root@xyjj-01 bin]# ps -ef | grep pmon
root 7447 7184 0 10:48 pts/2 00:00:00 grep pmon
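The only match above is the grep command itself. A slightly stricter check avoids that false hit. This sketch assumes the usual ASM background process naming, `asm_pmon_+ASM<n>`; the function name `asm_pmon_running` is hypothetical.

```shell
# Succeed (exit status 0) if `ps -ef` output on stdin contains an ASM
# pmon process. The [a] bracket keeps this grep's own command line from
# matching when the function is fed directly from ps.
# (Hypothetical helper name.)
asm_pmon_running() {
    grep -q '[a]sm_pmon_+ASM'
}

# Usage:
#   if ps -ef | asm_pmon_running; then
#       echo "ASM instance is up"
#   else
#       echo "ASM instance is not running"
#   fi
```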
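From the analysis above, crsd most likely failed to start because the ASM instance was not running. After a lengthy startup, the cluster services came up and ran normally.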
[root@xyjj-01 ~]# /u01/app/11.2.0/grid/bin/crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE xyjj-01
ora.FRA.dg
ONLINE ONLINE xyjj-01
ora.LISTENER.lsnr
ONLINE ONLINE xyjj-01
ora.OCR.dg
ONLINE ONLINE xyjj-01
ora.asm
ONLINE ONLINE xyjj-01 Started
ora.gsd
OFFLINE OFFLINE xyjj-01
ora.net1.network
ONLINE ONLINE xyjj-01
ora.ons
ONLINE ONLINE xyjj-01
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE xyjj-01
ora.cvu
1 ONLINE ONLINE xyjj-01
ora.oc4j
1 ONLINE ONLINE xyjj-01
ora.scan1.vip
1 ONLINE ONLINE xyjj-01
ora.xyjj-01.vip
1 ONLINE ONLINE xyjj-01
ora.xyjj-02.vip
1 ONLINE OFFLINE
ora.xyjj.db
1 ONLINE ONLINE xyjj-01 Open
2 ONLINE OFFLINE
[root@xyjj-01 ~]#
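In the listing above, the node 2 VIP and database instance 2 still have TARGET ONLINE but STATE OFFLINE. Such mismatches can be picked out automatically. This is a sketch under the assumption that the `crsctl stat res -t` layout matches the one shown here, with each resource name on its own line starting with `ora.`; the function name `res_target_mismatch` is made up.

```shell
# Read `crsctl stat res -t` output on stdin and print the name of every
# resource that has an instance with TARGET ONLINE but STATE OFFLINE.
# A resource is printed once per mismatched instance line.
# (Hypothetical helper name.)
res_target_mismatch() {
    awk '
        /^ora\./ { name = $1; next }      # remember the current resource
        {
            for (i = 1; i < NF; i++)
                if ($i == "ONLINE" && $(i + 1) == "OFFLINE") {
                    print name
                    break
                }
        }
    '
}

# Usage, as root or grid:
#   /u01/app/11.2.0/grid/bin/crsctl stat res -t | res_target_mismatch
```

Resources whose target is OFFLINE, such as ora.gsd, are deliberately ignored because they are not expected to be running.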
References:
OHASD Failed to Start: Inappropriate ioctl for device (Doc ID 1069182.1)
Troubleshoot Grid Infrastructure Startup Issues (Doc ID 1050908.1)




