Clusterware startup failure: CRSD & HAIP Resources Remain In OFFLINE as Private Network Interface is Partially Up (Doc ID 1529721.1)
In this Document
Symptoms
Changes
Cause
Solution
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
All Platforms
SYMPTOMS
After starting the clusterware, CRSD & HAIP resources remain in OFFLINE status.
The output of crsctl stat res -t -init shows the haip and crsd resources in OFFLINE status:
# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE racnode1 Started
ora.cluster_interconnect.haip >>>> OFFLINE
1 ONLINE OFFLINE
ora.crf
1 ONLINE ONLINE racnode1
ora.crsd
1 ONLINE OFFLINE >>>> OFFLINE
ora.cssd
1 ONLINE ONLINE racnode1
ora.cssdmonitor
1 ONLINE ONLINE racnode1
ora.ctssd
1 ONLINE ONLINE racnode1 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.drivers.acfs
1 ONLINE ONLINE racnode1
ora.evmd
1 ONLINE INTERMEDIATE racnode1
ora.gipcd
1 ONLINE ONLINE racnode1
ora.gpnpd
1 ONLINE ONLINE racnode1
ora.mdnsd
1 ONLINE ONLINE racnode1
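On a long listing, the resources whose TARGET is ONLINE but whose STATE is OFFLINE can be picked out with a small filter. The following is a minimal sketch and not part of the original note: the function name offline_resources is hypothetical, and it assumes the two-line layout shown above (resource name on one line, its status on the next).

```shell
# Hypothetical helper: reads `crsctl stat res -t -init` output on stdin and
# prints each resource whose TARGET is ONLINE but whose STATE is OFFLINE.
offline_resources() {
  awk '
    # A line whose first field starts with "ora." names a resource.
    $1 ~ /^ora\./ { name = $1; next }
    # A status line: field 2 is TARGET, field 3 is STATE.
    name != "" && $2 == "ONLINE" && $3 == "OFFLINE" { print name }
  '
}

# Usage (as root, with the GI home bin directory in PATH):
#   crsctl stat res -t -init | offline_resources
```

Resources such as ora.diskmon, whose TARGET is already OFFLINE, are deliberately not reported.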
The following messages are noticed in the clusterware alert log (<GI home>/log/<nodename>/alert<nodename>.log):
2013-01-28 15:48:39.632
[/u01/app/11.2.0.3/grid/bin/orarootagent.bin(5125)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.3/grid/log/racnode1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2013-01-28 15:48:39.658
[ohasd(4954)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0.3/grid/log/racnode1/ohasd/ohasd.log.
2013-01-28 15:48:57.517
[crsd(9049)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/11.2.0.3/grid/log/racnode1/crsd/crsd.log.
2013-01-28 15:48:58.281
[ohasd(4954)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode1'.
CHANGES
None.
CAUSE
The clusterware failed to start the HAIP resource because the network interface for the cluster interconnect was not active.
This can be seen from the output of 'ifconfig' for the cluster interconnect interface, which shows that the interface is missing the 'UP' and the 'RUNNING' flags:
eth2 Link encap:Ethernet HWaddr 00:21:5A:9B:02:90
inet addr:10.20.xxx.yyy Bcast:10.20.xxx.255 Mask:255.255.255.0
BROADCAST MULTICAST MTU:9000 Metric:1
RX packets:1 errors:18 dropped:0 overruns:0 frame:0
TX packets:253 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:80 (80.0 b) TX bytes:33638 (32.8 KiB)
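The flag check can also be scripted. This is a minimal sketch under stated assumptions: the function name missing_flags is hypothetical, and it only understands the legacy net-tools 'ifconfig' layout shown above, where the flags share a line with 'MTU:'. Newer ifconfig versions print the flags differently and would need a different pattern.

```shell
# Hypothetical helper: reads legacy-format `ifconfig <iface>` output on stdin
# and prints which of the UP / RUNNING flags are missing (empty line if none).
missing_flags() {
  awk '
    # The flags line is the one that also carries the MTU value.
    /MTU:/ { for (i = 1; i <= NF; i++) seen[$i] = 1 }
    END {
      out = ""
      if (!("UP" in seen))      out = out "UP "
      if (!("RUNNING" in seen)) out = out "RUNNING "
      sub(/ $/, "", out)
      print out
    }
  '
}

# Usage:
#   ifconfig eth2 | missing_flags
```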
The corresponding gipcd log file (<GI home>/log/<nodename>/gipcd/gipcd.log) reports "Returning NETDATA: 0 interfaces":
2013-01-30 15:38:04.998: [ CLSINET][1101314368] Returning NETDATA: 0 interfaces ===> that is a problem
2013-01-30 15:38:04.999: [GIPCDMON][1101314368] gipcdMonitorCssCheck: found node racnode1
2013-01-30 15:38:10.000: [ CLSINET][1101314368] Returning NETDATA: 0 interfaces
2013-01-30 15:38:10.001: [GIPCDMON][1101314368] gipcdMonitorCssCheck: found node racnode1
... repeat ...
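To see how often gipcd reported no usable interfaces, the message can be counted. This is a minimal sketch: the function name count_empty_netdata and the variables GI_HOME and NODE in the usage comment are hypothetical placeholders for <GI home> and <nodename>.

```shell
# Hypothetical helper: counts "Returning NETDATA: 0 interfaces" lines
# in gipcd.log content read from stdin.
count_empty_netdata() {
  # grep -c prints the count; "|| true" keeps the exit status 0 when it is 0.
  grep -c 'Returning NETDATA: 0 interfaces' || true
}

# Usage (GI_HOME and NODE are placeholders, set them for your system):
#   count_empty_netdata < "$GI_HOME/log/$NODE/gipcd/gipcd.log"
```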
SOLUTION
Make sure the interface has an IP address and is activated, e.g. use:
# ifconfig eth2 up
After which the interface should show:
# ifconfig eth2
eth2 Link encap:Ethernet HWaddr 08:00:27:7A:B8:D3
inet addr:10.20.xxx.yyy Bcast:10.20.xxx.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:1952 errors:0 dropped:0 overruns:0 frame:0
TX packets:101 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:131225 (128.1 KiB) TX bytes:17166 (16.7 KiB)
If the interface doesn't show an IP address, you may need to shut the interface down completely and restart it. On Linux this is done with 'ifdown' and 'ifup'; if necessary, consult the network or system administrator:
# ifdown eth2
# ifup eth2
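Whether the interface carries an IPv4 address can be tested before and after the restart. This is a minimal sketch: the function name has_ipv4_addr is hypothetical, and it assumes the legacy 'inet addr:' notation that the ifconfig output in this note uses.

```shell
# Hypothetical helper: succeeds (exit status 0) if the legacy-format
# `ifconfig <iface>` output on stdin contains an "inet addr:" entry.
has_ipv4_addr() {
  grep -Eq 'inet addr:[0-9]'
}

# Usage:
#   ifconfig eth2 | has_ipv4_addr || echo "eth2 has no IPv4 address"
```

The helper only reports the condition; restarting the interface is left to the administrator.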
If the interface is missing the RUNNING flag, check that a cable is actually connected to the interface and that the network switch is powered on and functional.
On Linux this can be verified with the 'ethtool' command; the 'Link detected:' line must say 'yes':
# ethtool eth2
Settings for eth2:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: Unknown!
Duplex: Unknown! (255)
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: umbg
Wake-on: d
Current message level: 0x00000007 (7)
Link detected: no

Error message:
CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'
Cause of failure:
A switch failure brought the heartbeat (interconnect) network interface down.
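Because a switch failure shows up as a lost link, the 'Link detected:' line of the ethtool output can be checked from a script. This is a minimal sketch: the function name link_detected is hypothetical, and it relies only on the 'Link detected:' line quoted in this note.

```shell
# Hypothetical helper: reads `ethtool <iface>` output on stdin and prints the
# value of the "Link detected:" line, or "unknown" if the line is absent.
link_detected() {
  awk -F': *' '
    /Link detected:/ { print $2; found = 1 }
    END { if (!found) print "unknown" }
  '
}

# Usage (as root):
#   [ "$(ethtool eth2 | link_detected)" = "yes" ] || echo "eth2: no link, check cable and switch"
```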