CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'

Original post by 许玉冲, 2022-11-18
Clusterware fails to start: CRSD & HAIP Resources Remain In OFFLINE as Private Network Interface is Partially Up (Doc ID 1529721.1)

In this Document
Symptoms
Changes
Cause
Solution

APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
All Platforms
SYMPTOMS
After starting the clusterware, CRSD & HAIP resources remain in OFFLINE status.

The output of 'crsctl stat res -t -init' shows that the ora.cluster_interconnect.haip and ora.crsd resources are OFFLINE:

# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
     1        ONLINE  ONLINE       racnode1                  Started                     
ora.cluster_interconnect.haip
     1        ONLINE  OFFLINE                                                      >>>>  OFFLINE
ora.crf
     1        ONLINE  ONLINE       racnode1
ora.crsd
     1        ONLINE  OFFLINE                                                      >>>>  OFFLINE
ora.cssd
     1        ONLINE  ONLINE       racnode1
ora.cssdmonitor
     1        ONLINE  ONLINE       racnode1
ora.ctssd
     1        ONLINE  ONLINE       racnode1                  OBSERVER
ora.diskmon
     1        OFFLINE OFFLINE
ora.drivers.acfs
     1        ONLINE  ONLINE       racnode1
ora.evmd
     1        ONLINE  INTERMEDIATE racnode1
ora.gipcd
     1        ONLINE  ONLINE       racnode1
ora.gpnpd
     1        ONLINE  ONLINE       racnode1
ora.mdnsd
     1        ONLINE  ONLINE       racnode1
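
When the resource list is long, the unhealthy entries can be filtered mechanically. The following is a minimal sketch and not part of the Oracle note: the function name list_unhealthy_resources is hypothetical, and it assumes the layout shown above, where each resource name line is followed by an instance line with TARGET and STATE in the second and third columns.

```shell
# Hypothetical helper: reads `crsctl stat res -t -init` output on stdin and
# prints "<resource> <state>" for every resource whose TARGET is ONLINE
# but whose STATE is anything other than ONLINE.
list_unhealthy_resources() {
    awk '
        # Resource name line, e.g. "ora.crsd": remember the name.
        $1 ~ /^ora\./ { name = $1; next }
        # Instance line, e.g. "1  ONLINE  OFFLINE": compare TARGET and STATE.
        $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" {
            print name, $3
        }
    '
}
```

Run as `crsctl stat res -t -init | list_unhealthy_resources`. For the listing above it would report ora.cluster_interconnect.haip and ora.crsd as OFFLINE and ora.evmd as INTERMEDIATE, while ora.diskmon is skipped because its TARGET is OFFLINE.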
 

The following messages appear in the clusterware alert log (<GI home>/log/<nodename>/alert<nodename>.log):

2013-01-28 15:48:39.632
[/u01/app/11.2.0.3/grid/bin/orarootagent.bin(5125)]CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.3/grid/log/racnode1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2013-01-28 15:48:39.658
[ohasd(4954)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.cluster_interconnect.haip'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0.3/grid/log/racnode1/ohasd/ohasd.log.
2013-01-28 15:48:57.517
[crsd(9049)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-44: Error in network address and interface operations Network address and interface operations error [7]]. Details at (:CRSD00111:) in /u01/app/11.2.0.3/grid/log/racnode1/crsd/crsd.log.
2013-01-28 15:48:58.281
[ohasd(4954)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode1'.
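
To pull just these messages out of a busy alert log, a filter along the following lines may help. This is a sketch under assumptions: the function name haip_failure_messages is hypothetical, and the list of codes is simply the four quoted in this note, not an exhaustive set.

```shell
# Hypothetical helper: reads a clusterware alert log on stdin and prints only
# the lines carrying the message codes quoted in this note
# (CRS-5818, CRS-2757, CRS-0804, CRS-2765).
haip_failure_messages() {
    # grep exits non-zero when nothing matches; treat that as "no findings".
    grep -E 'CRS-(5818|2757|0804|2765):' || true
}
```

Usage would be along the lines of `haip_failure_messages < alertracnode1.log`, run from the directory that holds the alert log. Because the timestamp sits on the line before each message in this log format, the filtered output does not include it.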
 

CHANGES
 None.

CAUSE
The clusterware failed to start the HAIP resource because the network interface for the cluster interconnect was not active.

This can be seen in the 'ifconfig' output for the cluster interconnect interface, where both the 'UP' and the 'RUNNING' flags are missing:

eth2      Link encap:Ethernet  HWaddr 00:21:5A:9B:02:90
        inet addr:10.20.xxx.yyy  Bcast:10.20.xxx.255  Mask:255.255.255.0
        BROADCAST MULTICAST  MTU:9000  Metric:1
        RX packets:1 errors:18 dropped:0 overruns:0 frame:0    
        TX packets:253 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:80 (80.0 b)  TX bytes:33638 (32.8 KiB)
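
The same check can be scripted. The sketch below is an illustration only: missing_iface_flags is a hypothetical name, and it assumes the older net-tools 'ifconfig' layout shown above, in which the flags share a line with 'MTU:' and are separated by spaces. Newer ifconfig versions print flags in a different format and would need a different pattern.

```shell
# Hypothetical helper: reads `ifconfig <iface>` output on stdin.
# Prints "ok" if both UP and RUNNING are present on the flags line,
# otherwise prints "missing: <flags>" and returns 1.
missing_iface_flags() {
    # The flags line is the one that also carries the MTU value.
    flags_line=$(grep 'MTU:' || true)
    missing=""
    for flag in UP RUNNING; do
        case " $flags_line " in
            *" $flag "*) ;;                      # flag present
            *) missing="$missing $flag" ;;       # flag absent
        esac
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

With the output above, `ifconfig eth2 | missing_iface_flags` would print `missing: UP RUNNING`.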
 

The corresponding gipcd log file (<GI home>/log/<nodename>/gipcd/gipcd.log) reports "Returning NETDATA: 0 interfaces":

2013-01-30 15:38:04.998: [ CLSINET][1101314368] Returning NETDATA: 0 interfaces              ===>  that is a problem
2013-01-30 15:38:04.999: [GIPCDMON][1101314368] gipcdMonitorCssCheck: found node racnode1
2013-01-30 15:38:10.000: [ CLSINET][1101314368] Returning NETDATA: 0 interfaces
2013-01-30 15:38:10.001: [GIPCDMON][1101314368] gipcdMonitorCssCheck: found node racnode1
... repeat ...
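
A quick way to see how often gipcd reports this condition is to count the matching lines. This is a minimal sketch; count_zero_interface_reports is a hypothetical name, and the search string is taken verbatim from the log excerpt above.

```shell
# Hypothetical helper: reads gipcd.log on stdin and prints how many times
# gipcd reported that it found no usable interfaces.
count_zero_interface_reports() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -c 'Returning NETDATA: 0 interfaces' || true
}
```

For example, `count_zero_interface_reports < gipcd.log` run in the gipcd log directory. A count that keeps growing between runs suggests the interconnect interface is still not visible to gipcd.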

  

SOLUTION
Make sure the interface has an IP address and is activated, e.g.:

# ifconfig eth2 up
The interface should then show the UP and RUNNING flags:

# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 08:00:27:7A:B8:D3  
          inet addr:10.20.xxx.yyy  Bcast:10.20.xxx.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:1952 errors:0 dropped:0 overruns:0 frame:0
          TX packets:101 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:131225 (128.1 KiB)  TX bytes:17166 (16.7 KiB)
 

If the interface does not show an IP address, you may need to shut it down completely and restart it. On Linux this is done with 'ifdown' and 'ifup'; if necessary, consult the network or system administrator:

# ifdown eth2
# ifup eth2
If the interface is missing the RUNNING flag, check that a cable is actually connected to the interface and that the network switch is powered on and functional.

On Linux this can be verified with 'ethtool': the 'Link detected:' line needs to say 'yes'. The following output is from an interface with no link ('Link detected: no'):

# ethtool eth2
Settings for eth2:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: no
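
The link state can also be extracted for use in a script. The sketch below is illustrative: link_detected is a hypothetical name, and it relies only on the 'Link detected:' line format shown in the ethtool output above.

```shell
# Hypothetical helper: reads `ethtool <iface>` output on stdin and prints the
# value of the "Link detected:" line ("yes" or "no").
link_detected() {
    # Split each line on the colon plus any following spaces,
    # then print the value part of the matching line.
    awk -F': *' '/Link detected:/ { print $2 }'
}
```

For the output above, `ethtool eth2 | link_detected` would print `no`, which points at cabling or the switch rather than at the clusterware.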
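Error message:
CRS-5818:Aborted command 'start' for resource 'ora.cluster_interconnect.haip'
Cause of failure:
A switch failure took the heartbeat (interconnect) NIC down.
Reference:
Oracle support note cited above (Doc ID 1529721.1)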