
Oracle 19c RAC - CRS-8503 [bug]

DBA管家 2022-02-08


Oracle offers the 19c database and Grid Infrastructure media as free downloads, but the base release is 19.3, which carries quite a few bugs. One that RAC users hit frequently is CRS-8503, fixed in RU 19.11. The symptom: in a two-node RAC, only one of the nodes can be up at a time; the other node's stack shuts down, and the CRS logs show a CRS-8503 error. So when building a 19c RAC, make sure to install the latest RU to avoid this class of problem, for example by applying it at install time as sketched below.
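
For a fresh Grid Infrastructure installation, the RU can be applied in the same pass as the software install via gridSetup.sh -applyRU. A minimal sketch, assuming the RU zip has already been downloaded (the patch ID and staging path are placeholders, not from this environment):

# As the grid user: stage the RU, then let the installer patch the home before configuring
unzip -q p<RU_patch_id>_190000_Linux-x86-64.zip -d /u01/patches
/u01/app/19.3.0/grid/gridSetup.sh -applyRU /u01/patches/<RU_patch_id>

# Afterwards, verify the patch level of the grid home
/u01/app/19.3.0/grid/OPatch/opatch lspatches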




Running into bug CRS-8503

Symptoms

I was recently building a 19c RAC, using the official 19.3 media throughout. After finishing the GRID installation per <Oracle19c搭建准备工作> (Oracle 19c setup preparation), the odd thing was that node 2 would shut itself down, and CRS could not be restarted on it.


# CRS resource status on the two nodes at this point
[root@node01 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE node01 Started,STABLE
ora.cluster_interconnect.haip
1 ONLINE ONLINE node01 STABLE
ora.crf
1 ONLINE ONLINE node01 STABLE
ora.crsd
1 ONLINE ONLINE node01 STABLE
ora.cssd
1 ONLINE ONLINE node01 STABLE
ora.cssdmonitor
1 ONLINE ONLINE node01 STABLE
ora.ctssd
1 ONLINE ONLINE node01 OBSERVER,STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.driver.afd
1 ONLINE ONLINE node01 STABLE
ora.drivers.acfs
1 ONLINE ONLINE node01 STABLE
ora.evmd
1 ONLINE ONLINE node01 STABLE
ora.gipcd
1 ONLINE ONLINE node01 STABLE
ora.gpnpd
1 ONLINE ONLINE node01 STABLE
ora.mdnsd
1 ONLINE ONLINE node01 STABLE
ora.storage
1 ONLINE ONLINE node01 STABLE
--------------------------------------------------------------------------------


[root@node02 ~]# crsctl status res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE STABLE
ora.cluster_interconnect.haip
1 ONLINE OFFLINE STABLE
ora.crf
1 ONLINE ONLINE node02 STABLE
ora.crsd
1 ONLINE OFFLINE STABLE
ora.cssd
1 ONLINE OFFLINE STABLE
ora.cssdmonitor
1 ONLINE ONLINE node02 STABLE
ora.ctssd
1 ONLINE OFFLINE STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.driver.afd
1 ONLINE ONLINE node02 STABLE
ora.drivers.acfs
1 ONLINE ONLINE node02 STABLE
ora.evmd
1 ONLINE INTERMEDIATE node02 STABLE
ora.gipcd
1 ONLINE ONLINE node02 STABLE
ora.gpnpd
1 ONLINE ONLINE node02 STABLE
ora.mdnsd
1 ONLINE ONLINE node02 STABLE
ora.storage
1 ONLINE OFFLINE STABLE
--------------------------------------------------------------------------------

[root@node01 ~]# crsctl check cluster -all
**************************************************************
node01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[root@node02 ~]# crsctl check cluster -all
**************************************************************
node02:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
**************************************************************



Then examine the logs via TFA.

Since 12c, TFA is installed by default along with Grid Infrastructure.


[root@node01 ~]# tfactl analyze -since 1d
WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.
INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes... Please wait...
INFO: analyzing host: node01

Report title: Analysis of Alert,System Logs
Report date range: last ~1 day(s)
Report (default) time zone: CST - China Standard Time
Analysis started at: 23-Jan-2022 07:21:16 PM CST
Elapsed analysis time: 7 second(s).
Configuration file: /u01/app/19.3.0/grid/tfa/node01/tfa_home/ext/tnt/conf/tnt.prop
Configuration group: all
Total message count: 34,155, from 23-Jan-2021 07:21:19 PM CST to 23-Jan-2022 07:21:14 PM CST
Messages matching last ~1 day(s): 33,756, from 22-Jan-2022 07:36:53 PM CST to 23-Jan-2022 07:21:14 PM CST
last ~1 day(s) error count: 5, from 22-Jan-2022 07:46:30 PM CST to 23-Jan-2022 06:29:17 PM CST
last ~1 day(s) ignored error count: 0
last ~1 day(s) unique error count: 5

Message types for last ~1 day(s)
Occurrences percent server name type
----------- ------- -------------------- -----
33,733 99.9% node01 generic
18 0.1% node01 WARNING
5 0.0% node01 ERROR
----------- -------
33,756 100.0%

Unique error messages for last ~1 day(s)
Occurrences percent server name error
----------- ------- -------------------- -----
1 20.0% node01 [OCSSD(3342)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node01 node02 .
1 20.0% node01 [OCSSD(28200)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/node01/crs/trace/ocssd.trc
1 20.0% node01 [OCSSD(20206)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node01 .
1 20.0% node01 [OCSSD(887)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/node01/crs/trace/ocssd.trc
1 20.0% node01 [OCSSD(24769)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node01 .
----------- -------
5 100.0%


INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes... Please wait...
INFO: analyzing host: node02

Report title: Analysis of Alert,System Logs
Report date range: last ~1 day(s)
Report (default) time zone: CST - China Standard Time
Analysis started at: 23-Jan-2022 07:21:26 PM CST
Elapsed analysis time: 3 second(s).
Configuration file: /u01/app/19.3.0/grid/tfa/node02/tfa_home/ext/tnt/conf/tnt.prop
Configuration group: all
Total message count: 21,591, from 10-Jan-2022 08:40:45 PM CST to 23-Jan-2022 07:20:01 PM CST
Messages matching last ~1 day(s): 4,713, from 22-Jan-2022 07:36:52 PM CST to 23-Jan-2022 07:20:01 PM CST
last ~1 day(s) error count: 5, from 22-Jan-2022 07:53:09 PM CST to 23-Jan-2022 06:41:51 PM CST
last ~1 day(s) ignored error count: 0
last ~1 day(s) unique error count: 4

Message types for last ~1 day(s)
Occurrences percent server name type
----------- ------- -------------------- -----
4,659 98.9% node02 generic
49 1.0% node02 WARNING
5 0.1% node02 ERROR
----------- -------
4,713 100.0%

Unique error messages for last ~1 day(s)
Occurrences percent server name error
----------- ------- -------------------- -----
2 40.0% node02 [OCSSD(3282)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node02 .
1 20.0% node02 [OCSSD(3282)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node01 node02 .
1 20.0% node02 [OCSSD(2005)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node02 .
1 20.0% node02 [OCSSD(20807)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/node02/crs/trace/ocssd.trc
----------- -------
5 100.0%

# Based on the TFA findings, inspect the corresponding ocssd trace on each node
[root@node01 bin]# tail -50f /u01/app/grid/diag/crs/node01/crs/trace/ocssd.trc

2022-01-23 19:24:23.842 : CSSD:3113711360: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:23.843 : CSSD:3126068992: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 16:2:1, datatype 1 datasize 4
2022-01-23 19:24:23.843 : CSSD:3110557440: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 16:2:1 from clientID 1:41:2
2022-01-23 19:24:24.342 : CSSD:2670159616: [ INFO] clssnmSendingThread: sending status msg to all nodes
2022-01-23 19:24:24.342 : CSSD:2670159616: [ INFO] clssnmSendingThread: sent 5 status msgs to all nodes
2022-01-23 19:24:24.918 : CSSD:2667005696: [ INFO] clssscSelect: gipcwait returned with status gipcretTimeout (16)
2022-01-23 19:24:25.940 : CSSD:3113711360: [ INFO] : Processing member data change type 1, size 4 for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:25.940 : CSSD:3113711360: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:25.940 : CSSD:3126068992: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 16:2:1, datatype 1 datasize 4
2022-01-23 19:24:25.941 : CSSD:3110557440: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 16:2:1 from clientID 1:41:2
2022-01-23 19:24:27.991 : CSSD:3113711360: [ INFO] : Processing member data change type 1, size 4 for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:27.991 : CSSD:3113711360: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:27.991 : CSSD:3126068992: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 16:2:1, datatype 1 datasize 4
2022-01-23 19:24:27.991 : CSSD:3110557440: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 16:2:1 from clientID 1:41:2
2022-01-23 19:24:29.346 : CSSD:2670159616: [ INFO] clssnmSendingThread: sending status msg to all nodes
2022-01-23 19:24:29.346 : CSSD:2670159616: [ INFO] clssnmSendingThread: sent 5 status msgs to all nodes
2022-01-23 19:24:29.919 : CSSD:2667005696: [ INFO] clssscSelect: gipcwait returned with status gipcretTimeout (16)
2022-01-23 19:24:30.044 : CSSD:3113711360: [ INFO] : Processing member data change type 1, size 4 for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:30.044 : CSSD:3113711360: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:30.044 : CSSD:3126068992: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 16:2:1, datatype 1 datasize 4
2022-01-23 19:24:30.045 : CSSD:3110557440: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 16:2:1 from clientID 1:41:2
2022-01-23 19:24:32.148 : CSSD:3113711360: [ INFO] : Processing member data change type 1, size 4 for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:32.148 : CSSD:3113711360: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:32.148 : CSSD:3126068992: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 16:2:1, datatype 1 datasize 4
2022-01-23 19:24:32.148 : CSSD:3110557440: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 16:2:1 from clientID 1:41:2
^C
<Node 1's ocssd log keeps repeating the output above...>

# Node 2's ocssd log; it terminates at the fatal error below
tail -50f /u01/app/grid/diag/crs/node02/crs/trace/ocssd.trc
2022-01-23 18:41:54.121 : CSSD:4244395776: [ INFO] clssscthrdmain: Terminating thread GM Peer Lsnr
2022-01-23 18:41:54.148 : CSSD:4242818816: [ INFO] clssgmpSendMsgToGMC: Cannot send it to a hub node
2022-01-23 18:41:54.148 : CSSD:4242818816: [ INFO] clssgmpcGMPShutdownComplete: GMP has finished its shutdown
2022-01-23 18:41:54.149 : CSSD:3716949760: [ INFO] clssscSelect: gipcwait returned with status gipcretTimeout (16)
2022-01-23 18:41:54.255 : CSSD:3734296320: [ INFO] clssnmvDHBValidateNCopy: node 1, node01, has a disk HB, but no network HB, DHB has rcfg 538428557, wrtcnt, 47697, LATS 21991244, lastSeqNo 47694, uniqueness 1642933741, timestamp 1642934511/21965144
2022-01-23 18:41:54.728 : CSSD:3720103680: [ WARNING] clssnmSendingThread: state(1) clusterState(0) exit
2022-01-23 18:41:54.728 : CSSD:3720103680: [ INFO] clssscthrdmain: Terminating thread clssnmSendingThread
2022-01-23 18:41:55.258 : CSSD:3734296320: [ INFO] clssnmvDHBValidateNCopy: node 1, node01, has a disk HB, but no network HB, DHB has rcfg 538428557, wrtcnt, 47700, LATS 21992254, lastSeqNo 47697, uniqueness 1642933741, timestamp 1642934512/21966144
2022-01-23 18:41:55.263 : CSSD:3729565440: [ INFO] clssnmvDHBValidateNCopy: node 1, node01, has a disk HB, but no network HB, DHB has rcfg 538428557, wrtcnt, 47701, LATS 21992254, lastSeqNo 47653, uniqueness 1642933741, timestamp 1642934512/21966154
2022-01-23 18:41:55.556 :GIPCXCPT:4252276480: gipcWaitF [EvmConWait : evmgipcio.c : 303]: EXCEPTION[ ret (uknown) (910) ] failed to wait on obj 0x7fadd82ba580 [0000000000005d7c] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=7ce66a8b-4c5a7358-20807))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=4c5a7358-7ce66a8b-20598))', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 20598, readyRef (nil), ready 0, wobj 0x7fadd82ba530, sendp 0x7fadd82bce40 status 0flags 0xa100a716, flags-2 0x100, usrFlags 0x30020 }, reqList 0x7fadfd73a138, nreq 1, creq 0x7fadfd73a130 timeout 30000 ms, flags 0x240
2022-01-23 18:41:55.557 : CSSDGNS:4252276480: clssgnsGNSEvtHandler: clsce evt res CONN (3)
2022-01-23 18:41:55.557 : CSSDGNS:4252276480: clssgnsCheckGNSConfigured: CLSCE wait(30000) returned 0, clskerror: clsce: CRS-10203: (:CLSCE0063:) Could not connect to the Event Manager daemon, evtres 3
2022-01-23 18:41:55.557 : CSSDGNS:4252276480: clssgnsCheckGNSConfigured: CLSCE connection error, re-subscribing for GNS resource events.
2022-01-23 18:41:55.558 : CLSCEVT:4252276480: (:CLSCE0028:)clsce_unsubscribe 0x7fadd8139870 successfully unsubscribed : 0
2022-01-23 18:41:56.261 : CSSD:3734296320: [ INFO] clssnmvDHBValidateNCopy: node 1, node01, has a disk HB, but no network HB, DHB has rcfg 538428557, wrtcnt, 47703, LATS 21993254, lastSeqNo 47700, uniqueness 1642933741, timestamp 1642934513/21967164
Trace file /u01/app/grid/diag/crs/node02/crs/trace/ocssd.trc
Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
Version 19.3.0.0.0 Copyright 1996, 2019 Oracle. All rights reserved.
DDE: Flood control is not active
CLSB:4268476160: [ INFO] Oracle Clusterware infrastructure error in OCSSD (OS PID 20807): Fatal signal 6 has occurred in program ocssd thread 4268476160; nested signal count is 1
2022-01-23T18:41:56.960547+08:00
Incident 1 created, dump file: /u01/app/grid/diag/crs/node02/crs/incident/incdir_1/ocssd_i1.trc
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []

# On node02, inspect the incident trace
tail -50f /u01/app/grid/diag/crs/node02/crs/incident/incdir_1/ocssd_i1.trc
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: Checking configured voting files
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: number of voting files 3
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: Checkin for AFD:CRS1 cur_ms 21994124 lastDPMTCheck 21990564 lastWrite 21990564 killcheck 21994124+
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: disk still good (3560/AFD:CRS1)
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: Checkin for AFD:CRS2 cur_ms 21994124 lastDPMTCheck 21990564 lastWrite 21990564 killcheck 21994124+
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: disk still good (3560/AFD:CRS2)
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: Checkin for AFD:CRS3 cur_ms 21994124 lastDPMTCheck 21990564 lastWrite 21990564 killcheck 21994124+
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: disk still good (3560/AFD:CRS3)
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmCountVfInSite: vf number = 0
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmCountVfInSite: vf number = 1
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmCountVfInSite: vf number = 2
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmCountVfInSite: In site = 00112233-44556677-8899aabb-ccddeeff, vf count = 3,
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmSiteMajority: number of sites currently available 1 has fallen to the minimum no of sites required 1
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskPMT: sleeping for 1000 ms
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmWaitThread: thrd(9), timeout(1000), wakeonpost(0)
2022-01-23 18:41:57.226 :CLSDIMT:4268476160: Wraps: [13] Size: [10005,129]
2022-01-23 18:41:57.226 :CLSDIMT:4268476160: ===> CLSD In-memory buffer ends
----- END DDE Action: 'clsdAdrActions' (SUCCESS, 24 csec) -----
[TOC00018-END]
----- END DDE Actions Dump (total 25 csec) -----
[TOC00004-END]
End of Incident Dump
[TOC00002-END]
TOC00000 - Table of contents
TOC00001 - Error Stack
TOC00002 - Dump for incident 1 (CRS 8503) <<<< -- CRS 8503 again
| TOC00003 - START Event Driven Actions Dump
| TOC00004 - START DDE Actions Dump
| | TOC00005 - START DDE Action: 'dumpFrameContext' (Sync)
| | | TOC00006 - START Frame Context DUMP
| | TOC00007 - START DDE Action: 'dumpDiagCtx' (Sync)
| | | TOC00008 - Diag Context Dump
| | TOC00009 - START DDE Action: 'dumpBuckets' (Sync)
| | | TOC00010 - Trace Bucket Dump Begin: CLSD_SHARED_BUCKET
| | TOC00011 - START DDE Action: 'dumpGeneralConfiguration' (Sync)
| | | TOC00012 - General Configuration
| | TOC00013 - START DDE Action: 'xdb_dump_buckets' (Sync)
| | TOC00014 - START DDE Action: 'dumpKGERing' (Sync)
| | TOC00015 - START DDE Action: 'dumpKGEIEParms' (Sync)
| | TOC00016 - START DDE Action: 'dumpKGEState' (Sync)
| | TOC00017 - START DDE Action: 'kpuActionDefault' (Sync)
| | TOC00018 - START DDE Action: 'clsdAdrActions' (Sync)
End of TOC
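
Two quick follow-ups on the trace above, both sketches using paths taken from this environment: the incident inventory can be listed through ADRCI, and the three voting files the disk check walks (AFD:CRS1/2/3) can be confirmed from the surviving node.

# List incidents under node02's CRS ADR home (ADR base /u01/app/grid, per the trace paths)
adrci exec="set base /u01/app/grid; set homepath diag/crs/node02/crs; show incident"

# On node01, where CSS is still up: confirm the three voting files and their state
crsctl query css votedisk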



We then immediately checked the corresponding ASM disks on both machines and found nothing abnormal.
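
The query itself was not captured in the transcript, only its output; a v$asm_disk query along these lines (an assumption that matches the column layout shown below) would produce it:

SQL> select group_number, state, redundancy, total_mb, free_mb, name, failgroup, header_status from v$asm_disk;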


GROUP_NUMBER STATE REDUNDANCY TOTAL_MB FREE_MB NAME FAILGROUP HEADER_STATU
------------ ---------- --------------------- ---------- ---------- --------------- -------------------- ------------
0 NORMAL UNKNOWN 0 0 MEMBER
0 NORMAL UNKNOWN 0 0 MEMBER
0 NORMAL UNKNOWN 0 0 MEMBER
0 NORMAL UNKNOWN 0 0 MEMBER
3 NORMAL UNKNOWN 15360 15308 FRA_0000 FRA_0000 MEMBER
1 NORMAL UNKNOWN 2048 1748 CRS2 CRS2 MEMBER
1 NORMAL UNKNOWN 2048 1740 CRS1 CRS1 MEMBER
1 NORMAL UNKNOWN 2048 1740 CRS3 CRS3 MEMBER
2 NORMAL UNKNOWN 10240 10152 DATA1 DATA1 MEMBER

9 rows selected.

SQL>
select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') from dual;

TO_CHAR(SYSDATE,'YYYY-MM-DDHH24:MI:SS')
---------------------------------------------------------
2022-01-23 19:32:17


SQL>
select instance_name from v$instance;

INSTANCE_NAME
------------------------------------------------
+ASM1




My preliminary guess: if this is the case, only one of the two nodes can survive at any time. I tested it as follows.

# Stop CRS on node 2
[root@node02 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node02'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node02'
CRS-2673: Attempting to stop 'ora.evmd' on 'node02'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node02'
CRS-2673: Attempting to stop 'ora.crf' on 'node02'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'node02'
CRS-2673: Attempting to stop 'ora.driver.afd' on 'node02'
CRS-2677: Stop of 'ora.evmd' on 'node02' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'node02' succeeded
CRS-2677: Stop of 'ora.cssdmonitor' on 'node02' succeeded
CRS-2677: Stop of 'ora.driver.afd' on 'node02' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node02'
CRS-2677: Stop of 'ora.mdnsd' on 'node02' succeeded
CRS-2677: Stop of 'ora.crf' on 'node02' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node02'
CRS-2677: Stop of 'ora.gpnpd' on 'node02' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'node02' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node02' has completed
CRS-4133: Oracle High Availability Services has been stopped.

# Stop CRS on node 1

[root@node01 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node01'
CRS-2673: Attempting to stop 'ora.crsd' on 'node01'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'node01'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'node01'
CRS-2673: Attempting to stop 'ora.cvu' on 'node01'
CRS-2673: Attempting to stop 'ora.node02.vip' on 'node01'
CRS-2673: Attempting to stop 'ora.chad' on 'node01'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'node01'
CRS-2673: Attempting to stop 'ora.qosmserver' on 'node01'
CRS-33673: Attempting to stop resource group 'ora.asmgroup' on server 'node01'
CRS-2673: Attempting to stop 'ora.CRS.dg' on 'node01'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'node01'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'node01'
CRS-2677: Stop of 'ora.DATA.dg' on 'node01' succeeded
CRS-2677: Stop of 'ora.CRS.dg' on 'node01' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'node01'
CRS-2677: Stop of 'ora.node02.vip' on 'node01' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'node01'
CRS-2677: Stop of 'ora.cvu' on 'node01' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.node01.vip' on 'node01'
CRS-2677: Stop of 'ora.asm' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'node01'
CRS-2677: Stop of 'ora.scan1.vip' on 'node01' succeeded
CRS-2677: Stop of 'ora.node01.vip' on 'node01' succeeded
CRS-2677: Stop of 'ora.chad' on 'node01' succeeded
CRS-2677: Stop of 'ora.qosmserver' on 'node01' succeeded
CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.asmnet1.asmnetwork' on 'node01'
CRS-2677: Stop of 'ora.asmnet1.asmnetwork' on 'node01' succeeded
CRS-33677: Stop of resource group 'ora.asmgroup' on server 'node01' succeeded.
CRS-2673: Attempting to stop 'ora.ons' on 'node01'
CRS-2677: Stop of 'ora.ons' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'node01'
CRS-2677: Stop of 'ora.net1.network' on 'node01' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'node01' has completed
CRS-2677: Stop of 'ora.crsd' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'node01'
CRS-2673: Attempting to stop 'ora.crf' on 'node01'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node01'
CRS-2677: Stop of 'ora.storage' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'node01'
CRS-2677: Stop of 'ora.mdnsd' on 'node01' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'node01' succeeded
CRS-2677: Stop of 'ora.crf' on 'node01' succeeded
CRS-2677: Stop of 'ora.asm' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node01'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'node01'
CRS-2673: Attempting to stop 'ora.evmd' on 'node01'
CRS-2677: Stop of 'ora.evmd' on 'node01' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node01'
CRS-2677: Stop of 'ora.cssd' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.driver.afd' on 'node01'
CRS-2673: Attempting to stop 'ora.gipcd' on 'node01'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node01'
CRS-2677: Stop of 'ora.driver.afd' on 'node01' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'node01' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'node01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node01' has completed
CRS-4133: Oracle High Availability Services has been stopped.

# First, retry starting CRS on node 2

[root@node02 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@node02 ~]# ps -ef | grep -i asm
grid 30682 1 17 19:36 ? 00:00:00 oracle+ASM2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root 30750 17059 0 19:36 pts/0 00:00:00 grep --color=auto -i asm
[root@node02 ~]# ps -ef | grep -i asm
grid 30827 1 0 19:36 ? 00:00:00 asm_pmon_+ASM2
grid 30829 1 0 19:36 ? 00:00:00 asm_clmn_+ASM2
grid 30831 1 0 19:36 ? 00:00:00 asm_psp0_+ASM2
grid 30834 1 3 19:36 ? 00:00:00 asm_vktm_+ASM2
grid 30838 1 0 19:36 ? 00:00:00 asm_gen0_+ASM2
grid 30840 1 0 19:36 ? 00:00:00 asm_mman_+ASM2
grid 30844 1 0 19:36 ? 00:00:00 asm_gen1_+ASM2
grid 30847 1 0 19:36 ? 00:00:00 asm_diag_+ASM2
grid 30849 1 0 19:36 ? 00:00:00 asm_ping_+ASM2
grid 30851 1 0 19:36 ? 00:00:00 asm_pman_+ASM2
grid 30853 1 0 19:36 ? 00:00:00 asm_dia0_+ASM2
grid 30855 1 1 19:36 ? 00:00:00 asm_lmon_+ASM2
grid 30857 1 0 19:36 ? 00:00:00 asm_lmd0_+ASM2
grid 30859 1 1 19:36 ? 00:00:00 asm_lms0_+ASM2
grid 30861 1 0 19:36 ? 00:00:00 asm_lmhb_+ASM2
grid 30866 1 0 19:36 ? 00:00:00 asm_lck1_+ASM2
grid 30868 1 0 19:36 ? 00:00:00 asm_dbw0_+ASM2
grid 30870 1 0 19:36 ? 00:00:00 asm_lgwr_+ASM2
grid 30872 1 0 19:36 ? 00:00:00 asm_ckpt_+ASM2
grid 30874 1 0 19:36 ? 00:00:00 asm_smon_+ASM2
grid 30876 1 0 19:36 ? 00:00:00 asm_lreg_+ASM2
grid 30878 1 0 19:36 ? 00:00:00 asm_pxmn_+ASM2
grid 30880 1 3 19:36 ? 00:00:00 asm_rbal_+ASM2
grid 30882 1 2 19:36 ? 00:00:00 asm_gmon_+ASM2
grid 30885 1 0 19:36 ? 00:00:00 asm_mmon_+ASM2
grid 30887 1 0 19:36 ? 00:00:00 asm_mmnl_+ASM2
grid 30889 1 0 19:36 ? 00:00:00 asm_imr0_+ASM2
grid 30891 1 0 19:36 ? 00:00:00 asm_scm0_+ASM2
grid 30893 1 0 19:36 ? 00:00:00 asm_lck0_+ASM2
grid 30895 1 1 19:36 ? 00:00:00 asm_gcr0_+ASM2
grid 30898 1 0 19:36 ? 00:00:00 asm_m000_+ASM2
grid 30900 1 1 19:36 ? 00:00:00 oracle+ASM2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root 30960 17059 0 19:37 pts/0 00:00:00 grep --color=auto -i asm

# Check the CRS resource status on node 2
[root@node02 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE node02 Started,STABLE
ora.cluster_interconnect.haip
1 ONLINE ONLINE node02 STABLE
ora.crf
1 ONLINE ONLINE node02 STABLE
ora.crsd
1 ONLINE ONLINE node02 STABLE
ora.cssd
1 ONLINE ONLINE node02 STABLE
ora.cssdmonitor
1 ONLINE ONLINE node02 STABLE
ora.ctssd
1 ONLINE ONLINE node02 ACTIVE:0,STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.driver.afd
1 ONLINE ONLINE node02 STABLE
ora.drivers.acfs
1 ONLINE ONLINE node02 STABLE
ora.evmd
1 ONLINE ONLINE node02 STABLE
ora.gipcd
1 ONLINE ONLINE node02 STABLE
ora.gpnpd
1 ONLINE ONLINE node02 STABLE
ora.mdnsd
1 ONLINE ONLINE node02 STABLE
ora.storage
1 ONLINE ONLINE node02 STABLE
--------------------------------------------------------------------------------



# After CRS comes up successfully on node 2, start CRS on node 1
[root@node01 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.


[root@node01 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE STABLE
ora.cluster_interconnect.haip
1 ONLINE OFFLINE STABLE
ora.crf
1 ONLINE ONLINE node01 STABLE
ora.crsd
1 ONLINE OFFLINE STABLE
ora.cssd
1 ONLINE OFFLINE node01 STARTING
ora.cssdmonitor
1 ONLINE ONLINE node01 STABLE
ora.ctssd
1 ONLINE OFFLINE STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.driver.afd
1 ONLINE ONLINE node01 STABLE
ora.drivers.acfs
1 ONLINE ONLINE node01 STABLE
ora.evmd
1 ONLINE INTERMEDIATE node01 STABLE
ora.gipcd
1 ONLINE ONLINE node01 STABLE
ora.gpnpd
1 ONLINE ONLINE node01 STABLE
ora.mdnsd
1 ONLINE ONLINE node01 STABLE
ora.storage
1 ONLINE OFFLINE STABLE
--------------------------------------------------------------------------------



I repeated the exercise: stop CRS on both nodes, start CRS on either node first, then start CRS on the remaining node. The second node's CRS never comes up, and each time the tfactl output shows CRS-8503.

This confirmed the hypothesis.
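
For reference, the manual round trip above condenses into the following sketch (run as root; the only assumption is that TFA is left at its defaults):

# On both nodes: stop the whole stack
crsctl stop crs -f

# On whichever node goes first: this one starts cleanly
crsctl start crs

# On the other node: with this bug the lower stack never joins
crsctl start crs
crsctl stat res -t -init     # ora.cssd stuck STARTING, crsd/asm stay OFFLINE
tfactl analyze -since 1h     # CRS-8503 appears in the analysis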


Next, I searched Oracle's official documentation (My Oracle Support):

Bug 27965004 - CRSD Constantly Stopping/Restarting CRS-8503 (Doc ID 27965004.8)

OSYSMOND Crash with CRS 8503 [crfjson_format_system()+1242] (Doc ID 2542493.1)

CRSD Terminate With CRS-8503 [_ZN7cls_pe321ResourceGroupInstance14refreshMembersEv()+1749] (Doc ID 2789356.1)
Applies to --> Oracle Database - Enterprise Edition - Version 19.3.0.0.0 to 19.11.0.0.0 [Release 19]

Bug 31861990 - CRS-8503 - sltsmna- 6 (Doc ID 31861990.8)
This note lists 19.8.0, 19.6.0, 19.4.0 and 12.2.0.1 as the versions affected by bug 31861990, and states that it is fixed in the 19.11.0.0.210420 OCW RU.

Bug 25313411 - Orarootagent crashes with error CRS-8503 (Doc ID 25313411.8)

<In the end we applied this 19.11 RU to fix the problem, along the lines sketched below.>
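
For an existing installation, the usual path to 19.11 is a rolling opatchauto apply. A minimal sketch, assuming the 19.11 GI RU has been staged under /u01/patches and OPatch in the grid home has been updated to the version the RU requires (the patch ID is a placeholder):

# As root, one node at a time
export PATH=/u01/app/19.3.0/grid/OPatch:$PATH
opatchauto apply /u01/patches/<GI_RU_patch_id> -oh /u01/app/19.3.0/grid

# Verify on each node
/u01/app/19.3.0/grid/OPatch/opatch lspatches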

