
Oracle 19c RAC - CRS-8503 [bug]

DBA管家 2022-02-08


Oracle offers the 19c database and Grid Infrastructure media as free downloads, but the base release is 19.3, which carries quite a few bugs. One that RAC users hit frequently is CRS-8503, fixed in RU 19.11. The symptom: in a two-node RAC, only one of the nodes can be up at a time; the other node's stack shuts down, and the CRS logs show a CRS-8503 error. So when building a 19c RAC, make sure to install the latest RU to avoid this class of problem, for example by applying it at install time as sketched below.
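
For a fresh Grid Infrastructure installation, the RU can be applied in the same pass as the software install via gridSetup.sh -applyRU. A minimal sketch, assuming the RU zip has already been downloaded (the patch ID and staging path are placeholders, not from this environment):

# As the grid user: stage the RU, then let the installer patch the home before configuring
unzip -q p<RU_patch_id>_190000_Linux-x86-64.zip -d /u01/patches
/u01/app/19.3.0/grid/gridSetup.sh -applyRU /u01/patches/<RU_patch_id>

# Afterwards, verify the patch level of the grid home
/u01/app/19.3.0/grid/OPatch/opatch lspatches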




Running into bug CRS-8503

Symptoms

I was recently building a 19c RAC, using the official 19.3 media throughout. After finishing the GRID installation per <Oracle19c搭建准备工作> (Oracle 19c setup preparation), the odd thing was that node 2 would shut itself down, and CRS could not be restarted on it.


# CRS resource status on the two nodes at this point
[root@node01 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE node01 Started,STABLE
ora.cluster_interconnect.haip
1 ONLINE ONLINE node01 STABLE
ora.crf
1 ONLINE ONLINE node01 STABLE
ora.crsd
1 ONLINE ONLINE node01 STABLE
ora.cssd
1 ONLINE ONLINE node01 STABLE
ora.cssdmonitor
1 ONLINE ONLINE node01 STABLE
ora.ctssd
1 ONLINE ONLINE node01 OBSERVER,STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.driver.afd
1 ONLINE ONLINE node01 STABLE
ora.drivers.acfs
1 ONLINE ONLINE node01 STABLE
ora.evmd
1 ONLINE ONLINE node01 STABLE
ora.gipcd
1 ONLINE ONLINE node01 STABLE
ora.gpnpd
1 ONLINE ONLINE node01 STABLE
ora.mdnsd
1 ONLINE ONLINE node01 STABLE
ora.storage
1 ONLINE ONLINE node01 STABLE
--------------------------------------------------------------------------------


[root@node02 ~]# crsctl status res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE STABLE
ora.cluster_interconnect.haip
1 ONLINE OFFLINE STABLE
ora.crf
1 ONLINE ONLINE node02 STABLE
ora.crsd
1 ONLINE OFFLINE STABLE
ora.cssd
1 ONLINE OFFLINE STABLE
ora.cssdmonitor
1 ONLINE ONLINE node02 STABLE
ora.ctssd
1 ONLINE OFFLINE STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.driver.afd
1 ONLINE ONLINE node02 STABLE
ora.drivers.acfs
1 ONLINE ONLINE node02 STABLE
ora.evmd
1 ONLINE INTERMEDIATE node02 STABLE
ora.gipcd
1 ONLINE ONLINE node02 STABLE
ora.gpnpd
1 ONLINE ONLINE node02 STABLE
ora.mdnsd
1 ONLINE ONLINE node02 STABLE
ora.storage
1 ONLINE OFFLINE STABLE
--------------------------------------------------------------------------------

[root@node01 ~]# crsctl check cluster -all
**************************************************************
node01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[root@node02 ~]# crsctl check cluster -all
**************************************************************
node02:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
**************************************************************



Then examine the logs via TFA.

Since 12c, TFA is installed by default along with Grid Infrastructure.


[root@node01 ~]# tfactl analyze -since 1d
WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.
INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes... Please wait...
INFO: analyzing host: node01

Report title: Analysis of Alert,System Logs
Report date range: last ~1 day(s)
Report (default) time zone: CST - China Standard Time
Analysis started at: 23-Jan-2022 07:21:16 PM CST
Elapsed analysis time: 7 second(s).
Configuration file: /u01/app/19.3.0/grid/tfa/node01/tfa_home/ext/tnt/conf/tnt.prop
Configuration group: all
Total message count: 34,155, from 23-Jan-2021 07:21:19 PM CST to 23-Jan-2022 07:21:14 PM CST
Messages matching last ~1 day(s): 33,756, from 22-Jan-2022 07:36:53 PM CST to 23-Jan-2022 07:21:14 PM CST
last ~1 day(s) error count: 5, from 22-Jan-2022 07:46:30 PM CST to 23-Jan-2022 06:29:17 PM CST
last ~1 day(s) ignored error count: 0
last ~1 day(s) unique error count: 5

Message types for last ~1 day(s)
Occurrences percent server name type
----------- ------- -------------------- -----
33,733 99.9% node01 generic
18 0.1% node01 WARNING
5 0.0% node01 ERROR
----------- -------
33,756 100.0%

Unique error messages for last ~1 day(s)
Occurrences percent server name error
----------- ------- -------------------- -----
1 20.0% node01 [OCSSD(3342)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node01 node02 .
1 20.0% node01 [OCSSD(28200)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/node01/crs/trace/ocssd.trc
1 20.0% node01 [OCSSD(20206)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node01 .
1 20.0% node01 [OCSSD(887)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/node01/crs/trace/ocssd.trc
1 20.0% node01 [OCSSD(24769)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node01 .
----------- -------
5 100.0%


INFO: analyzing all (Alert and Unix System Logs) logs for the last 1440 minutes... Please wait...
INFO: analyzing host: node02

Report title: Analysis of Alert,System Logs
Report date range: last ~1 day(s)
Report (default) time zone: CST - China Standard Time
Analysis started at: 23-Jan-2022 07:21:26 PM CST
Elapsed analysis time: 3 second(s).
Configuration file: /u01/app/19.3.0/grid/tfa/node02/tfa_home/ext/tnt/conf/tnt.prop
Configuration group: all
Total message count: 21,591, from 10-Jan-2022 08:40:45 PM CST to 23-Jan-2022 07:20:01 PM CST
Messages matching last ~1 day(s): 4,713, from 22-Jan-2022 07:36:52 PM CST to 23-Jan-2022 07:20:01 PM CST
last ~1 day(s) error count: 5, from 22-Jan-2022 07:53:09 PM CST to 23-Jan-2022 06:41:51 PM CST
last ~1 day(s) ignored error count: 0
last ~1 day(s) unique error count: 4

Message types for last ~1 day(s)
Occurrences percent server name type
----------- ------- -------------------- -----
4,659 98.9% node02 generic
49 1.0% node02 WARNING
5 0.1% node02 ERROR
----------- -------
4,713 100.0%

Unique error messages for last ~1 day(s)
Occurrences percent server name error
----------- ------- -------------------- -----
2 40.0% node02 [OCSSD(3282)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node02 .
1 20.0% node02 [OCSSD(3282)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node01 node02 .
1 20.0% node02 [OCSSD(2005)]CRS-1601: CSSD Reconfiguration complete. Active nodes are node02 .
1 20.0% node02 [OCSSD(20807)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/node02/crs/trace/ocssd.trc
----------- -------
5 100.0%

# Based on the TFA findings, inspect the corresponding ocssd trace on each node
[root@node01 bin]# tail -50f /u01/app/grid/diag/crs/node01/crs/trace/ocssd.trc

2022-01-23 19:24:23.842 : CSSD:3113711360: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:23.843 : CSSD:3126068992: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 16:2:1, datatype 1 datasize 4
2022-01-23 19:24:23.843 : CSSD:3110557440: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 16:2:1 from clientID 1:41:2
2022-01-23 19:24:24.342 : CSSD:2670159616: [ INFO] clssnmSendingThread: sending status msg to all nodes
2022-01-23 19:24:24.342 : CSSD:2670159616: [ INFO] clssnmSendingThread: sent 5 status msgs to all nodes
2022-01-23 19:24:24.918 : CSSD:2667005696: [ INFO] clssscSelect: gipcwait returned with status gipcretTimeout (16)
2022-01-23 19:24:25.940 : CSSD:3113711360: [ INFO] : Processing member data change type 1, size 4 for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:25.940 : CSSD:3113711360: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:25.940 : CSSD:3126068992: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 16:2:1, datatype 1 datasize 4
2022-01-23 19:24:25.941 : CSSD:3110557440: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 16:2:1 from clientID 1:41:2
2022-01-23 19:24:27.991 : CSSD:3113711360: [ INFO] : Processing member data change type 1, size 4 for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:27.991 : CSSD:3113711360: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:27.991 : CSSD:3126068992: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 16:2:1, datatype 1 datasize 4
2022-01-23 19:24:27.991 : CSSD:3110557440: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 16:2:1 from clientID 1:41:2
2022-01-23 19:24:29.346 : CSSD:2670159616: [ INFO] clssnmSendingThread: sending status msg to all nodes
2022-01-23 19:24:29.346 : CSSD:2670159616: [ INFO] clssnmSendingThread: sent 5 status msgs to all nodes
2022-01-23 19:24:29.919 : CSSD:2667005696: [ INFO] clssscSelect: gipcwait returned with status gipcretTimeout (16)
2022-01-23 19:24:30.044 : CSSD:3113711360: [ INFO] : Processing member data change type 1, size 4 for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:30.044 : CSSD:3113711360: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:30.044 : CSSD:3126068992: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 16:2:1, datatype 1 datasize 4
2022-01-23 19:24:30.045 : CSSD:3110557440: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 16:2:1 from clientID 1:41:2
2022-01-23 19:24:32.148 : CSSD:3113711360: [ INFO] : Processing member data change type 1, size 4 for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:32.148 : CSSD:3113711360: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 16:2:1
2022-01-23 19:24:32.148 : CSSD:3126068992: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 16:2:1, datatype 1 datasize 4
2022-01-23 19:24:32.148 : CSSD:3110557440: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 16:2:1 from clientID 1:41:2
^C
<Node 1's ocssd log keeps repeating the output above...>

# Node 2's ocssd log; it terminates at the fatal error below
tail -50f /u01/app/grid/diag/crs/node02/crs/trace/ocssd.trc
2022-01-23 18:41:54.121 : CSSD:4244395776: [ INFO] clssscthrdmain: Terminating thread GM Peer Lsnr
2022-01-23 18:41:54.148 : CSSD:4242818816: [ INFO] clssgmpSendMsgToGMC: Cannot send it to a hub node
2022-01-23 18:41:54.148 : CSSD:4242818816: [ INFO] clssgmpcGMPShutdownComplete: GMP has finished its shutdown
2022-01-23 18:41:54.149 : CSSD:3716949760: [ INFO] clssscSelect: gipcwait returned with status gipcretTimeout (16)
2022-01-23 18:41:54.255 : CSSD:3734296320: [ INFO] clssnmvDHBValidateNCopy: node 1, node01, has a disk HB, but no network HB, DHB has rcfg 538428557, wrtcnt, 47697, LATS 21991244, lastSeqNo 47694, uniqueness 1642933741, timestamp 1642934511/21965144
2022-01-23 18:41:54.728 : CSSD:3720103680: [ WARNING] clssnmSendingThread: state(1) clusterState(0) exit
2022-01-23 18:41:54.728 : CSSD:3720103680: [ INFO] clssscthrdmain: Terminating thread clssnmSendingThread
2022-01-23 18:41:55.258 : CSSD:3734296320: [ INFO] clssnmvDHBValidateNCopy: node 1, node01, has a disk HB, but no network HB, DHB has rcfg 538428557, wrtcnt, 47700, LATS 21992254, lastSeqNo 47697, uniqueness 1642933741, timestamp 1642934512/21966144
2022-01-23 18:41:55.263 : CSSD:3729565440: [ INFO] clssnmvDHBValidateNCopy: node 1, node01, has a disk HB, but no network HB, DHB has rcfg 538428557, wrtcnt, 47701, LATS 21992254, lastSeqNo 47653, uniqueness 1642933741, timestamp 1642934512/21966154
2022-01-23 18:41:55.556 :GIPCXCPT:4252276480: gipcWaitF [EvmConWait : evmgipcio.c : 303]: EXCEPTION[ ret (uknown) (910) ] failed to wait on obj 0x7fadd82ba580 [0000000000005d7c] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=7ce66a8b-4c5a7358-20807))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=4c5a7358-7ce66a8b-20598))', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 20598, readyRef (nil), ready 0, wobj 0x7fadd82ba530, sendp 0x7fadd82bce40 status 0flags 0xa100a716, flags-2 0x100, usrFlags 0x30020 }, reqList 0x7fadfd73a138, nreq 1, creq 0x7fadfd73a130 timeout 30000 ms, flags 0x240
2022-01-23 18:41:55.557 : CSSDGNS:4252276480: clssgnsGNSEvtHandler: clsce evt res CONN (3)
2022-01-23 18:41:55.557 : CSSDGNS:4252276480: clssgnsCheckGNSConfigured: CLSCE wait(30000) returned 0, clskerror: clsce: CRS-10203: (:CLSCE0063:) Could not connect to the Event Manager daemon, evtres 3
2022-01-23 18:41:55.557 : CSSDGNS:4252276480: clssgnsCheckGNSConfigured: CLSCE connection error, re-subscribing for GNS resource events.
2022-01-23 18:41:55.558 : CLSCEVT:4252276480: (:CLSCE0028:)clsce_unsubscribe 0x7fadd8139870 successfully unsubscribed : 0
2022-01-23 18:41:56.261 : CSSD:3734296320: [ INFO] clssnmvDHBValidateNCopy: node 1, node01, has a disk HB, but no network HB, DHB has rcfg 538428557, wrtcnt, 47703, LATS 21993254, lastSeqNo 47700, uniqueness 1642933741, timestamp 1642934513/21967164
Trace file /u01/app/grid/diag/crs/node02/crs/trace/ocssd.trc
Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
Version 19.3.0.0.0 Copyright 1996, 2019 Oracle. All rights reserved.
DDE: Flood control is not active
CLSB:4268476160: [ INFO] Oracle Clusterware infrastructure error in OCSSD (OS PID 20807): Fatal signal 6 has occurred in program ocssd thread 4268476160; nested signal count is 1
2022-01-23T18:41:56.960547+08:00
Incident 1 created, dump file: /u01/app/grid/diag/crs/node02/crs/incident/incdir_1/ocssd_i1.trc
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []

# On node02, inspect the incident trace
tail -50f /u01/app/grid/diag/crs/node02/crs/incident/incdir_1/ocssd_i1.trc
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: Checking configured voting files
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: number of voting files 3
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: Checkin for AFD:CRS1 cur_ms 21994124 lastDPMTCheck 21990564 lastWrite 21990564 killcheck 21994124+
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: disk still good (3560/AFD:CRS1)
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: Checkin for AFD:CRS2 cur_ms 21994124 lastDPMTCheck 21990564 lastWrite 21990564 killcheck 21994124+
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: disk still good (3560/AFD:CRS2)
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: Checkin for AFD:CRS3 cur_ms 21994124 lastDPMTCheck 21990564 lastWrite 21990564 killcheck 21994124+
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskCheck: disk still good (3560/AFD:CRS3)
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmCountVfInSite: vf number = 0
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmCountVfInSite: vf number = 1
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmCountVfInSite: vf number = 2
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmCountVfInSite: In site = 00112233-44556677-8899aabb-ccddeeff, vf count = 3,
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmSiteMajority: number of sites currently available 1 has fallen to the minimum no of sites required 1
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmvDiskPMT: sleeping for 1000 ms
4268476160: [CLSDIMT] 2022-01-23 18:41:57.135 :CSSD:3723257600: [ INFO] clssnmWaitThread: thrd(9), timeout(1000), wakeonpost(0)
2022-01-23 18:41:57.226 :CLSDIMT:4268476160: Wraps: [13] Size: [10005,129]
2022-01-23 18:41:57.226 :CLSDIMT:4268476160: ===> CLSD In-memory buffer ends
----- END DDE Action: 'clsdAdrActions' (SUCCESS, 24 csec) -----
[TOC00018-END]
----- END DDE Actions Dump (total 25 csec) -----
[TOC00004-END]
End of Incident Dump
[TOC00002-END]
TOC00000 - Table of contents
TOC00001 - Error Stack
TOC00002 - Dump for incident 1 (CRS 8503) <<<< -- CRS 8503 again
| TOC00003 - START Event Driven Actions Dump
| TOC00004 - START DDE Actions Dump
| | TOC00005 - START DDE Action: 'dumpFrameContext' (Sync)
| | | TOC00006 - START Frame Context DUMP
| | TOC00007 - START DDE Action: 'dumpDiagCtx' (Sync)
| | | TOC00008 - Diag Context Dump
| | TOC00009 - START DDE Action: 'dumpBuckets' (Sync)
| | | TOC00010 - Trace Bucket Dump Begin: CLSD_SHARED_BUCKET
| | TOC00011 - START DDE Action: 'dumpGeneralConfiguration' (Sync)
| | | TOC00012 - General Configuration
| | TOC00013 - START DDE Action: 'xdb_dump_buckets' (Sync)
| | TOC00014 - START DDE Action: 'dumpKGERing' (Sync)
| | TOC00015 - START DDE Action: 'dumpKGEIEParms' (Sync)
| | TOC00016 - START DDE Action: 'dumpKGEState' (Sync)
| | TOC00017 - START DDE Action: 'kpuActionDefault' (Sync)
| | TOC00018 - START DDE Action: 'clsdAdrActions' (Sync)
End of TOC
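
Two quick follow-ups on the trace above, both sketches using paths taken from this environment: the incident inventory can be listed through ADRCI, and the three voting files the disk check walks (AFD:CRS1/2/3) can be confirmed from the surviving node.

# List incidents under node02's CRS ADR home (ADR base /u01/app/grid, per the trace paths)
adrci exec="set base /u01/app/grid; set homepath diag/crs/node02/crs; show incident"

# On node01, where CSS is still up: confirm the three voting files and their state
crsctl query css votedisk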



We then immediately checked the corresponding ASM disks on both machines and found nothing abnormal.
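
The query itself was not captured in the transcript, only its output; a v$asm_disk query along these lines (an assumption that matches the column layout shown below) would produce it:

SQL> select group_number, state, redundancy, total_mb, free_mb, name, failgroup, header_status from v$asm_disk;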


GROUP_NUMBER STATE REDUNDANCY TOTAL_MB FREE_MB NAME FAILGROUP HEADER_STATU
------------ ---------- --------------------- ---------- ---------- --------------- -------------------- ------------
0 NORMAL UNKNOWN 0 0 MEMBER
0 NORMAL UNKNOWN 0 0 MEMBER
0 NORMAL UNKNOWN 0 0 MEMBER
0 NORMAL UNKNOWN 0 0 MEMBER
3 NORMAL UNKNOWN 15360 15308 FRA_0000 FRA_0000 MEMBER
1 NORMAL UNKNOWN 2048 1748 CRS2 CRS2 MEMBER
1 NORMAL UNKNOWN 2048 1740 CRS1 CRS1 MEMBER
1 NORMAL UNKNOWN 2048 1740 CRS3 CRS3 MEMBER
2 NORMAL UNKNOWN 10240 10152 DATA1 DATA1 MEMBER

9 rows selected.

SQL>
select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') from dual;

TO_CHAR(SYSDATE,'YYYY-MM-DDHH24:MI:SS')
---------------------------------------------------------
2022-01-23 19:32:17


SQL>
select instance_name from v$instance;

INSTANCE_NAME
------------------------------------------------
+ASM1




My preliminary guess: if this is the case, only one of the two nodes can survive at any time. I tested it as follows.

# Stop CRS on node 2
[root@node02 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node02'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node02'
CRS-2673: Attempting to stop 'ora.evmd' on 'node02'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node02'
CRS-2673: Attempting to stop 'ora.crf' on 'node02'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'node02'
CRS-2673: Attempting to stop 'ora.driver.afd' on 'node02'
CRS-2677: Stop of 'ora.evmd' on 'node02' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'node02' succeeded
CRS-2677: Stop of 'ora.cssdmonitor' on 'node02' succeeded
CRS-2677: Stop of 'ora.driver.afd' on 'node02' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node02'
CRS-2677: Stop of 'ora.mdnsd' on 'node02' succeeded
CRS-2677: Stop of 'ora.crf' on 'node02' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node02'
CRS-2677: Stop of 'ora.gpnpd' on 'node02' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'node02' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node02' has completed
CRS-4133: Oracle High Availability Services has been stopped.

# Stop CRS on node 1

[root@node01 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node01'
CRS-2673: Attempting to stop 'ora.crsd' on 'node01'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'node01'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'node01'
CRS-2673: Attempting to stop 'ora.cvu' on 'node01'
CRS-2673: Attempting to stop 'ora.node02.vip' on 'node01'
CRS-2673: Attempting to stop 'ora.chad' on 'node01'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'node01'
CRS-2673: Attempting to stop 'ora.qosmserver' on 'node01'
CRS-33673: Attempting to stop resource group 'ora.asmgroup' on server 'node01'
CRS-2673: Attempting to stop 'ora.CRS.dg' on 'node01'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'node01'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'node01'
CRS-2677: Stop of 'ora.DATA.dg' on 'node01' succeeded
CRS-2677: Stop of 'ora.CRS.dg' on 'node01' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'node01'
CRS-2677: Stop of 'ora.node02.vip' on 'node01' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'node01'
CRS-2677: Stop of 'ora.cvu' on 'node01' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.node01.vip' on 'node01'
CRS-2677: Stop of 'ora.asm' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'node01'
CRS-2677: Stop of 'ora.scan1.vip' on 'node01' succeeded
CRS-2677: Stop of 'ora.node01.vip' on 'node01' succeeded
CRS-2677: Stop of 'ora.chad' on 'node01' succeeded
CRS-2677: Stop of 'ora.qosmserver' on 'node01' succeeded
CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.asmnet1.asmnetwork' on 'node01'
CRS-2677: Stop of 'ora.asmnet1.asmnetwork' on 'node01' succeeded
CRS-33677: Stop of resource group 'ora.asmgroup' on server 'node01' succeeded.
CRS-2673: Attempting to stop 'ora.ons' on 'node01'
CRS-2677: Stop of 'ora.ons' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'node01'
CRS-2677: Stop of 'ora.net1.network' on 'node01' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'node01' has completed
CRS-2677: Stop of 'ora.crsd' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'node01'
CRS-2673: Attempting to stop 'ora.crf' on 'node01'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node01'
CRS-2677: Stop of 'ora.storage' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'node01'
CRS-2677: Stop of 'ora.mdnsd' on 'node01' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'node01' succeeded
CRS-2677: Stop of 'ora.crf' on 'node01' succeeded
CRS-2677: Stop of 'ora.asm' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node01'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'node01'
CRS-2673: Attempting to stop 'ora.evmd' on 'node01'
CRS-2677: Stop of 'ora.evmd' on 'node01' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node01'
CRS-2677: Stop of 'ora.cssd' on 'node01' succeeded
CRS-2673: Attempting to stop 'ora.driver.afd' on 'node01'
CRS-2673: Attempting to stop 'ora.gipcd' on 'node01'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node01'
CRS-2677: Stop of 'ora.driver.afd' on 'node01' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'node01' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'node01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node01' has completed
CRS-4133: Oracle High Availability Services has been stopped.

# First, retry starting CRS on node 2

[root@node02 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@node02 ~]# ps -ef | grep -i asm
grid 30682 1 17 19:36 ? 00:00:00 oracle+ASM2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root 30750 17059 0 19:36 pts/0 00:00:00 grep --color=auto -i asm
[root@node02 ~]# ps -ef | grep -i asm
grid 30827 1 0 19:36 ? 00:00:00 asm_pmon_+ASM2
grid 30829 1 0 19:36 ? 00:00:00 asm_clmn_+ASM2
grid 30831 1 0 19:36 ? 00:00:00 asm_psp0_+ASM2
grid 30834 1 3 19:36 ? 00:00:00 asm_vktm_+ASM2
grid 30838 1 0 19:36 ? 00:00:00 asm_gen0_+ASM2
grid 30840 1 0 19:36 ? 00:00:00 asm_mman_+ASM2
grid 30844 1 0 19:36 ? 00:00:00 asm_gen1_+ASM2
grid 30847 1 0 19:36 ? 00:00:00 asm_diag_+ASM2
grid 30849 1 0 19:36 ? 00:00:00 asm_ping_+ASM2
grid 30851 1 0 19:36 ? 00:00:00 asm_pman_+ASM2
grid 30853 1 0 19:36 ? 00:00:00 asm_dia0_+ASM2
grid 30855 1 1 19:36 ? 00:00:00 asm_lmon_+ASM2
grid 30857 1 0 19:36 ? 00:00:00 asm_lmd0_+ASM2
grid 30859 1 1 19:36 ? 00:00:00 asm_lms0_+ASM2
grid 30861 1 0 19:36 ? 00:00:00 asm_lmhb_+ASM2
grid 30866 1 0 19:36 ? 00:00:00 asm_lck1_+ASM2
grid 30868 1 0 19:36 ? 00:00:00 asm_dbw0_+ASM2
grid 30870 1 0 19:36 ? 00:00:00 asm_lgwr_+ASM2
grid 30872 1 0 19:36 ? 00:00:00 asm_ckpt_+ASM2
grid 30874 1 0 19:36 ? 00:00:00 asm_smon_+ASM2
grid 30876 1 0 19:36 ? 00:00:00 asm_lreg_+ASM2
grid 30878 1 0 19:36 ? 00:00:00 asm_pxmn_+ASM2
grid 30880 1 3 19:36 ? 00:00:00 asm_rbal_+ASM2
grid 30882 1 2 19:36 ? 00:00:00 asm_gmon_+ASM2
grid 30885 1 0 19:36 ? 00:00:00 asm_mmon_+ASM2
grid 30887 1 0 19:36 ? 00:00:00 asm_mmnl_+ASM2
grid 30889 1 0 19:36 ? 00:00:00 asm_imr0_+ASM2
grid 30891 1 0 19:36 ? 00:00:00 asm_scm0_+ASM2
grid 30893 1 0 19:36 ? 00:00:00 asm_lck0_+ASM2
grid 30895 1 1 19:36 ? 00:00:00 asm_gcr0_+ASM2
grid 30898 1 0 19:36 ? 00:00:00 asm_m000_+ASM2
grid 30900 1 1 19:36 ? 00:00:00 oracle+ASM2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root 30960 17059 0 19:37 pts/0 00:00:00 grep --color=auto -i asm

# Check the CRS resource status on node 2
[root@node02 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE node02 Started,STABLE
ora.cluster_interconnect.haip
1 ONLINE ONLINE node02 STABLE
ora.crf
1 ONLINE ONLINE node02 STABLE
ora.crsd
1 ONLINE ONLINE node02 STABLE
ora.cssd
1 ONLINE ONLINE node02 STABLE
ora.cssdmonitor
1 ONLINE ONLINE node02 STABLE
ora.ctssd
1 ONLINE ONLINE node02 ACTIVE:0,STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.driver.afd
1 ONLINE ONLINE node02 STABLE
ora.drivers.acfs
1 ONLINE ONLINE node02 STABLE
ora.evmd
1 ONLINE ONLINE node02 STABLE
ora.gipcd
1 ONLINE ONLINE node02 STABLE
ora.gpnpd
1 ONLINE ONLINE node02 STABLE
ora.mdnsd
1 ONLINE ONLINE node02 STABLE
ora.storage
1 ONLINE ONLINE node02 STABLE
--------------------------------------------------------------------------------



# After CRS comes up successfully on node 2, start CRS on node 1
[root@node01 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.


[root@node01 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE STABLE
ora.cluster_interconnect.haip
1 ONLINE OFFLINE STABLE
ora.crf
1 ONLINE ONLINE node01 STABLE
ora.crsd
1 ONLINE OFFLINE STABLE
ora.cssd
1 ONLINE OFFLINE node01 STARTING
ora.cssdmonitor
1 ONLINE ONLINE node01 STABLE
ora.ctssd
1 ONLINE OFFLINE STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.driver.afd
1 ONLINE ONLINE node01 STABLE
ora.drivers.acfs
1 ONLINE ONLINE node01 STABLE
ora.evmd
1 ONLINE INTERMEDIATE node01 STABLE
ora.gipcd
1 ONLINE ONLINE node01 STABLE
ora.gpnpd
1 ONLINE ONLINE node01 STABLE
ora.mdnsd
1 ONLINE ONLINE node01 STABLE
ora.storage
1 ONLINE OFFLINE STABLE
--------------------------------------------------------------------------------



I repeated the exercise: stop CRS on both nodes, start CRS on either node first, then start CRS on the remaining node. The second node's CRS never comes up, and each time the tfactl output shows CRS-8503.

This confirmed the hypothesis.
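
For reference, the manual round trip above condenses into the following sketch (run as root; the only assumption is that TFA is left at its defaults):

# On both nodes: stop the whole stack
crsctl stop crs -f

# On whichever node goes first: this one starts cleanly
crsctl start crs

# On the other node: with this bug the lower stack never joins
crsctl start crs
crsctl stat res -t -init     # ora.cssd stuck STARTING, crsd/asm stay OFFLINE
tfactl analyze -since 1h     # CRS-8503 appears in the analysis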


Next, I searched Oracle's official documentation (My Oracle Support):

Bug 27965004 - CRSD Constantly Stopping/Restarting CRS-8503 (Doc ID 27965004.8)

OSYSMOND Crash with CRS 8503 [crfjson_format_system()+1242] (Doc ID 2542493.1)

CRSD Terminate With CRS-8503 [_ZN7cls_pe321ResourceGroupInstance14refreshMembersEv()+1749] (Doc ID 2789356.1)
Applies to --> Oracle Database - Enterprise Edition - Version 19.3.0.0.0 to 19.11.0.0.0 [Release 19]

Bug 31861990 - CRS-8503 - sltsmna- 6 (Doc ID 31861990.8)
This note lists 19.8.0, 19.6.0, 19.4.0 and 12.2.0.1 as the versions affected by bug 31861990, and states that it is fixed in the 19.11.0.0.210420 OCW RU.

Bug 25313411 - Orarootagent crashes with error CRS-8503 (Doc ID 25313411.8)

<In the end we applied this 19.11 RU to fix the problem, along the lines sketched below.>
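
For an existing installation, the usual path to 19.11 is a rolling opatchauto apply. A minimal sketch, assuming the 19.11 GI RU has been staged under /u01/patches and OPatch in the grid home has been updated to the version the RU requires (the patch ID is a placeholder):

# As root, one node at a time
export PATH=/u01/app/19.3.0/grid/OPatch:$PATH
opatchauto apply /u01/patches/<GI_RU_patch_id> -oh /u01/app/19.3.0/grid

# Verify on each node
/u01/app/19.3.0/grid/OPatch/opatch lspatches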

