RAC on Windows: How to Replace Voting Disks "In Place" on the same ASM Diskgroup

Original post by eygle, 2020-08-27

Applies to:
Oracle Database - Enterprise Edition - Version 12.1.0.2 and later
Microsoft Windows x64 (64-bit)

Symptoms

After Windows patches were applied, including a reboot performed without first bringing Oracle Clusterware down cleanly, CRS will not start.

Specifically, ora.cssd (the OCSSD daemon) fails to start on both nodes, with the following errors reported:

$TRACE\ocssd.trc
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: $INCIDENT\incdir_1\ocssd_i1.trc
2019-02-28 08:07:21.431 [OCSSD(8072)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in $TRACE\ocssd.trc
2019-02-28 08:07:21.432 [OCSSD(8072)]CRS-1603: CSSD on node <nodename1> shutdown by user.
2019-02-28 08:07:26.441 [OCSSD(8072)]CRS-8503: Oracle Clusterware OCSSD process with operating system process ID 6024 experienced fatal signal or exception code -1073741819
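
The exception code in the CRS-8503 line above is a signed 32-bit rendering of a Windows NTSTATUS value. The following is a minimal Python sketch, not part of the original note, that reinterprets it as unsigned hexadecimal; the function name ntstatus_hex is purely illustrative.

```python
def ntstatus_hex(signed_code: int) -> str:
    """Reinterpret a signed 32-bit exception code as unsigned hexadecimal."""
    # Masking with 0xFFFFFFFF maps negative values onto their 32-bit
    # two's-complement bit pattern.
    return f"0x{signed_code & 0xFFFFFFFF:08X}"


print(ntstatus_hex(-1073741819))  # 0xC0000005
```

0xC0000005 is STATUS_ACCESS_VIOLATION, so OCSSD terminated with an access violation rather than a controlled shutdown.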


$INCIDENT\incdir_1\ocssd_i1.trc

2019-02-27 22:45:07.649225 :SKGFD:7612: running stat on disk:\\.\ORCLDISKDATA0
2019-02-27 22:45:07.665603 :SKGFD:7612: Warning :  skgfr_vpd84h (scsi error for vpd84h 0x5) sense data:

2019-02-27 22:45:07.665609 :SKGFD:7612:
 SCSI sense  len(0x20)

...

2019-02-27 22:45:07.842828 :CLSF:7612: checksum failed for disk:\\.\ORCLDISKCRSVOTE0:
2019-02-27 22:45:07.842830 :CLSF:7612: Error: obj 2147483648 blk 0 name 'hard_kfbh' flags 0x65 first 1
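
When the incident trace is long, it can help to pull out only the disks that report a checksum failure. The sketch below is an illustration, not an Oracle tool: the regular expression is modeled solely on the single CLSF line shown above, and the function name is hypothetical. Other trace layouts may need a different pattern.

```python
import re

# Assumption: the pattern mirrors the one ":CLSF:" line quoted in this
# article; it is not derived from any documented trace format.
CHECKSUM_RE = re.compile(r":CLSF:\d+: checksum failed for disk:(\S+?):?\s*$")


def disks_with_checksum_failures(trace_text: str) -> list[str]:
    """Return disk paths reported as failing checksum, in order, without duplicates."""
    found = []
    for line in trace_text.splitlines():
        match = CHECKSUM_RE.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


sample = r"""2019-02-27 22:45:07.649225 :SKGFD:7612: running stat on disk:\\.\ORCLDISKDATA0
2019-02-27 22:45:07.842828 :CLSF:7612: checksum failed for disk:\\.\ORCLDISKCRSVOTE0:
"""

for disk in disks_with_checksum_failures(sample):
    print(disk)
```

In the excerpt above, only the CRSVOTE disk is flagged; the DATA disk is merely being stat'ed.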

The voting disk files are visible on the server nodes, and their headers even appear readable (as shown by 'kfed read'). However, because OCSSD itself will not come up, the voting disks are evidently damaged or corrupted in some way.

Changes

Cause

Voting disk corruption, likely caused by applying OS patches without first cleanly shutting down Oracle Clusterware.

Because of this corruption, the voting disks had to be recreated.

Oracle Clusterware must be explicitly stopped (crsctl stop crs) prior to any OS patch application.

Solution

1.  Ensure Oracle Clusterware is completely stopped on all nodes (crsctl stop crs -f), and that the OracleOHService service is not started.

2.  OFFLINE the shared disks that make up the relevant diskgroup (via diskmgmt.msc)

3.  Start CRS in exclusive mode: crsctl start crs -excl -nocrs

4.  At this point, ora.cssd can start.

5.  ONLINE the shared disks that make up the relevant diskgroup (via diskmgmt.msc)

6.  Connect to the ASM instance (via SQL*Plus) and mount the relevant diskgroup (CRSVOTE in this example):
select name, state from v$asm_diskgroup;
alter diskgroup <relevant diskgroup> mount;

7.  Start the ora.crsd resource: crsctl start res ora.crsd -init

8.  crsctl replace votedisk +CRSVOTE

9.  crsctl stop crs -f

10. crsctl start crs

11. crsctl start crs (on all additional nodes)
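
Because the order of the steps matters (the voting disks are replaced only after CSSD is up in exclusive mode, the diskgroup is mounted, and ora.crsd is running), it may be useful to keep the procedure as a printable checklist. The Python sketch below only formats the steps listed above and executes nothing. The Step class, the render function, and the "one node" scope labels are illustrative assumptions; the original note does not state on which node steps 2 through 10 are run.

```python
from dataclasses import dataclass

DISKGROUP = "CRSVOTE"  # diskgroup name used in the steps above; substitute your own


@dataclass(frozen=True)
class Step:
    scope: str   # where the step runs ("one node" is an assumption, see lead-in)
    action: str  # command or manual action, taken from the numbered steps


PLAN = [
    Step("all nodes", "crsctl stop crs -f (and confirm OracleOHService is not started)"),
    Step("one node", "diskmgmt.msc: OFFLINE the shared disks of the diskgroup"),
    Step("one node", "crsctl start crs -excl -nocrs"),
    Step("one node", "confirm that ora.cssd has started"),
    Step("one node", "diskmgmt.msc: ONLINE the shared disks of the diskgroup"),
    Step("one node", f"SQL*Plus: alter diskgroup {DISKGROUP} mount;"),
    Step("one node", "crsctl start res ora.crsd -init"),
    Step("one node", f"crsctl replace votedisk +{DISKGROUP}"),
    Step("one node", "crsctl stop crs -f"),
    Step("one node", "crsctl start crs"),
    Step("remaining nodes", "crsctl start crs"),
]


def render(plan):
    """Return the plan as numbered checklist lines; nothing is executed."""
    return [f"{n:>2}. [{step.scope}] {step.action}" for n, step in enumerate(plan, start=1)]


for line in render(PLAN):
    print(line)
```

Keeping the diskgroup name in a single constant avoids mounting one diskgroup and replacing the voting disks on another by mistake.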
