Simulating and repairing disk header corruption on the vote disk and data disk in a RAC environment

Original post by Leo, 2022-12-28

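Topic: simulating and repairing disk header corruption on the vote disk and the data disk in a RAC environment.
OS: CentOS 7.9, 64-bit
Database: Oracle 11.2.0.4, 64-bit
Environment: RAC (two nodes)
1. Disk group information
1.1 System information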
[root@hisdb1 ~]# cat /etc/*release
CentOS Linux release 7.9.2009 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.9.2009 (Core)
CentOS Linux release 7.9.2009 (Core)
1.2 Disk information
SQL> select group_number,name,path,state,total_mb,free_mb from v$asm_disk where name is not null order by path;

GROUP_NUMBER NAME            PATH                 STATE      TOTAL_MB    FREE_MB
------------ --------------- -------------------- -------- ---------- ----------
           2 DATA02          ORCL:DATA02          NORMAL        10239       6662
           1 DATA03          ORCL:DATA03          NORMAL        20479      13765
           3 DATA04          ORCL:DATA04          NORMAL        10239       9843
SQL> select group_number,name,type,total_mb,free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME            TYPE     TOTAL_MB    FREE_MB
------------ --------------- ------ ---------- ----------
           1 DATA            EXTERN      20479      13765
           2 FRA             EXTERN      10239       6662
           3 OCRBK           EXTERN      10239       9843
[root@hisdb1 disks]# pwd
/dev/oracleasm/disks
[root@hisdb1 disks]# ll /dev/oracleasm/disks/*
brw-rw---- 1 grid asmadmin 8, 17 Dec 27 20:27 /dev/oracleasm/disks/DATA01
brw-rw---- 1 grid asmadmin 8, 33 Dec 27 20:27 /dev/oracleasm/disks/DATA02
brw-rw---- 1 grid asmadmin 8, 49 Dec 27 20:27 /dev/oracleasm/disks/DATA03
brw-rw---- 1 grid asmadmin 8, 65 Dec 27 20:27 /dev/oracleasm/disks/DATA04
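Note: in the listings above, DATA04 (disk group OCRBK) is the vote disk and DATA03 (disk group DATA) is the data disk.
2. Vote disk
Simulate corruption of the vote disk and then repair it.
2.1 Copy the data
-- Copy the first 8 KB of /dev/oracleasm/disks/DATA04 to /home/grid/data04.dd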
[grid@hisdb1 disks]$ dd if=/dev/oracleasm/disks/DATA04 of=/home/grid/data04.dd bs=8192 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000340858 s, 24.0 MB/s
[grid@hisdb1 ~]$ ll data04.dd
-rw-r--r-- 1 grid oinstall 8192 Dec 27 21:32 data04.dd
-- Use kfed to read the disk header of /dev/oracleasm/disks/DATA04.
[grid@hisdb1 ~]$ kfed read /dev/oracleasm/disks/DATA04 text=data04.txt
[grid@hisdb1 ~]$ head data04.txt
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3855329304 ; 0x00c: 0xe5cba818
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
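The copy step above can be wrapped so that the backup is verified immediately and never overwrites an older copy. This is a minimal sketch: backup_header is a hypothetical helper name, not an Oracle tool, and it assumes the GNU dd and cmp shipped with CentOS 7.

```shell
# Sketch: back up the first 8 KB of a disk and verify the copy.
# backup_header is a hypothetical helper, not part of Grid Infrastructure.
backup_header() {
    local src="$1" dest="$2"
    # Refuse to overwrite an earlier backup; it may be the only good copy.
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    dd if="$src" of="$dest" bs=8192 count=1 2>/dev/null || return 1
    # Compare only the first 8192 bytes of the source with the backup.
    cmp -s -n 8192 "$src" "$dest"
}

# Example with the paths used in this article:
# backup_header /dev/oracleasm/disks/DATA04 /home/grid/data04.dd && echo "backup verified"
```

The function returns non-zero if the copy fails or does not match, so it can gate the destructive step that follows.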
2.2 Corrupt the disk
-- Overwrite the first 8 KB of the disk in the vote disk group with zeros
[grid@hisdb1 ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/DATA04 bs=8192 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000128522 s, 63.7 MB/s
[grid@hisdb1 ~]$ kfed read /dev/oracleasm/disks/DATA04 | head
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
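The difference between the healthy and the zeroed header is visible in the kfbh.type line (KFBTYP_DISKHEAD versus KFBTYP_INVALID). A small filter can extract just that value from a kfed dump. This is a sketch; asm_block_type is a hypothetical helper name, and it relies only on the output layout shown above.

```shell
# Sketch: print the block type named in a kfed dump read from stdin.
# asm_block_type is a hypothetical helper name.
asm_block_type() {
    # The line of interest looks like:
    #   kfbh.type:      1 ; 0x002: KFBTYP_DISKHEAD
    awk '$1 == "kfbh.type:" { print $NF; exit }'
}

# Example (needs kfed on PATH, as in the grid user's environment):
# kfed read /dev/oracleasm/disks/DATA04 | asm_block_type
```

Against the two dumps in this article the filter would print KFBTYP_DISKHEAD before the overwrite and KFBTYP_INVALID after it.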
2.3 Reproduce the failure
-- Restart the cluster
[root@hisdb1 ~]# /u01/app/11.2.0/grid/bin/crsctl stop cluster -all
CRS-2673: Attempting to stop 'ora.crsd' on 'hisdb1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'hisdb1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.OCRBK.dg' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.heal.db' on 'hisdb1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.hisdb1.vip' on 'hisdb1'
CRS-2677: Stop of 'ora.FRA.dg' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.heal.db' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'hisdb1'
CRS-2677: Stop of 'ora.hisdb1.vip' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.crsd' on 'hisdb2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'hisdb2'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.cvu' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.OCRBK.dg' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.heal.db' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.oc4j' on 'hisdb2'
CRS-2677: Stop of 'ora.cvu' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.hisdb2.vip' on 'hisdb2'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'hisdb2'
CRS-2677: Stop of 'ora.FRA.dg' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.heal.db' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'hisdb2'
CRS-2677: Stop of 'ora.hisdb2.vip' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.scan1.vip' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.OCRBK.dg' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb1'
CRS-2677: Stop of 'ora.asm' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'hisdb1'
CRS-2677: Stop of 'ora.oc4j' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.ons' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'hisdb1'
CRS-2677: Stop of 'ora.net1.network' on 'hisdb1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'hisdb1' has completed
CRS-2677: Stop of 'ora.crsd' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.evmd' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb1'
CRS-2677: Stop of 'ora.evmd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.OCRBK.dg' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb2'
CRS-2677: Stop of 'ora.asm' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'hisdb2'
CRS-2677: Stop of 'ora.ons' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'hisdb2'
CRS-2677: Stop of 'ora.net1.network' on 'hisdb2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'hisdb2' has completed
CRS-2677: Stop of 'ora.crsd' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.evmd' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb2'
CRS-2677: Stop of 'ora.evmd' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'hisdb1'
CRS-2677: Stop of 'ora.ctssd' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hisdb1'
CRS-2677: Stop of 'ora.cssd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'hisdb2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hisdb2'
CRS-2677: Stop of 'ora.cssd' on 'hisdb2' succeeded
[root@hisdb1 ~]# /u01/app/11.2.0/grid/bin/crsctl start cluster -all
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hisdb2'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hisdb1'
CRS-2676: Start of 'ora.cssdmonitor' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'hisdb2'
CRS-2676: Start of 'ora.cssdmonitor' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'hisdb1'
CRS-2672: Attempting to start 'ora.diskmon' on 'hisdb2'
CRS-2676: Start of 'ora.diskmon' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'hisdb1'
CRS-2676: Start of 'ora.diskmon' on 'hisdb1' succeeded
……
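Note: the startup hangs at this point. The corrupted disk is the vote disk, so the cluster cannot start.
2.4 Related alerts
-- ocssd.log keeps reporting the following errors: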
2022-12-27 22:17:15.937: [    CSSD][3821278976]clssnmvDiskVerify: Successful discovery of 0 disks
2022-12-27 22:17:15.937: [    CSSD][3821278976]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2022-12-27 22:17:15.937: [    CSSD][3821278976]clssnmvFindInitialConfigs: No voting files found
2022-12-27 22:17:15.937: [    CSSD][3821278976](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2022-12-27 22:17:15.996: [    CSSD][3823675136]clssscSelect: cookie accept request 0x7fa0d80845c0
2022-12-27 22:17:15.996: [    CSSD][3823675136]clssscevtypSHRCON: getting client with cmproc 0x7fa0d80845c0
2022-12-27 22:17:15.996: [    CSSD][3823675136]clssgmRegisterClient: proc(4/0x7fa0d80845c0), client(358/0x7fa0d8071230)
2022-12-27 22:17:15.996: [    CSSD][3823675136]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7fa0d80845c0) client(0x7fa0d8071230)
2022-12-27 22:17:15.996: [    CSSD][3823675136]clssgmDiscEndpcl: gipcDestroy 0x5976
2022-12-27 22:17:16.329: [    CSSD][3823675136]clssscSelect: cookie accept request 0x7fa0d8099e80
2022-12-27 22:17:16.329: [    CSSD][3823675136]clssscevtypSHRCON: getting client with cmproc 0x7fa0d8099e80
2022-12-27 22:17:16.329: [    CSSD][3823675136]clssgmRegisterClient: proc(5/0x7fa0d8099e80), client(357/0x7fa0d8071230)
2022-12-27 22:17:16.329: [    CSSD][3823675136]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7fa0d8099e80) client(0x7fa0d8071230)
2022-12-27 22:17:16.329: [    CSSD][3823675136]clssgmDiscEndpcl: gipcDestroy 0x598c
2022-12-27 22:17:16.998: [    CSSD][3823675136]clssscSelect: cookie accept request 0x7fa0d80845c0
2022-12-27 22:17:16.998: [    CSSD][3823675136]clssscevtypSHRCON: getting client with cmproc 0x7fa0d80845c0
2022-12-27 22:17:16.998: [    CSSD][3823675136]clssgmRegisterClient: proc(4/0x7fa0d80845c0), client(359/0x7fa0d8071230)
2022-12-27 22:17:16.998: [    CSSD][3823675136]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7fa0d80845c0) client(0x7fa0d8071230)
-- alerthisdb1.log reports the following
[grid@hisdb1 hisdb1]$ tail -5000f alerthisdb1.log
The following error repeats every 15 seconds:
[cssd(7816)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/hisdb1/cssd/ocssd.log
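Because CRS-1714 repeats on every retry, counting it in the alert log is a quick way to tell whether CSS is still failing to find a voting file. A minimal sketch follows; count_vote_discovery_failures is a hypothetical helper name, and the log path in the example is inferred from the ocssd.log path in the message above, so treat it as an assumption.

```shell
# Sketch: count CRS-1714 messages in a Clusterware alert log.
# count_vote_discovery_failures is a hypothetical helper name.
count_vote_discovery_failures() {
    # awk prints 0 when nothing matches, unlike grep -c which also exits non-zero.
    awk '/CRS-1714/ { n++ } END { print n + 0 }' "$1"
}

# Example (assumed log location):
# count_vote_discovery_failures /u01/app/11.2.0/grid/log/hisdb1/alerthisdb1.log
```

Running it twice a minute apart and comparing the two numbers shows whether the retries are still happening.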
2.5 Repair the vote disk
[grid@hisdb1 ~]$ kfed repair /dev/oracleasm/disks/DATA04
Note: once the repair succeeds, the cluster returns to normal.
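The repair works without the dd copy from 2.1 because, as far as I know, ASM (from 10.2.0.5 and 11.1.0.7 onward) keeps a spare copy of the disk header in the second-to-last metadata block of allocation unit 1, and kfed repair rewrites block 0 from that copy. The sketch below computes that block number. It assumes the default 1 MB allocation unit and 4 KB metadata block, which the article does not state for this cluster, and header_backup_blkn is a hypothetical helper name.

```shell
# Sketch: block number, within allocation unit 1, of the ASM disk header copy.
# Assumption: AU size 1 MB and metadata block size 4 KB unless arguments say otherwise.
header_backup_blkn() {
    local au_size="${1:-1048576}" blk_size="${2:-4096}"
    # Blocks per AU, minus 2, gives the second-to-last block (numbering starts at 0).
    echo $(( $au_size / $blk_size - 2 ))
}

# Example: inspect the spare copy before running kfed repair.
# kfed read /dev/oracleasm/disks/DATA04 aun=1 blkn="$(header_backup_blkn)" | head
```

If that block reports KFBTYP_DISKHEAD, the spare copy is intact and a repair is likely to succeed.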
3. Data disk
Simulate corruption of the data disk and then repair it.
3.1 Copy the data
[grid@hisdb1 ~]$ dd if=/dev/oracleasm/disks/DATA03 of=/home/grid/data03.dd bs=8192 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000373797 s, 21.9 MB/s
[grid@hisdb1 ~]$ kfed read /dev/oracleasm/disks/DATA03 | head
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3875939376 ; 0x00c: 0xe7062430
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
3.2 Corrupt the disk
[grid@hisdb1 ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/DATA03 bs=8192 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000199175 s, 41.1 MB/s
[grid@hisdb1 ~]$ kfed read /dev/oracleasm/disks/DATA03 | head
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
3.3 Reproduce the failure
[root@hisdb1 ~]# /u01/app/11.2.0/grid/bin/crsctl stop cluster -all
CRS-2673: Attempting to stop 'ora.crsd' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.crsd' on 'hisdb2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'hisdb1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.cvu' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.oc4j' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.OCRBK.dg' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'hisdb1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'hisdb2'
CRS-2673: Attempting to stop 'ora.heal.db' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.OCRBK.dg' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.heal.db' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'hisdb2'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.hisdb2.vip' on 'hisdb2'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'hisdb1'
CRS-2677: Stop of 'ora.cvu' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.hisdb1.vip' on 'hisdb1'
CRS-2677: Stop of 'ora.heal.db' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'hisdb2'
CRS-2677: Stop of 'ora.heal.db' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'hisdb1'
CRS-2677: Stop of 'ora.hisdb2.vip' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.scan1.vip' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.hisdb1.vip' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.oc4j' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.OCRBK.dg' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb2'
CRS-2677: Stop of 'ora.OCRBK.dg' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb1'
CRS-2677: Stop of 'ora.asm' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'hisdb2'
CRS-2677: Stop of 'ora.asm' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.ons' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'hisdb2'
CRS-2677: Stop of 'ora.net1.network' on 'hisdb2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'hisdb2' has completed
CRS-2673: Attempting to stop 'ora.ons' on 'hisdb1'
CRS-2677: Stop of 'ora.ons' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'hisdb1'
CRS-2677: Stop of 'ora.net1.network' on 'hisdb1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'hisdb1' has completed
CRS-2677: Stop of 'ora.crsd' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.evmd' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb2'
CRS-2677: Stop of 'ora.crsd' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.evmd' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb1'
CRS-2677: Stop of 'ora.evmd' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.asm' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'hisdb2'
CRS-2677: Stop of 'ora.asm' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'hisdb1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hisdb1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hisdb2'
CRS-2677: Stop of 'ora.cssd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.cssd' on 'hisdb2' succeeded
[root@hisdb1 ~]# /u01/app/11.2.0/grid/bin/crsctl start cluster -all
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hisdb1'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hisdb2'
CRS-2676: Start of 'ora.cssdmonitor' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'hisdb1'
CRS-2676: Start of 'ora.cssdmonitor' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'hisdb1'
CRS-2672: Attempting to start 'ora.cssd' on 'hisdb2'
CRS-2672: Attempting to start 'ora.diskmon' on 'hisdb2'
CRS-2676: Start of 'ora.diskmon' on 'hisdb1' succeeded
CRS-2676: Start of 'ora.diskmon' on 'hisdb2' succeeded
CRS-2676: Start of 'ora.cssd' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'hisdb1'
CRS-2676: Start of 'ora.cssd' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'hisdb1'
CRS-2672: Attempting to start 'ora.ctssd' on 'hisdb2'
CRS-2676: Start of 'ora.ctssd' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'hisdb2'
CRS-2676: Start of 'ora.ctssd' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'hisdb2'
CRS-2672: Attempting to start 'ora.evmd' on 'hisdb1'
CRS-2676: Start of 'ora.evmd' on 'hisdb2' succeeded
CRS-2676: Start of 'ora.evmd' on 'hisdb1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'hisdb1'
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'hisdb2'
CRS-2676: Start of 'ora.asm' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'hisdb1'
CRS-2676: Start of 'ora.asm' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'hisdb2'
CRS-2676: Start of 'ora.crsd' on 'hisdb1' succeeded
CRS-2676: Start of 'ora.crsd' on 'hisdb2' succeeded
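Note: the cluster starts successfully, but the database instances cannot be opened because all of their data files are in the DATA disk group.
3.4 Related errors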
[grid@hisdb1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Tue Dec 27 22:46:21 2022

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> col name for a20
SQL> col path for a40
SQL> set line 160
SQL> select name,total_mb,usable_file_mb,state from v$asm_diskgroup;

NAME                   TOTAL_MB USABLE_FILE_MB STATE
-------------------- ---------- -------------- -----------
FRA                       10239           6624 MOUNTED
OCRBK                     10239           9843 MOUNTED
SQL> alter diskgroup data mount;
alter diskgroup data mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
Note: the DATA disk group cannot be mounted.
[grid@hisdb1 hisdb1]$ tail -5000f alerthisdb1.log
2022-12-27 22:46:18.033:
[crsd(10020)]CRS-2807:Resource 'ora.DATA.dg' failed to start automatically.
2022-12-27 22:46:18.033:
[crsd(10020)]CRS-2807:Resource 'ora.DATA.dg' failed to start automatically.
2022-12-27 22:46:18.033:
[crsd(10020)]CRS-2807:Resource 'ora.heal.db' failed to start automatically.
2022-12-27 22:46:18.033:
[crsd(10020)]CRS-2807:Resource 'ora.heal.db' failed to start automatically.
Note: the cluster alert log shows the messages above.
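The alert log repeats each CRS-2807 message, so it helps to reduce it to a list of distinct resources that failed to start. This is a sketch based only on the message format shown above; failed_autostart_resources is a hypothetical helper name.

```shell
# Sketch: list the distinct resources named in CRS-2807 messages of an alert log.
# failed_autostart_resources is a hypothetical helper name.
failed_autostart_resources() {
    # Messages look like:
    #   [crsd(10020)]CRS-2807:Resource 'ora.DATA.dg' failed to start automatically.
    sed -n "s/.*CRS-2807:Resource '\([^']*\)' failed to start automatically.*/\1/p" "$1" | sort -u
}

# Example, run from the directory that holds the alert log:
# failed_autostart_resources alerthisdb1.log
```

For the excerpt above this leaves two names, ora.DATA.dg and ora.heal.db, which matches the dependency: the database cannot start until its disk group is mounted.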
SQL> select group_number,name,path,state,total_mb,free_mb from v$asm_disk;

GROUP_NUMBER NAME                 PATH            STATE      TOTAL_MB    FREE_MB
------------ -------------------- --------------- -------- ---------- ----------
           0                      ORCL:DATA01     NORMAL            0          0
           0                      ORCL:DATA03     NORMAL            0          0
           2 DATA02               ORCL:DATA02     NORMAL        10239       6624
           3 DATA04               ORCL:DATA04     NORMAL        10239       9843
[grid@hisdb2 hisdb2]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE OFFLINE      hisdb1
               ONLINE OFFLINE      hisdb2
ora.FRA.dg
               ONLINE ONLINE       hisdb1
               ONLINE ONLINE       hisdb2
ora.LISTENER.lsnr
               ONLINE ONLINE       hisdb1
               ONLINE ONLINE       hisdb2
ora.OCRBK.dg
               ONLINE ONLINE       hisdb1
               ONLINE ONLINE       hisdb2
ora.asm
               ONLINE ONLINE       hisdb1                   Started
               ONLINE ONLINE       hisdb2                   Started
ora.gsd
               OFFLINE OFFLINE      hisdb1
               OFFLINE OFFLINE      hisdb2
ora.net1.network
               ONLINE ONLINE       hisdb1
               ONLINE ONLINE       hisdb2
ora.ons
               ONLINE ONLINE       hisdb1
               ONLINE ONLINE       hisdb2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE ONLINE       hisdb2
ora.cvu
      1        ONLINE ONLINE       hisdb1
ora.heal.db
      1        ONLINE OFFLINE                               Instance Shutdown
      2        ONLINE OFFLINE                               Instance Shutdown
ora.hisdb1.vip
      1        ONLINE ONLINE       hisdb1
ora.hisdb2.vip
      1        ONLINE ONLINE       hisdb2
ora.oc4j
      1        ONLINE ONLINE       hisdb1
ora.orcl.db
      1        OFFLINE OFFLINE                               Instance Shutdown
      2        OFFLINE OFFLINE                               Instance Shutdown
ora.scan1.vip
      1        ONLINE ONLINE       hisdb2
Note: the cluster status is abnormal, and the heal database cannot be started.
3.5 Repair the data disk
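In a long resource listing the interesting rows are those whose TARGET is ONLINE while the STATE is OFFLINE. The filter below picks them out. It is a sketch that assumes the tabular layout shown above; offline_resources is a hypothetical helper name.

```shell
# Sketch: from "crsctl stat res -t" output on stdin, list resources that
# should be online (TARGET ONLINE) but are not (STATE OFFLINE).
# offline_resources is a hypothetical helper name.
offline_resources() {
    awk '
        # A line starting with "ora." names the resource for the rows below it.
        /^ora\./ { res = $1; next }
        # Report rows where ONLINE is immediately followed by OFFLINE.
        /(^|[ \t])ONLINE[ \t]+OFFLINE([ \t]|$)/ { if (res != "") print res }
    ' | sort -u
}

# Example:
# crsctl stat res -t | offline_resources
```

Resources that are intentionally disabled, such as ora.gsd and ora.orcl.db in the listing above (OFFLINE OFFLINE), are not reported.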
[grid@hisdb1 ~]$ kfed repair /dev/oracleasm/disks/DATA03
[grid@hisdb1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Tue Dec 27 22:54:47 2022

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup data mount;

Diskgroup altered.
SQL> select group_number,name,path,state,total_mb,free_mb from v$asm_disk;

GROUP_NUMBER NAME            PATH                      STATE      TOTAL_MB    FREE_MB
------------ --------------- ------------------------- -------- ---------- ----------
           0                 ORCL:DATA01               NORMAL            0          0
           2 DATA02          ORCL:DATA02               NORMAL        10239       6618
           1 DATA03          ORCL:DATA03               NORMAL        20479      13765
           3 DATA04          ORCL:DATA04               NORMAL        10239       9843
Note: after the data disk is repaired, the cluster returns to normal.
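The dd copies taken in 2.1 and 3.1 (data04.dd and data03.dd) are not used by the kfed repair steps. They offer a fallback if the repair cannot rebuild the header: write the saved 8 KB back to the same disk. The sketch below is an assumption about how that fallback could look, not a step performed in this article; restore_header is a hypothetical helper name, and the backup must come from the very disk it is written to.

```shell
# Sketch: write a saved 8 KB header copy back to the start of a disk.
# restore_header is a hypothetical helper; use it only with a backup of the same disk.
restore_header() {
    local backup="$1" disk="$2"
    # Accept only a backup of exactly 8192 bytes, the size copied in 2.1 and 3.1.
    if [ "$(wc -c < "$backup")" -ne 8192 ]; then
        echo "unexpected backup size for $backup" >&2
        return 1
    fi
    # conv=notrunc leaves everything after the first 8 KB untouched,
    # which matters when the target is a regular file rather than a device.
    dd if="$backup" of="$disk" bs=8192 count=1 conv=notrunc 2>/dev/null
}

# Example with the paths used in this article:
# restore_header /home/grid/data03.dd /dev/oracleasm/disks/DATA03
```

After either method, rerunning kfed read on the disk and mounting the disk group confirms the result.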
