暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

丢失OCR盘后的恢复

IT那活儿 2024-07-02
134

点击上方“IT那活儿”公众号--专注于企业全栈运维技术分享,不管IT什么活儿,干就完了!!!  




故障现象



1.1 使用orccheck命令报错

[root@rac1 ~]# ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage

1.2 使用ocrconfig -restore恢复仍然报错

[root@rac1 rac-cluster]#ocrconfig -restore u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
PROT-35: The configured OCR locations are not accessible.




处理步骤



2.1 强制停掉两个节点的crs(两个节点都要执行)

[root@rac1 rac-cluster]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac2 ~]# /u01/app/11.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac2'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac2'
CRS-2673: Attempting to stop 'ora.asm' on 'rac2'
CRS-2677: Stop of 'ora.evmd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac2'
CRS-2677: Stop of 'ora.cssd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'rac2'
CRS-2677: Stop of 'ora.crf' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.

2.2 在节点一上已独占方式启动(只启动ASM实例 但不启动CRS)
[root@rac1 ~]#crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded

2.3 重建原votedisk所在的磁盘组
[root@rac1 ~]#su - grid
[grid@rac1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Sun Feb 28 00:09:32 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> create diskgroup systemdg normal redundancy disk '/dev/asm-diskb','/dev/asm-diskc','/dev/asm-diskd' ATTRIBUTE 'compatible.rdbms' = '11.2', 'compatible.asm' = '11.2';
Diskgroup created.

2.4 使用最新的OCR备份进行恢复
[root@rac1 ~]#ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
[root@rac1 ~]#ocrcheck
Status of Oracle Cluster Registry is as follows :
     Version : 3
     Total space (kbytes) : 262120
     Used space (kbytes) : 2900
     Available space (kbytes) : 259220
     ID : 1207909490
     Device/File Name : +SYSTEMDG
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
     Cluster registry integrity check succeeded
     Logical corruption check succeeded

2.5 恢复votedisk
[root@rac1 ~]#crsctl query css votedisk
Located 0 voting disk(s).
[grid@rac1 ~]$ crsctl replace votedisk +SYSTEMDG
Successful addition of voting disk a0e3b9c0c5934f79bf67aa783618b59d.
Successful addition of voting disk 422a673bdbc94f97bff51f66f84a105e.
Successful addition of voting disk a1b110709eb84f39bf351f1b01aa0c7d.
Successfully replaced voting disk group with +SYSTEMDG.
CRS-4266: Voting file(s) successfully replaced
[root@rac1 rac-cluster]#crsctl query css votedisk
##STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
 1. ONLINE 03bd9e6851cf4fb4bf4c92e24a3d71cc (/dev/asm-diskb) [SYSTEMDG]
 2. ONLINE 362f2cfb5b3d4f29bf319c668e0efbe4 (/dev/asm-diskc) [SYSTEMDG]
 3. ONLINE 57659c53f4284fbdbfeabafb20c3fbdd (/dev/asm-diskd) [SYSTEMDG]

Notes:如果在恢复时遇到如下错误:
[grid@rac1 ~]$ crsctl replace votedisk +SYSTEMDG
CRS-4602: Failed 27 to add voting file a0e3b9c0c5934f79bf67aa783618b59d.
CRS-4602: Failed 27 to add voting file 422a673bdbc94f97bff51f66f84a105e.
CRS-4602: Failed 27 to add voting file a1b110709eb84f39bf351f1b01aa0c7d.
Failed to replace voting disk group with +SYSTEMDG.
CRS-4000: Command Replace failed, or completed with errors.
--登陆grid用户,检查并修改asm_diskstring参数
SQL> alter system set asm_diskstring='/dev/asm*';
System altered.
--检查并修改spfile参数,并重启ASM
SQL> create spfile from memory;
File created.

2.6 重启crs
[root@rac1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac1 ~]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.

2.7 由于启动较慢,过几分钟查看状态
[root@rac1 ~]# crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.DATADG.dg  ora....up.type ONLINE    ONLINE    rac1       
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac1       
ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac1       
ora....EMDG.dg ora....up.type ONLINE    ONLINE    rac1       
ora.asm        ora.asm.type   ONLINE    ONLINE    rac1       
ora.cvu        ora.cvu.type   ONLINE    ONLINE    rac1       
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE              
ora....network ora....rk.type ONLINE    ONLINE    rac1       
ora.oc4j       ora.oc4j.type  ONLINE    OFFLINE              
ora.ons        ora.ons.type   ONLINE    ONLINE    rac1       
ora.orcl.db    ora....se.type ONLINE    OFFLINE              
ora....SM1.asm application    ONLINE    ONLINE    rac1       
ora....C1.lsnr application    ONLINE    ONLINE    rac1       
ora.rac1.gsd   application    OFFLINE   OFFLINE              
ora.rac1.ons   application    ONLINE    ONLINE    rac1       
ora.rac1.vip   ora....t1.type ONLINE    ONLINE    rac1       
ora.rac2.vip   ora....t1.type ONLINE    ONLINE    rac1       
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac1       
     
[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

但是在启动db的时候报错了.........
PRCR-1079 : Failed to start resource ora.orcl.db
CRS-5017: The resource action "ora.orcl.db start" encountered the following error:
ORA-00205: error in identifying control file, check alert log for more info
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/rac1/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-5017: The resource action "ora.orcl.db start" encountered the following error:
ORA-00205: error in identifying control file, check alert log for more info
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/rac2/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.orcl.db' on 'rac1' failed
CRS-2674: Start of 'ora.orcl.db' on 'rac2' failed
CRS-2632: There are no more servers to try to place resource 'ora.orcl.db' on that would satisfy its placement policy

--查看了下trace发现SYSTEMDG中的控制文件没了,装RAC的时候就没规划好,导致控制文件放到了votedisk里,但是SYSTEMDG刚刚被重建了,所以丢了。
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.8.222)(PORT=1521))' SCOPE=MEMORY SID='orcl1';
ALTER DATABASE MOUNT /* db agent *//* {1:40511:558} */
This instance was first to mount
NOTE: Loaded library: System
SUCCESS: diskgroup DATADG was mounted
SUCCESS: diskgroup SYSTEMDG was mounted
NOTE: dependency between database orcl and diskgroup resource ora.DATADG.dg is established
ORA-00210: cannot open the specified control file
ORA-00202: control file: '+SYSTEMDG/orcl/controlfile/current.261.904952269'
ORA-17503: ksfdopn:2 Failed to open file +SYSTEMDG/orcl/controlfile/current.261.904952269
ORA-15012: ASM file '+SYSTEMDG/orcl/controlfile/current.261.904952269' does not exist
ORA-205 signalled during: ALTER DATABASE MOUNT /* db agent *//* {1:40511:558} */...
Sat Feb 27 23:48:14 2016
Shutting down instance (abort)
License high water mark = 1
USER (ospid: 21150): terminating the instance
Instance terminated by USER, pid = 21150
Sat Feb 27 23:48:14 2016
Instance shutdown complete

2.8 将数据库启动到nomount阶段
[grid@rac1 trace]$ srvctl start database -d orcl -o nomount
2.9 在节点一上使用rman恢复一份控制文件到SYSTEMDG上
[oracle@rac1 ~]$ rman target /
Recovery Manager: Release 11.2.0.4.0 - Production on Sat Feb 27 23:53:35 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL (not mounted)
RMAN> restore controlfile to '+SYSTEMDG' from '+DATADG/orcl/controlfile/current.260.901699963';
Starting restore at 2016/02/27 23:53:43
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=34 instance=orcl1 device type=DISK
channel ORA_DISK_1: copied control file copy
Finished restore at 2016/02/27 23:53:52

2.10 已恢复好的控制文件路径和名称
ASMCMD> pwd
+systemdg/orcl/CONTROLFILE
ASMCMD> ls
current.261.904953225

2.11 节点一上登陆db,修改control_files参数
[oracle@rac1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat Feb 27 23:55:09 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> alter system set control_files='+DATADG/orcl/controlfile/current.260.901699963','+SYSTEMDG/orcl/CONTROLFILE/current.261.904953225' scope=spfile sid='*';
System altered.

2.12 重启数据库,并检查数据库状态
[grid@rac1 trace]$ srvctl stop database -d orcl
[grid@rac1 trace]$ srvctl start database -d orcl
[grid@rac1 trace]$ srvctl status database -d orcl
Instance orcl1 is running on node rac1
Instance orcl2 is running on node rac2

保险起见,查看两个节点上的控制文件参数是否正确:
[oracle@rac1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat Feb 27 23:58:47 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> show parameter control_file
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------------------------------------------------------------------------
control_file_record_keep_time integer 7
control_files string        +DATADG/orcl/controlfile/current.260.901699963, +SYSTEMDG/orcl/controlfile/current.261.904953225
                                              
[oracle@rac2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat Feb 27 23:59:29 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> show parameter control_file
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------------------------------------------------------------------------
control_file_record_keep_time integer 7
control_files string        +DATADG/orcl/controlfile/current.260.901699963, +SYSTEMDG/orcl/controlfile/current.261.904953225

OK,收工~~~~

END


本文作者:鲁伟锋(上海新炬中北团队)

本文来源:“IT那活儿”公众号

文章转载自IT那活儿,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论