物理Data Guard（四）Switchover

原创大柏树 2022-12-29

630

一.概述

前面说到，如果条件允许，最好能定期进行主、备角色切换，这样不仅能检测备库的有效性，而且能熟悉数据库的切换流程。当发生灾难时，一个完善的切换流程能极大地缩短数据库的切换时间。那我们来讨论一下switchover的时候应该注意哪些事情。

1.提前检查DG的服务器配置，能否承担业务。
2.测试dblink在备库是否可用。
3.客户端连接信息修改。提前做域名解析，这样客户端不用一一修改。
4.切换失败的回退。
具体在SQLPLUS切换的细节可以参考文档：
Data Guard Physical Standby Switchover Best Practices using SQLPlus (Doc ID 1304939.1)

需要注意的是，如果是RAC，执行切换的时候请关闭一个节点，切换之后再打开。

二.环境

主库：11.2.0.4 RAC - linux平台
备库：11.2.0.4 单实例 - linux平台

三.切换

1.切换前的检查

1.1.首先检查参数：

以下是示例：

primary：
DB_NAME：orcl
DB_UNIQUE_NAME：orcl
Oracle Net Service Name：orcl
LOG_ARCHIVE_CONFIG='DG_CONFIG=(orcl,orclbak)'
CONTROL_FILES='/arch1/orcl/control1.ctl', '/arch2/orcl/control2.ctl'
LOG_ARCHIVE_DEST_1='LOCATION=/arch1/chicago/ VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=orcl'
LOG_ARCHIVE_DEST_2='SERVICE=orclbak ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=orclbak'
LOG_ARCHIVE_DEST_STATE_1=ENABLE
LOG_ARCHIVE_DEST_STATE_2=ENABLE
REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE
FAL_SERVER=orclbak
DB_FILE_NAME_CONVERT='orclbak','orcl'
LOG_FILE_NAME_CONVERT='/arch1/orclbak/','/arch1/orcl/','/arch2/orclbak/','/arch2/orcl/'
LOG_ARCHIVE_FORMAT=log%t_%s_%r.arc
STANDBY_FILE_MANAGEMENT=AUTO

physical standby:
DB_NAME：orcl
DB_UNIQUE_NAME：orclbak
Oracle Net Service Name：orclbak
LOG_ARCHIVE_CONFIG='DG_CONFIG=(orclbak,orcl)'
CONTROL_FILES='/arch1/orclbak/control1.ctl', '/arch2/orclbak/control2.ctl'
DB_FILE_NAME_CONVERT='orcl','orclbak'
LOG_FILE_NAME_CONVERT='/arch1/orcl/','/arch1/orclbak/','/arch2/orcl/','/arch2/orclbak/'
LOG_ARCHIVE_FORMAT=log%t_%s_%r.arc
LOG_ARCHIVE_DEST_1='LOCATION=/arch1/orclbak/ VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=orclbak' 
LOG_ARCHIVE_DEST_2='SERVICE=chicago ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=orcl'
LOG_ARCHIVE_DEST_STATE_1=ENABLE
LOG_ARCHIVE_DEST_STATE_2=ENABLE
REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE
STANDBY_FILE_MANAGEMENT=AUTO
FAL_SERVER=orcl

1.2.检查standby log file

standby log file的数量以及查看添加的命令如下：

standby redo log (SRL)与 online redo log(ORL) 日志组大小一致
SRL = (maximum number of logfiles for each thread + 1) * maximum number of threads
--standby redo log
set lines 200 pages 9999 LONG 5000
col member for a80
select a.thread#,a.group#,b.member,b.type,a.bytes/1024/1024 MB from v$standby_log a,v$logfile b where a.group#=b.group#; 
--redo log
set lines 200 pages 9999 LONG 5000
col member for a80
select a.thread#,a.group#,b.member,b.type,a.bytes/1024/1024 MB from v$log a,v$logfile b where a.group#=b.group#
--修改stadnby logfile
alter database recover managed standby database cancel;
alter system set standby_file_management='manual';
alter database drop logfile group 5;
alter database add standby logfile thread 1 group 20 '+SRL' size 1024M;
alter database add standby logfile thread 2 group 21 '+SRL' size 1024M;
alter database recover managed standby database using current logfile disconnect;
alter system set standby_file_management='AUTO';

1.3.检查同步状态

--查看进程状态
SELECT PROCESS FROM V$MANAGED_STANDBY WHERE PROCESS LIKE 'MRP%';
--查看是否实时应用
SELECT RECOVERY_MODE FROM V$ARCHIVE_DEST_STATUS WHERE DEST_ID=2;
-- LOG_ARCHIVE_MAX_PROCESSES
show parameter  LOG_ARCHIVE_MAX_PROCESSES
--Make sure LOG_ARCHIVE_MAX_PROCESSES is set to 4 or higher on every primary and standby database in the Data Guard configuration.

--Ensure Online Redo Log Files on the Target Physical Standby have been cleared
SELECT DISTINCT L.GROUP# FROM V$LOG L, V$LOGFILE LF
WHERE L.GROUP# = LF.GROUP#
AND L.STATUS NOT IN ('UNUSED','CLEARING','CLEARING_CURRENT');
ALTER DATABASE CLEAR LOGFILE GROUP <ORL GROUP# from the query above>;
ALTER DATABASE CLEAR LOGFILE GROUP 1;

--Verify Primary and Standby tempfiles match and all datafiles are ONLINE
SELECT TMP.NAME FILENAME, BYTES, TS.NAME TABLESPACE
FROM V$TEMPFILE TMP, V$TABLESPACE TS WHERE TMP.TS#=TS.TS#;
--确保所需数据文件都是online
SELECT NAME FROM V$DATAFILE WHERE STATUS=’OFFLINE’;

2.切换 switchover

2.1.首先创建一个回退点用于切换失败回退

--Cancel any apply delay for the target standby
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE NODELAY DISCONNECT FROM SESSION;

--Clear Potential Blocking Parameters & Jobs
SELECT * FROM DBA_JOBS_RUNNING;
SELECT OWNER, JOB_NAME, START_DATE, END_DATE, ENABLED FROM DBA_SCHEDULER_JOBS WHERE ENABLED=’TRUE’ AND OWNER <> ‘SYS”;
EXECUTE DBMS_SCHEDULER.DISABLE(<for each job name captured>);


--SHOW PARAMETER job_queue_processes
ALTER SYSTEM SET job_queue_processes=0 SCOPE=BOTH SID=’*’;

--Turn on Data Guard tracing on primary and standby
SHOW PARAMETER log_archive_trace
ALTER SYSTEM SET log_archive_trace=8191;

--Tail Alert Logs (optional) on all instances
SHOW PARAMETER background_dump_dest
tail –f <background_dump_dest location>/alert*

--Create Guaranteed Restore Points (optional)
On the standby
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
CREATE RESTORE POINT SWITCHOVER_START_GRP GUARANTEE FLASHBACK DATABASE;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE

On the primary
CREATE RESTORE POINT SWITCHOVER_START_GRP GUARANTEE FLASHBACK DATABASE;

2.2.把主库切换为备库 Switchover the primary to a standby database

--Verify that the primary database can be switched to the standby role
SELECT SWITCHOVER_STATUS FROM V$DATABASE;
结果应该是： TO STANDBY or SESSIONS ACTIVE
--If The Primary is a RAC, then shutdown all secondary primary instances
shutdown abort;

--Switchover the primary to a standby database
ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;
If an ORA-16139 error is encountered, as long as $DATABASE.DATABASE_ROLE=’PHYSICAL STANDBY’, then you can proceed.

--Verify the standby has received the end-of-redo (EOR) log(s)（观察alert日志）

2.3.把备库切换为主库

--Verify that the standby database can be switched to the primary role
SELECT SWITCHOVER_STATUS FROM V$DATABASE;
--TO PRIMARY

--Switchover the standby database to a primary
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;

--Open the new primary database
ALTER DATABASE OPEN;

--Correct any tempfile mismatch

2.4.打开新备库

原主库切换为备库之后应该关闭状态，如果是open状态，请重启新备库。

If the new standby database (former primary database) was not shutdown since switching it to standby, 
bring it to the mount state and start managed recovery. This can be done in parallel to the new primary
open.

SHUTDOWN ABORT;
STARTUP MOUNT;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;

2.5.重置

我们在切换前做的设置重置回去。

ALTER SYSTEM SET log_archive_trace=<prior value>;
ALTER SYSTEM SET job_queue_processes=<value saved> scope=both sid=’*
Enable any jobs that were disabled.
EXECUTE DBMS_SCHEDULER.ENABLE(<for each job name captured>);
DROP RESTORE POINT SWITCHOVER_START_GRP;’

3.回退方案

在切换中有问题可以回退到我们创建的回退点。
如果切换不成功，想重新执行切换，可以参考官方文档。Roll Back After Unsuccessful Switchover and Start Over

1.Shut down and mount the new standby database (old primary).
2.Start Redo Apply on the new standby database.
3.Verify that the new standby database is ready to be switched back to the primary role.
SELECT SWITCHOVER_STATUS FROM V$DATABASE;

4.Issue the following statement to convert the new standby database back to the primary role:

ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;

If this statement is successful, the database will be running in the primary database role, and you do not need to perform any more steps.
If this statement is unsuccessful, then continue with Step 5.

5.When the switchover to change the role from primary to physical standby was initiated, a trace file was written in the log directory. 
This trace file contains the SQL statements required to re-create the original primary control file. 
Locate the trace file and extract the SQL statements into a temporary file. 
Execute the temporary file from SQL*Plus. 
This will revert the new standby database back to the primary role.

6.Shut down the original physical standby database.

7.Create a new standby control file.
 This is necessary to resynchronize the primary database and physical standby database. 
 Copy the physical standby control file to the original physical standby system.

8.Restart the original physical standby instance.

If this procedure is successful and archive gap management is enabled, 
the FAL processes will start and re-archive any missing archived redo log files to the physical standby database. 
Force a log switch on the primary database and examine the alert logs on both the primary database and physical standby 
database to ensure the archived redo log file sequence numbers are correct.

9.Try the switchover again.

At this point, the Data Guard configuration has been rolled back to its initial state, 
and you can try the switchover operation again 
(after correcting any problems that might have led to the initial unsuccessful switchover).

墨力计划 oracle

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

文章被以下合辑收录

oracle高可用（共5篇）

oracle高可用