
上一篇文章我们配置好了FSFO,本文将分别在failover、switchover和压测场景下通过日志去观察FSFO的切换过程。
failover场景
通过对主库进行shutdown abort触发DG Broker的fast-start failover,观察FSFO过程中主库、备库、观察者和客户端的日志。
1) 查看主备库及DGB信息
确认主备库处于同步状态并且DGB中开启了FSFO。
主库:
sys@test> select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database;
DB_UNIQUE_NAME DATABASE_ROLE OPEN_MODE FS_FAILOVER_STATUS FS_FAILOVER_CURRENT_TARGET
---------------------------------------
TEST_PRI PRIMARY READ WRITE SYNCHRONIZED test_std
Elapsed: 00:00:00.01
备库:
sys@test> select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database;
DB_UNIQUE_NAME DATABASE_ROLE OPEN_MODE FS_FAILOVER_STATUS FS_FAILOVER_CURRENT_TARGET
----------------------------------------
test_std PHYSICAL STANDBY READ ONLY WITH APPLY SYNCHRONIZED test_std
Elapsed: 00:00:00.01
确认DGB 状态正常:
DGMGRL> show configuration verbose;
Configuration - test
Protection Mode: MaxAvailability
Databases:
TEST_PRI - Primary database
test_std - (*) Physical standby database
(*) Fast-Start Failover target
Properties:
FastStartFailoverThreshold = '15'
OperationTimeout = '300'
……
……
PrimaryLostWriteAction = 'CONTINUE'
Fast-Start Failover: ENABLED
Threshold: 15 seconds
Target: test_std
Observer: 10-110-110-30
Lag Limit: 30 seconds (not in use)
……
Configuration Status:
SUCCESS
2) shutdown abort关闭主库,触发failover
sys@test> shutdown abort
ORACLE instance shut down.
3) 观察者输出内容
DGMGRL> start observer;
Observer started
11:44:11.03 Monday, January 09, 2017
Initiating Fast-Start Failover to database "test_std"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "test_std" --切换成功
11:44:26.86 Monday, January 09, 2017
4) TAF 客户端无缝切换日志
在30上以fsfo连接串连接到主库:
[oracle]$ sqlplus sys/123456@fsfo as sysdba
sys@TEST>select database_role,db_unique_name from v$database;
DATABASE_ROLE DB_UNIQUE_NAME
---------------- ------------------------------
PRIMARY TEST_PRI
切换前:DB_UNIQUE_NAME为TEST_PRI
在当前会话中执行:
sys@TEST>select * from dba_objects; --执行耗时sql
LAST_DDL_ TIMESTAMP STATUS T G S
--------- ------------------- ------- - - -
NAMESPACE EDITION_NAME
---------- ------------------------------
24-AUG-13 2013-08-24:11:37:35 VALID N N N
1
24-AUG-13 2013-08-24:11:37:35 VALID N N N
4
…………
…………
切换期间,屏幕输出停顿一定时间,然后继续打印输出:
…………
12-JAN-17 2017-01-12:08:24:46 VALID N N N
4
16-JAN-17 2017-01-16:11:49:37 VALID N N N
1
86282 rows selected.
最终返回所有结果集。
再查看当前会话:
sys@TEST>select database_role,db_unique_name from v$database;
DATABASE_ROLE DB_UNIQUE_NAME
-------------------------------------
PRIMARY test_std
当前连接已经转到test_std上。
5) 查看test_std状态和DGB信息
查看test_std状态:
sys@test> select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database;
DB_UNIQUE_NAME DATABASE_ROLE OPEN_MODE FS_FAILOVER_STATUS FS_FAILOVER_CURRENT_TARGET
--------------------------------------
test_std PRIMARY READ WRITE REINSTATE REQUIRED TEST_PRI
Elapsed: 00:00:00.00
查看DGB状态:
DGMGRL> show configuration verbose;
Configuration - test
Protection Mode: MaxAvailability
Databases:
test_std - Primary database
Warning: ORA-16817: unsynchronized fast-start failover configuration
TEST_PRI - (*) Physical standby database (disabled)
ORA-16661: the standby database needs to be reinstated--需要恢复
(*) Fast-Start Failover target
Properties:
FastStartFailoverThreshold = '15'
OperationTimeout = '300'
……
PrimaryLostWriteAction = 'CONTINUE'
Fast-Start Failover: ENABLED
Threshold: 15 seconds
Target: TEST_PRI
Observer: 10-110-110-30
……
Configuration Status:
WARNING
6) 启动test_pri到mount触发恢复操作
-bash-4.2$ sqlplus as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 9 11:50:13 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
idle> startup mount;
ORACLE instance started.
……
Database mounted.
查看test_pri状态:
idle> select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database;
select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database
*
ERROR at line 1:
ORA-03135: connection lost contact
Process ID: 26511
Session ID: 5 Serial number: 3
7) 观察者恢复test_pri
查看观察者输出日志:
11:51:02.37 Monday, January 09, 2017
Initiating reinstatement for database "TEST_PRI"...
Reinstating database "TEST_PRI", please wait...
Operation requires shutdown of instance "test" on database "TEST_PRI"
Shutting down instance "test"...
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "test" on database "TEST_PRI"
Starting instance "test"...
ORACLE instance started.
Database mounted.
Continuing to reinstate database "TEST_PRI" ...
Reinstatement of database "TEST_PRI" succeeded--完成恢复
11:52:14.38 Monday, January 09, 2017
恢复完成后再查看test_pri
sys@test> select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database;
DB_UNIQUE_NAME DATABASE_ROLE OPEN_MODE FS_FAILOVER_STATUS FS_FAILOVER_CURRENT_TARGET
----------------------------------------
TEST_PRI PHYSICAL STANDBY READ ONLY WITH APPLY SYNCHRONIZED TEST_PRI
Elapsed: 00:00:00.01
sys@test>
查看DGB状态:
DGMGRL> show configuration verbose;
Configuration - test
Protection Mode: MaxAvailability
Databases:
test_std - Primary database
TEST_PRI - (*) Physical standby database
(*) Fast-Start Failover target
Properties:
FastStartFailoverThreshold = '15'
OperationTimeout = '300'
……
PrimaryLostWriteAction = 'CONTINUE'
Fast-Start Failover: ENABLED
Threshold: 15 seconds
Target: TEST_PRI
……
Configuration Status:
SUCCESS
8) 切换过程中test_pri的alert 日志:
[BEGIN] 2017/1/9 11:42:08
Mon Jan 09 11:43:54 2017
Shutting down instance (abort) --强制关库
License high water mark = 12
USER (ospid: 25916): terminating the instance
Instance terminated by USER, pid = 25916
Mon Jan 09 11:43:55 2017
Instance shutdown complete
以下为test_pri 启动到mount状态触发的日志
Mon Jan 09 11:50:23 2017
Starting ORACLE instance (normal)
……
……
Completed: ALTER DATABASE MOUNT
Mon Jan 09 11:50:44 2017
……
FLASHBACK DATABASE TO SCN 7902949
Flashback Restore Start
……
Flashback Restore Complete
Flashback Media Recovery Start
started logmerger process
Parallel Media Recovery started with 4 slaves
Mon Jan 09 11:51:13 2017
Recovery of Online Redo Log: Thread 1 Group 3 Seq 299 Reading mem 0
Mem# 0: home/oracle/app/oradata/test/redo103.log
Incomplete Recovery applied until change 7902950 time 01/09/2017 11:43:54
Flashback Media Recovery Complete
Completed: FLASHBACK DATABASE TO SCN 7902949 --恢复到切换前的scn
alter database convert to physical standby
ALTER DATABASE CONVERT TO PHYSICAL STANDBY (test)--转换为备库
Flush standby redo logfile failed:1649
Clearing standby activation ID 2410103509 (0x8fa742d5)
……
Completed: alter database convert to physical standby
Mon Jan 09 11:51:18 2017
Shutting down instance (immediate)
……
Instance shutdown complete
Mon Jan 09 11:51:30 2017
Starting ORACLE instance (normal)
……
Primary database is in MAXIMUM AVAILABILITY mode
Standby controlfile consistent with primary
……
Media Recovery Log home/oracle/arch/test_1_6_932816657.arc
Media Recovery Waiting for thread 1 sequence 7 (in transit)
Recovery of Online Redo Log: Thread 1 Group 11 Seq 7 Reading mem 0
Mem# 0: home/oracle/app/oradata/test/stdbyredo11.log
[END] 2017/1/9 11:58:18
9) 切换过程中test_std的alert日志
[BEGIN] 2017/1/9 11:43:14
Mon Jan 09 11:43:54 2017
RFS[7]: Assigned to RFS process 13988
RFS[7]: Possible network disconnect with primary database --监测到丢失连接
Mon Jan 09 11:43:54 2017
RFS[8]: Assigned to RFS process 13881
RFS[8]: Possible network disconnect with primary database
……
Mon Jan 09 11:43:54 2017
RFS[3]: Possible network disconnect with primary database
Mon Jan 09 11:44:10 2017
Attempting Fast-Start Failover because the threshold of 15 seconds has elapsed.--超时
Mon Jan 09 11:44:11 2017
Data Guard Broker: Beginning failover--开始切换
Mon Jan 09 11:44:11 2017
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
Mon Jan 09 11:44:11 2017
MRP0: Background Media Recovery cancelled with status 16037
Errors in file home/oracle/app/diag/rdbms/test_std/test/trace/test_pr00_9424.trc:
……
Managed Standby Recovery Canceled (test)
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE
Attempt to do a Terminal Recovery (test)
Media Recovery Start: Managed Standby Recovery (test)
started logmerger process
Mon Jan 09 11:44:12 2017
……
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN
ALTER DATABASE SWITCHOVER TO PRIMARY (test)
Maximum wait for role transition is 15 minutes.
All dispatchers and shared servers shutdown
CLOSE: killing server sessions.
……
Mon Jan 09 11:44:17 2017
Setting recovery target incarnation to 4
Switchover: Complete - Database mounted as primary
Completed: ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY
Completed: ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY
ALTER DATABASE OPEN
Data Guard Broker initializing...
Mon Jan 09 11:44:18 2017
……
Failover succeeded. Primary database is now test_std.--切换完成
……
ALTER SYSTEM ARCHIVE LOG
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
LGWR: Standby redo logfile selected to archive thread 1 sequence 7
LGWR: Standby redo logfile selected for thread 1 sequence 7 for destination LOG_ARCHIVE_DEST_2
Thread 1 advanced to log sequence 7 (LGWR switch)
Current log# 1 seq# 7 mem# 0: home/oracle/app/oradata/test/redo101.log
Archived Log entry 551 added for thread 1 sequence 6 ID 0x90370840 dest 1:
Switchover切换
开启配置FSFO的情况下,只可以SWITCHOVER到目标主库。执行切换前,确认工作请参考DG Broker文档。测试过程如下:
1)先查看DGB FSFO状态
DGMGRL> show configuration verbose;
Configuration - test
Protection Mode: MaxAvailability
Databases:
test_std - Primary database
TEST_PRI - (*) Physical standby database
(*) Fast-Start Failover target
Properties:
FastStartFailoverThreshold = '15'
OperationTimeout = '300'
……
PrimaryLostWriteAction = 'CONTINUE'
Fast-Start Failover: ENABLED
Threshold: 15 seconds
Target: TEST_PRI
Observer: 10-110-110-30
……
Configuration Status:
SUCCESS
2)使用DGB 发起切换命令
DGMGRL> SWITCHOVER TO 'TEST_PRI';
Performing switchover NOW, please wait...
Operation requires a connection to instance "test" on database "TEST_PRI"
Connecting to instance "test"...
Connected.
New primary database "TEST_PRI" is opening...
Operation requires startup of instance "test" on database "test_std"
Starting instance "test"...
ORACLE instance started.
Database mounted.
Database opened.
Switchover succeeded, new primary is "TEST_PRI" --切换完成
DGMGRL>
DGMGRL>
3)确认DGB FSFO状态
DGMGRL> show configuration verbose;
Configuration - test
Protection Mode: MaxAvailability
Databases:
TEST_PRI - Primary database --完成切换
test_std - (*) Physical standby database
(*) Fast-Start Failover target
Properties:
FastStartFailoverThreshold = '15'
OperationTimeout = '300'
……
PrimaryLostWriteAction = 'CONTINUE'
Fast-Start Failover: ENABLED
Threshold: 15 seconds
Target: test_std
Observer: 10-110-110-30
……
Configuration Status:
SUCCESS
压测场景failover
1) JDBC中连接串的配置
jdbc-1.druid.url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oracle.data1.com)(PORT=9539))(ADDRESS=(PROTOCOL=tcp)(HOST=oracle.data2.com)(PORT=9539))(LOAD_BALANCE=off)(FAILOVER=on)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=SERVICE_PRI)(FAILOVER_MODE=(TYPE=session)(METHOD=basic)(RETRIES=180)(DELAY=10))))
2)压测环境
压测库:oracle.data1.com
压测库备库:oracle.data2.com
场景:某业务下单接口,100并发
3)shutdown abort 主库切换
整个FSFO过程和测试机相同。不同之处在于压测环境FAILOVER_MODE中TYPE=session,测试机环境中FAILOVER_MODE中TYPE=select。设置为session会导致应用程序连接断开出现如下报错:
### Error querying database. Cause: java.sql.SQLRecoverableException: 无法从套接字读取更多的数据
### The error may exist in sqlmap/LotteryPocketMapper.xml
### The error may involve com.netease.lottery.pocket.dal.db.dao.LotteryPocketDao.selectPocket-Inline
### The error occurred while setting parameters
### SQL: select * from TB_LOTTERY_POCKET where ……
### Cause: java.sql.SQLRecoverableException: 无法从套接字读取更多的数据
; SQL []; 无法从套接字读取更多的数据; nested exception is java.sql.SQLRecoverableException: 无法从套接字读取更多的数据
org.springframework.dao.RecoverableDataAccessException:
### Error querying database. Cause: java.sql.SQLRecoverableException: 无法从套接字读取更多的数据
这下我们终于可以不用半夜起来处理主库宕机的问题了,不过DG Broker的管理还有一些应该注意的问题,我们下文讲解。
网易MySQL开源中间件Cetus
__________________________
github地址
https://github.com/Lede-Inc/cetus/blob/master/doc/cetus-quick-try.md
欢迎加star关注
社群
技术专家在线及时反馈
cetus开源qq群号: 521824702
cetus开源微信群:扫描网易DBA小助手加入

往期精彩文章
__________________________

网易乐得DBA组负责网易乐得电商、网易邮箱、网易技术部数据库日常运维,负责数据库私有云平台的开发和维护,负责数据库及数据库中间件的开发和测试等,分享最前沿实用数据库干货,关注网易乐得DBA,精修数据库功底。

关注「网易乐得DBA」
了解前沿数据库技术




