暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

网易DG Broker系列:FSFO切换篇

DBA天团 2021-02-05
1136


上一篇文章我们配置好了FSFO,本文将分别在failover、switchover和压测场景下通过日志去观察FSFO的切换过程。


1

failover场景


通过对主库进行shutdown abort触发DG Broker的fast-start failover,观察FSFO过程中主库、备库、观察者和客户端的日志。


1)       查看主备库及DGB信息

确认主备库处于同步状态并且DGB中开启了FSFO。

主库

sys@test> select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database;


DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE            FS_FAILOVER_STATUS     FS_FAILOVER_CURRENT_TARGET

---------------------------------------

TEST_PRI                     PRIMARY          READ WRITE           SYNCHRONIZED           test_std


Elapsed: 00:00:00.01


备库

sys@test> select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database;


DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE            FS_FAILOVER_STATUS     FS_FAILOVER_CURRENT_TARGET

----------------------------------------

test_std                     PHYSICAL STANDBY READ ONLY WITH APPLY SYNCHRONIZED           test_std


Elapsed: 00:00:00.01


确认DGB 状态正常:


DGMGRL> show configuration verbose;


Configuration - test


  Protection Mode: MaxAvailability

  Databases:

    TEST_PRI - Primary database

    test_std - (*) Physical standby database


  (*) Fast-Start Failover target


  Properties:

    FastStartFailoverThreshold      = '15'

    OperationTimeout                = '300'

……

……

    PrimaryLostWriteAction          = 'CONTINUE'


Fast-Start Failover: ENABLED


  Threshold:          15 seconds

  Target:             test_std

  Observer:           10-110-110-30

  Lag Limit:          30 seconds (not in use)

……


Configuration Status:

SUCCESS

2)       shutdown abort关闭主库,触发failover

sys@test> shutdown abort

ORACLE instance shut down.

3)       观察者输出内容

DGMGRL> start observer;

Observer started

11:44:11.03  Monday, January 09, 2017

Initiating Fast-Start Failover to database "test_std"...

Performing failover NOW, please wait...

Failover succeeded, new primary is "test_std"    --切换成功

11:44:26.86  Monday, January 09, 2017


4)       TAF 客户端无缝切换日志

30上以fsfo连接串连接到主库:

[oracle]$ sqlplus sys/123456@fsfo as sysdba

sys@TEST>select database_role,db_unique_name from v$database;

DATABASE_ROLE      DB_UNIQUE_NAME

---------------- ------------------------------

PRIMARY          TEST_PRI


切换前:DB_UNIQUE_NAMETEST_PRI


在当前会话中执行:

sys@TEST>select * from dba_objects;   --执行耗时sql

 LAST_DDL_ TIMESTAMP           STATUS  T G S

 --------- ------------------- ------- - - -

 NAMESPACE EDITION_NAME

---------- ------------------------------

 24-AUG-13 2013-08-24:11:37:35 VALID   N N N

          1

 24-AUG-13 2013-08-24:11:37:35 VALID   N N N

          4

…………

…………

切换期间,屏幕输出停顿一定时间,然后继续打印输出:

…………

12-JAN-17 2017-01-12:08:24:46 VALID   N N N

          4

 16-JAN-17 2017-01-16:11:49:37 VALID   N N N

          1

86282 rows selected.

最终返回所有结果集。


再查看当前会话:

sys@TEST>select database_role,db_unique_name from v$database;

DATABASE_ROLE      DB_UNIQUE_NAME

-------------------------------------

PRIMARY          test_std

当前连接已经转到test_std上。


5)       查看test_std状态和DGB信息

查看test_std状态:

sys@test> select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE            FS_FAILOVER_STATUS     FS_FAILOVER_CURRENT_TARGET

--------------------------------------

test_std                     PRIMARY          READ WRITE           REINSTATE REQUIRED     TEST_PRI

Elapsed: 00:00:00.00


查看DGB状态:


DGMGRL> show configuration verbose;


Configuration - test


  Protection Mode: MaxAvailability

  Databases:

  test_std - Primary database

      Warning: ORA-16817: unsynchronized fast-start failover configuration


   TEST_PRI - (*) Physical standby database (disabled)

      ORA-16661: the standby database needs to be reinstated--需要恢复


  (*) Fast-Start Failover target


  Properties:

    FastStartFailoverThreshold      = '15'

    OperationTimeout                = '300'

   ……

    PrimaryLostWriteAction          = 'CONTINUE'


Fast-Start Failover: ENABLED


  Threshold:          15 seconds

  Target:             TEST_PRI

  Observer:            10-110-110-30

  ……


Configuration Status:

WARNING


6)       启动test_primount触发恢复操作

-bash-4.2$ sqlplus as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 9 11:50:13 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

idle> startup mount;

ORACLE instance started.

……

Database mounted.


查看test_pri状态:


idle> select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database;

select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database

*

ERROR at line 1:

ORA-03135: connection lost contact

Process ID: 26511

Session ID: 5 Serial number: 3


7)       观察者恢复test_pri

查看观察者输出日志:

11:51:02.37  Monday, January 09, 2017

Initiating reinstatement for database "TEST_PRI"...

Reinstating database "TEST_PRI", please wait...

Operation requires shutdown of instance "test" on database "TEST_PRI"

Shutting down instance "test"...

ORA-01109: database not open


Database dismounted.

ORACLE instance shut down.

Operation requires startup of instance "test" on database "TEST_PRI"

Starting instance "test"...

ORACLE instance started.

Database mounted.

Continuing to reinstate database "TEST_PRI" ...

Reinstatement of database "TEST_PRI" succeeded--完成恢复

11:52:14.38  Monday, January 09, 2017


恢复完成后再查看test_pri

sys@test> select DB_UNIQUE_NAME,database_role,open_mode,FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET from v$database;


DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE            FS_FAILOVER_STATUS     FS_FAILOVER_CURRENT_TARGET

----------------------------------------

TEST_PRI                     PHYSICAL STANDBY READ ONLY WITH APPLY SYNCHRONIZED           TEST_PRI


Elapsed: 00:00:00.01

sys@test>


查看DGB状态

DGMGRL> show configuration verbose;


Configuration - test


  Protection Mode: MaxAvailability

  Databases:

test_std - Primary database

TEST_PRI - (*) Physical standby database


  (*) Fast-Start Failover target


  Properties:

    FastStartFailoverThreshold      = '15'

    OperationTimeout                = '300'

    ……

    PrimaryLostWriteAction          = 'CONTINUE'


Fast-Start Failover: ENABLED


  Threshold:          15 seconds

  Target:             TEST_PRI

 ……


Configuration Status:

SUCCESS


8)       切换过程中test_prialert 日志:

[BEGIN] 2017/1/9 11:42:08

Mon Jan 09 11:43:54 2017

Shutting down instance (abort) --强制关库

License high water mark = 12

USER (ospid: 25916): terminating the instance

Instance terminated by USER, pid = 25916

Mon Jan 09 11:43:55 2017

Instance shutdown complete



以下为test_pri 启动到mount状态触发的日志

Mon Jan 09 11:50:23 2017

Starting ORACLE instance (normal)

……

……

Completed: ALTER DATABASE   MOUNT

Mon Jan 09 11:50:44 2017

……

FLASHBACK DATABASE TO SCN 7902949

Flashback Restore Start

……

Flashback Restore Complete

Flashback Media Recovery Start

 started logmerger process

Parallel Media Recovery started with 4 slaves

Mon Jan 09 11:51:13 2017

Recovery of Online Redo Log: Thread 1 Group 3 Seq 299 Reading mem 0

  Mem# 0: home/oracle/app/oradata/test/redo103.log

Incomplete Recovery applied until change 7902950 time 01/09/2017 11:43:54

Flashback Media Recovery Complete

Completed: FLASHBACK DATABASE TO SCN 7902949 --恢复到切换前的scn

alter database convert to physical standby

ALTER DATABASE CONVERT TO PHYSICAL STANDBY (test)--转换为备库

Flush standby redo logfile failed:1649

Clearing standby activation ID 2410103509 (0x8fa742d5)

……

Completed: alter database convert to physical standby

Mon Jan 09 11:51:18 2017

Shutting down instance (immediate)

……

Instance shutdown complete

Mon Jan 09 11:51:30 2017

Starting ORACLE instance (normal)

……

Primary database is in MAXIMUM AVAILABILITY mode

Standby controlfile consistent with primary

……

Media Recovery Log home/oracle/arch/test_1_6_932816657.arc

Media Recovery Waiting for thread 1 sequence 7 (in transit)

Recovery of Online Redo Log: Thread 1 Group 11 Seq 7 Reading mem 0

  Mem# 0: home/oracle/app/oradata/test/stdbyredo11.log


[END] 2017/1/9 11:58:18


9)       切换过程中test_stdalert日志

[BEGIN] 2017/1/9 11:43:14

Mon Jan 09 11:43:54 2017

RFS[7]: Assigned to RFS process 13988

RFS[7]: Possible network disconnect with primary database    --监测到丢失连接

Mon Jan 09 11:43:54 2017

RFS[8]: Assigned to RFS process 13881

RFS[8]: Possible network disconnect with primary database

……

Mon Jan 09 11:43:54 2017

RFS[3]: Possible network disconnect with primary database

Mon Jan 09 11:44:10 2017

Attempting Fast-Start Failover because the threshold of 15 seconds has elapsed.--超时

Mon Jan 09 11:44:11 2017

Data Guard Broker: Beginning failover--开始切换

Mon Jan 09 11:44:11 2017

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL

Mon Jan 09 11:44:11 2017

MRP0: Background Media Recovery cancelled with status 16037

Errors in file home/oracle/app/diag/rdbms/test_std/test/trace/test_pr00_9424.trc:

……

Managed Standby Recovery Canceled (test)

Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE

Attempt to do a Terminal Recovery (test)

Media Recovery Start: Managed Standby Recovery (test)

 started logmerger process

Mon Jan 09 11:44:12 2017

……

Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE

ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN

ALTER DATABASE SWITCHOVER TO PRIMARY (test)

Maximum wait for role transition is 15 minutes.

All dispatchers and shared servers shutdown

CLOSE: killing server sessions.

……

Mon Jan 09 11:44:17 2017

Setting recovery target incarnation to 4

Switchover: Complete - Database mounted as primary

Completed: ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN

ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY

Completed: ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY

ALTER DATABASE OPEN

Data Guard Broker initializing...

Mon Jan 09 11:44:18 2017

……

Failover succeeded. Primary database is now test_std.--切换完成

……

ALTER SYSTEM ARCHIVE LOG

******************************************************************

LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2

******************************************************************

LGWR: Standby redo logfile selected to archive thread 1 sequence 7

LGWR: Standby redo logfile selected for thread 1 sequence 7 for destination LOG_ARCHIVE_DEST_2

Thread 1 advanced to log sequence 7 (LGWR switch)

  Current log# 1 seq# 7 mem# 0: home/oracle/app/oradata/test/redo101.log

Archived Log entry 551 added for thread 1 sequence 6 ID 0x90370840 dest 1:


2

Switchover切换


开启配置FSFO的情况下,只可以SWITCHOVER到目标主库。执行切换前,确认工作请参考DG Broker文档。测试过程如下:

1)先查看DGB FSFO状态

DGMGRL> show configuration verbose;


Configuration - test


  Protection Mode: MaxAvailability

  Databases:

 test_std - Primary database

    TEST_PRI - (*) Physical standby database


  (*) Fast-Start Failover target


  Properties:

    FastStartFailoverThreshold      = '15'

    OperationTimeout                = '300'

……

    PrimaryLostWriteAction          = 'CONTINUE'


Fast-Start Failover: ENABLED


  Threshold:          15 seconds

  Target:             TEST_PRI

  Observer:            10-110-110-30

……


Configuration Status:

SUCCESS


2)使用DGB 发起切换命令


DGMGRL> SWITCHOVER TO 'TEST_PRI';

Performing switchover NOW, please wait...

Operation requires a connection to instance "test" on database "TEST_PRI"

Connecting to instance "test"...

Connected.

New primary database "TEST_PRI" is opening...

Operation requires startup of instance "test" on database "test_std"

Starting instance "test"...

ORACLE instance started.

Database mounted.

Database opened.

Switchover succeeded, new primary is "TEST_PRI"       --切换完成

DGMGRL>

DGMGRL>


3)确认DGB FSFO状态

DGMGRL> show configuration verbose;


Configuration - test


  Protection Mode: MaxAvailability

  Databases:

TEST_PRI - Primary database      --完成切换

    test_std - (*) Physical standby database   


  (*) Fast-Start Failover target


  Properties:

    FastStartFailoverThreshold      = '15'

    OperationTimeout                = '300'

    ……

    PrimaryLostWriteAction          = 'CONTINUE'


Fast-Start Failover: ENABLED


  Threshold:          15 seconds

  Target:             test_std

  Observer:            10-110-110-30

 ……


Configuration Status:

SUCCESS

3

压测场景failover


1) JDBC中连接串的配置

jdbc-1.druid.url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oracle.data1.com)(PORT=9539))(ADDRESS=(PROTOCOL=tcp)(HOST=oracle.data2.com)(PORT=9539))(LOAD_BALANCE=off)(FAILOVER=on)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=SERVICE_PRI)(FAILOVER_MODE=(TYPE=session)(METHOD=basic)(RETRIES=180)(DELAY=10))))

2)压测环境

压测库:oracle.data1.com 

压测库备库:oracle.data2.com

场景:某业务下单接口,100并发

3)shutdown abort 主库切换

整个FSFO过程和测试机相同。不同之处在于压测环境FAILOVER_MODETYPE=session,测试机环境中FAILOVER_MODETYPE=select。设置为session会导致应用程序连接断开出现如下报错:


### Error querying database.  Cause: java.sql.SQLRecoverableException: 无法从套接字读取更多的数据

### The error may exist in sqlmap/LotteryPocketMapper.xml

### The error may involve com.netease.lottery.pocket.dal.db.dao.LotteryPocketDao.selectPocket-Inline

### The error occurred while setting parameters

### SQL: select    *   from TB_LOTTERY_POCKET     where ……

### Cause: java.sql.SQLRecoverableException: 无法从套接字读取更多的数据

; SQL []; 无法从套接字读取更多的数据; nested exception is java.sql.SQLRecoverableException: 无法从套接字读取更多的数据

org.springframework.dao.RecoverableDataAccessException:

### Error querying database.  Cause: java.sql.SQLRecoverableException: 无法从套接字读取更多的数据


这下我们终于可以不用半夜起来处理主库宕机的问题了,不过DG Broker的管理还有一些应该注意的问题,我们下文讲解。


网易MySQL开源中间件Cetus

__________________________

github地址

https://github.com/Lede-Inc/cetus/blob/master/doc/cetus-quick-try.md

欢迎加star关注

社群

技术专家在线及时反馈

cetus开源qq群号: 521824702

cetus开源微信群:扫描网易DBA小助手加入


往期精彩文章

__________________________

网易中间件Cetus开源啦

网易开源中间件 -Cetus监控模块

网易分片中间件cetus扩容方案

数据库导入导出基础扫盲

网易DG Broker系列:切换自如

网易乐得RDS开发:实时监控mysql

网易DBA女神揭秘区块链天机

网易北京研发中心DBA招募令

网易乐得RDS设计—任务调度篇


欢迎分享


网易乐得DBA组负责网易乐得电商、网易邮箱、网易技术部数据库日常运维,负责数据库私有云平台的开发和维护,负责数据库及数据库中间件的开发和测试等,分享最前沿实用数据库干货,关注网易乐得DBA,精修数据库功底。

关注「网易乐得DBA

了解前沿数据库技术


文章转载自DBA天团,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论