1.配置 dg_broker
https://blog.csdn.net/qq_39466006/article/details/132018521
架构:以一主三备为例
| IP | hostname | db_unique_name | DG Broker role | switchover role | failover role |
|---|---|---|---|---|---|
| 192.168.10.91 | primary | primary | 主库 | 从库 | 从库 |
| 192.168.10.92 | standby-01 | standby-01 | 从库 | 主库 | 从库 |
| 192.168.10.93 | standby-02 | standby-02 | 从库 | 从库 | 从库 |
| 192.168.10.94 | standby-03 | standby-03 | 从库 | 从库 | 从库 |
使用以下方法可以添加多个备库,具体配置的差异由你来确定
2.安装oracle数据库软件
3.备数据库配置
3.1.拷贝参数文件
scp /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileorcl.ora oracle@192.168.10.94:/u01/app/oracle/product/19.3.0/dbhome_1/dbs
3.2.拷贝密码文件
scp /u01/app/oracle/product/19.3.0/dbhome_1/dbs/orapworcl oracle@192.168.10.94:/u01/app/oracle/product/19.3.0/dbhome_1/dbs
3.3.创建相应的文件目录
根据上面修改的参数文件,为备库创建相应的文件目录
mkdir -p /u01/app/oracle/archivelog
mkdir -p /u01/app/oracle/fast_recovery_area/orcl
mkdir -p /u01/app/oracle/admin/orcl/adump
mkdir -p /u01/app/oracle/oradata/orcl
chmod 775 /u01 -R
3.4.修改监听配置文件
# listener.ora Network Configuration File: /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = standby-database)(PORT = 1521))
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
)
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = standby02)
(SID_NAME = orcl)
(ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
)
(SID_DESC =
(GLOBAL_DBNAME = standby02_DGMGRL)
(ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
(SID_NAME = orcl)
(ENVS="TNS_ADMIN=/u01/app/oracle/product/19.3.0/dbhome_1/network/admin")
)
)
ADR_BASE_LISTENER = /u01/app/oracle
3.5.修改TNS配置文件
# tnsnames.ora Network Configuration File: /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.
LISTENER_ORCL =
(ADDRESS = (PROTOCOL = TCP)(HOST = standby-database)(PORT = 1521))
STANDBY01 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = standby01)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = standby01)
(UR = A)
)
)
STANDBY02 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = standby02)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = standby02)
(UR = A)
)
)
STANDBY03 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = standby03)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = standby03)
(UR = A)
)
)
PRIMARY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = primary)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = primary)
(UR = A)
)
)
创建认证配置文件
cat > /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/sqlnet.ora <<EOF
# sqlnet.ora Network Configuration File: /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.
NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)
ADR_BASE = /u01/app/oracle
EOF
3.6.修改 db_unique_name
alter system set db_unique_name='standby02' scope=spfile;
重新启动数据库到 nomount 状态,然后再启动监听服务
lsnrctl start
3.7.验证监听和TNS配置
主库:
sqlplus sys/admin123@primary as sysdba
sqlplus sys/admin123@standby02 as sysdba
备库:
sqlplus sys/admin123@primary as sysdba
sqlplus sys/admin123@standby02 as sysdba
注意:该步骤一定要在主备库上都能通过才能执行下面步骤
3.8.恢复数据库
# 连接数据库
rman target sys/admin123@primary auxiliary sys/admin123@standby02 nocatalog
# 备份数据
duplicate target database for standby from active database nofilenamecheck;
4.dg_broker 添加备库
step 1.查看 DG_BROKER_CONFIG_FILE文件
dg_broker_config_file1以及dg_broker_config_file2对应路径必须先存在(不存在需要先创建,否则启动dg_broker_start提示ORA-16604:
Data Guard broker configuration file inaccessible)
SQL> show parameter broker
SYS@orcl> show parameter broker
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
connection_brokers string ((TYPE=DEDICATED)(BROKERS=1)),
((TYPE=EMON)(BROKERS=1))
dg_broker_config_file1 string /u01/app/oracle/product/19.3.0
/dbhome_1/dbs/dr1orcl.dat
dg_broker_config_file2 string /u01/app/oracle/product/19.3.0
/dbhome_1/dbs/dr2orcl.dat
dg_broker_start boolean FALSE
use_dedicated_broker boolean FALSE
SYS@orcl>
可以用默认的路径,也可以自己指定(四个节点均需配置)。如果是RAC环境,把这个文件把到共享存储或者ASM中。
step 2.重置 LOG_ARCHIVE_DEST_n 信息
三个节点均需配置,这步开始,已有主从同步会中断。
ALTER SYSTEM set LOG_ARCHIVE_DEST_2='';
ALTER SYSTEM set LOG_ARCHIVE_DEST_3='';
使用 reset 需要重启。
FQ :
DGMGRL> add database 'standby03' as connect identifier is standby03;
Error: ORA-16698: member has a LOG_ARCHIVE_DEST_n parameter with SERVICE attribute set
Failed.
step 3.开启 dg_broker_start 为TRUE,默认是FALSE(主备库)
SQL> alter system set dg_broker_start=true sid='*';
step 4.增加备库:
DGMGRL> add database 'standby02' as connect identifier is standby02; Database "standby02" added DGMGRL>
查看配置:
DGMGRL> show configuration;
Configuration - xiangxun
Protection Mode: MaxPerformance
Members:
primary - Primary database
standby01 - Physical standby database
standby02 - Physical standby database (disabled)
ORA-16905: The member was not enabled yet.
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS (status updated 20 seconds ago)
DGMGRL>
step 5.启用当前配置:
DGMGRL> enable Database standby02; Enabled. DGMGRL>
查看配置详细信息:
DGMGRL> show configuration;
Configuration - xiangxun
Protection Mode: MaxPerformance
Members:
primary - Primary database
standby01 - Physical standby database
standby02 - Physical standby database
Error: ORA-16810: multiple errors or warnings detected for the member
Fast-Start Failover: Disabled
Configuration Status:
ERROR (status updated 56 seconds ago)
DGMGRL>
查看配置详细信息:
DGMGRL> show configuration verbose;
Configuration - xiangxun
Protection Mode: MaxPerformance
Members:
primary - Primary database
standby01 - Physical standby database
standby02 - Physical standby database
Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
TraceLevel = 'USER'
FastStartFailoverLagLimit = '30'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
ExternalDestination1 = ''
ExternalDestination2 = ''
PrimaryLostWriteAction = 'CONTINUE'
ConfigurationWideServiceName = 'orcl_CFG'
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS
DGMGRL>
获取信息中包含任何告警都需要进行处理。
step 6.通过dgmgrl查看主备实例详细
前面使用单引号增加database,后面查询一律要使用。
备库信息:
DGMGRL> show database verbose standby02
Database - standby02
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Average Apply Rate: 3.00 KByte/s
Active Apply Rate: 0 Byte/s
Maximum Apply Rate: 0 Byte/s
Real Time Query: OFF
Instance(s):
orcl
Properties:
DGConnectIdentifier = 'standby02'
ObserverConnectIdentifier = ''
FastStartFailoverTarget = ''
PreferredObserverHosts = ''
LogShipping = 'ON'
RedoRoutes = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyLagThreshold = '30'
TransportLagThreshold = '30'
TransportDisconnectedThreshold = '30'
ApplyParallel = 'AUTO'
ApplyInstances = '0'
StandbyFileManagement = ''
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '0'
LogArchiveMinSucceedDest = '0'
DataGuardSyncLatency = '0'
LogArchiveTrace = '0'
LogArchiveFormat = ''
DbFileNameConvert = ''
LogFileNameConvert = ''
ArchiveLocation = ''
AlternateLocation = ''
StandbyArchiveLocation = ''
StandbyAlternateLocation = ''
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
LogXptStatus = '(monitor)'
SendQEntries = '(monitor)'
RecvQEntries = '(monitor)'
HostName = 'standby02'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=standby02)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=standby02_DGMGRL)(INSTANCE_NAME=orcl)(SERVER=DEDICATED)))'
TopWaitEvents = '(monitor)'
SidName = '(monitor)'
Log file locations:
Alert log : /u01/app/oracle/diag/rdbms/standby02/orcl/trace/alert_orcl.log
Data Guard Broker log : /u01/app/oracle/diag/rdbms/standby02/orcl/trace/drcorcl.log
Database Status:
SUCCESS
DGMGRL>
当前备库状态是节点1实时应用日志,且备库同步都正常。
5.备库重新启用数据库闪回
验证是否开启闪回
SYS@orcl> select flashback_on from v$database;
FLASHBACK_ON
------------------------------------
NO
SYS@orcl> show parameter db_recovery_file_dest
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
db_recovery_file_dest string
db_recovery_file_dest_size big integer 0
SYS@orcl>
数据库已经开启flashback,那么下面步骤可忽略。
如上显示,该数据库未开启flashback,可按下面方法开启。
创建文件夹
[oracle@primary-database oracle]$ mkdir /u01/app/oracle/fast_recovery_area
SYS@orcl> alter system set db_recovery_file_dest_size='5G';
System altered.
SYS@orcl> alter system set db_recovery_file_dest='/u01/app/oracle/fast_recovery_area';
System altered.
SYS@orcl> show parameter db_recovery_file_dest
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
db_recovery_file_dest string /u01/app/oracle/fast_recovery_
area
db_recovery_file_dest_size big integer 5G
SYS@orcl>
开启闪回
SYS@orcl> alter database flashback on;
Database altered.
SYS@orcl> select flashback_on from v$database;
FLASHBACK_ON
------------------------------------
YES
SYS@orcl>
FQ 1:未配置闪回目录
SQL> alter database flashback on;
alter database flashback on
*
ERROR at line 1:
ORA-38706: Cannot turn on FLASHBACK DATABASE logging.
ORA-38709: Recovery Area is not enabled.
SQL> show parameter db_
FQ 2:日志应用未停止
SYS@orcl> alter database flashback on;
alter database flashback on
*
ERROR at line 1:
ORA-01153: an incompatible media recovery is active
SYS@orcl> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Database altered.
SYS@orcl>
6.Fast-Start Fail Over
fast-start failover(FSFO)功能是当primary数据库Crash后,它会自动地快速执行切换standby数据库为primary数据库操作。
那么什么时候会触发fast-start failover呢?
当数据库以正常模式(shutdown immediate/normal/transactional)关闭时,系统不会触发fast-start failover。 使用非正常关闭方式 shutdown abort 关闭primary时会触发fast-start failover。
这里引入一个新的进程observer,observer是用来监控primary数据库是否可用的,当主库不可用时,会执行快速切换standby数据库为primary数据库。
只有maximum availability mode or maximum performance mode才能启用fast-start failover模式。在maximum availability模式下面,在切换时可以保证无数据丢失,在maximum performance mode下面,会有数据丢失,丢失多少数据由FastStartFailoverLagLimit这个参数来配置。
step 1.查看配置属性
DGMGRL> show configuration verbose;
Configuration - xiangxun
Protection Mode: MaxPerformance
Members:
primary - Primary database
standby01 - Physical standby database
standby02 - Physical standby database
standby03 - Physical standby database
Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
TraceLevel = 'USER'
FastStartFailoverLagLimit = '30'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
ExternalDestination1 = ''
ExternalDestination2 = ''
PrimaryLostWriteAction = 'CONTINUE'
ConfigurationWideServiceName = 'orcl_CFG'
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS
DGMGRL>
修改数据库状态
DGMGRL> show database standby03
Database - standby03
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 351 seconds ago)
Apply Lag: 0 seconds (computed 351 seconds ago)
Average Apply Rate: 17.00 KByte/s
Real Time Query: OFF
Instance(s):
orcl
Database Warning(s):
ORA-16857: member disconnected from redo source for longer than specified threshold
Database Status:
WARNING
DGMGRL>
Intended State 有以下几种:
- TRANSPORT-ON
- TRANSPORT-OFF
- APPLY-ON
- APPLY-OFF
| 数据库角色 | 状态 | 描述 |
|---|---|---|
| Primary | TRANSPORT-ON | 设置重做传输服务,以便在打开主库以进行读/写访问时将重做数据传输到备库或远程同步实例。如果这是Oracle RAC数据库,则所有以读/写模式打开的实例都将运行重做传输服务。首次启用时,这是主库的默认状态。 |
| Primary | TRANSPORT-OFF | 重做传输服务在主库上停止。 如果这是Oracle RAC数据库,则重做传输服务不在任何实例上运行。 |
| Physical Standby | APPLY-ON | 备库启用Redo Apply。 如果备库是Oracle RAC数据库,则代理将仅在一个称为Apply实例的备库实例上启动Redo Apply。 如果此实例失败,则代理将自动选择另一个已安装或打开为只读的实例。 然后该新实例将成为Apply实例。 从Oracle Database 12c第2版(12.2.0.1)开始,可以将Redo Apply设置为在每个活动的运行中的物理备用实例上运行。 如果已经设置数据库在多个实例上运行重做应用,则可以使用DG Broker属性ApplyInstances来限制Oracle RAC物理备库上重做应用所涉及的实例数量。 |
| Physical Standby | APPLY-OFF | 备库停用Redo Apply 如果这是Oracle RAC数据库,则在将数据库状态更改为APPLY-ON之前,没有实例运行Apply Services。 |
| Logical Standby | APPLY-ON | 打开逻辑备库并打开逻辑备库保护后,将在逻辑备库上启动SQL Apply。 如果这是一个Oracle RAC数据库,则SQL Apply正在一个实例(apply实例)上运行。 如果此实例失败,则代理将自动选择另一个打开的实例。 这个新实例将成为apply实例。 首次启用时这是逻辑备库的默认状态。 |
| Logical Standby | APPLY-OFF | SQL Apply已停止。 逻辑备库保护功能已打开。 如果这是Oracle RAC数据库,则在将状态更改为APPLY-ON之前,不会运行SQL Apply的实例。 |
主库状态转换
当将主库转换为TRANSPORT-ON状态时,broker将使用配置成员的与重做传输相关的属性以及主库上的RedoRoutes属性来设置重做传输服务。
重做传输服务的设置是通过在主数据库上设置LOG_ARCHIVE_DEST_n和LOG_ARCHIVE_DEST_STATE_n初始化参数,以及在所有数据库(主库或备库)和远程同步实例上设置LOG_ARCHIVE_CONFIG初始化参数来完成的。
当将主库转换为TRANSPORT-OFF状态时,对应的log_archive_dest_state_n将置为RESET状态。
当将主库转换为TRANSPORT-ON状态时,对应的log_archive_dest_state_n将置为ENABLE状态。
# TRANSPORT-OFF
DGMGRL> edit database 'orcl' set state='TRANSPORT-OFF';
2021-03-01T17:42:39.159982+08:00
ALTER SYSTEM SET log_archive_dest_state_2='RESET' SCOPE=BOTH;
# TRANSPORT-ON
DGMGRL> edit database 'orcl' set state='TRANSPORT-ON';
2021-03-01T17:42:58.926310+08:00
ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH;
物理备库状态转换
在将物理备库转换为APPLY-ON状态时,broker将使用与Redo Apply相关的属性指定的选项启动Redo Apply。 如果备库是Oracle RAC数据库,则代理将在一个称为Apply实例的备用实例上启动Redo Apply。
从Oracle Database 12c第2版(12.2.0.1)开始,可以将Redo Apply设置为在多个活动运行的物理备用实例上运行。 (此功能要求备库具有Oracle Active Data Guard选项的许可证。)如果已经设置数据库在多个实例上运行Redo Apply,则可以使用Data Guard Broker属性ApplyInstances限制数量。
当物理备库转换为APPLY-OFF状态时,broker停止Redo Apply。
# APPLY-OFF
DGMGRL> edit database 'orcladg' set state='APPLY-OFF';
Succeeded.
2021-03-02T09:46:48.776536+08:00
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
2021-03-02T09:46:48.777132+08:00
MRP0: Background Media Recovery cancelled with status 16037
2021-03-02T09:46:48.889176+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcladg/orcladg/trace/orcladg_pr00_2632.trc:
ORA-16037: user requested cancel of managed recovery operation
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 7993612
2021-03-02T09:46:49.040882+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcladg/orcladg/trace/orcladg_pr00_2632.trc:
ORA-16037: user requested cancel of managed recovery operation
2021-03-02T09:46:49.142033+08:00
MRP0: Background Media Recovery process shutdown (orcladg)
2021-03-02T09:46:49.777718+08:00
Managed Standby Recovery Canceled (orcladg)
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
# APPLY-ON
DGMGRL> edit database 'orcladg' set state='APPLY-ON';
Succeeded.
2021-03-02T09:47:46.674256+08:00
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT NODELAY
2021-03-02T09:47:46.674819+08:00
Attempt to start background Managed Standby Recovery process (orcladg)
Starting background process MRP0
2021-03-02T09:47:46.691256+08:00
MRP0 started with pid=56, OS id=2899
2021-03-02T09:47:46.692433+08:00
MRP0: Background Managed Standby Recovery process started (orcladg)
2021-03-02T09:47:51.718042+08:00
Started logmerger process
2021-03-02T09:47:51.737502+08:00
Managed Standby Recovery starting Real Time Apply
2021-03-02T09:47:51.788006+08:00
Parallel Media Recovery started with 2 slaves
2021-03-02T09:47:52.201813+08:00
Media Recovery Waiting for thread 1 sequence 235 (in transit)
2021-03-02T09:47:52.202522+08:00
Recovery of Online Redo Log: Thread 1 Group 5 Seq 235 Reading mem 0
Mem# 0: /u01/app/oracle/oradata/orcladg/redo05.log
2021-03-02T09:47:52.699581+08:00
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT NODELAY
设置故障转移目标
step 1.只能为当前主库设置一个目标转移备库,若只有一个备库,此步骤可忽略。
DGMGRL> enable fast_start failover
Enabled in Potential Data Loss Mode.
DGMGRL>
启用 Fast-Start Fail Over ,默认第一个备库未优先切换为从库。
DGMGRL> show configuration
Configuration - xiangxun
Protection Mode: MaxPerformance
Members:
primary - Primary database
Warning: ORA-16819: fast-start failover observer not started
standby01 - (*) Physical standby database
Warning: ORA-16819: fast-start failover observer not started
standby02 - Physical standby database
standby03 - Physical standby database
Warning: ORA-16809: multiple warnings detected for the member
Fast-Start Failover: Enabled in Potential Data Loss Mode
Configuration Status:
WARNING (status updated 56 seconds ago)
DGMGRL>
FQ 1:未开启闪回数据库功能
DGMGRL> enable fast_start failover; Warning: ORA-16827: Flashback Database is disabled Enabled in Potential Data Loss Mode.
step 2.修改目标转移备库,需要先禁用FSFO,改完后再启用
DGMGRL> DISABLE FAST_START FAILOVER Disabled. DGMGRL>
修改转移备库为Standby03
DGMGRL> EDIT DATABASE primary SET PROPERTY FASTSTARTFAILOVERTARGET=standby03;
Property "faststartfailovertarget" updated
DGMGRL> EDIT DATABASE standby03 SET PROPERTY FASTSTARTFAILOVERTARGET=primary;
Property "faststartfailovertarget" updated
DGMGRL>
step 3.设置保护模式与LogXptMode
LOGXPTMODE属性在最大可用性模式下应为SYNC,在最大性能模式下应为ASYNC,主备库LOGXPTMODE设置必须相同
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Error: ORA-16627: operation disallowed since no member would remain to support protection mode
Failed.
DGMGRL>
最大保护和最高可以的前提条件是日志传输模式为SYNC,所以我们要先修改日志传输模式。
因为考虑到以后要进行故障转移,所以需要将所有的数据库的LogXptMode都设置为SYNC。
DGMGRL> EDIT DATABASE standby03 SET PROPERTY LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> EDIT DATABASE standby02 SET PROPERTY LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> EDIT DATABASE standby01 SET PROPERTY LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> EDIT DATABASE primary SET PROPERTY LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Succeeded.
DGMGRL>
step 4. 设置FSFO阈值
如果主库与 Observer和目标备库失联时间均超过FastStartFailoverThreshold属性配置阈值(单位为秒),将启动FSFO。
FastStartFailoverThreshold表示,Observer和目标备库 在检测到主库不可用后、希望等多少秒才启动FSFO。再此期间,它们会尝试重连主库。
默认值为30秒,最小6秒。
DGMGRL> edit configuration set property faststartfailoverthreshold=300;
step 5. 设置最大可接受延迟时间
FastStartFailoverLagLimit表示,自动故障转移允许的最大数据丢失量(单位为秒),仅在最高性能保护模式时才可使用。
FastStartFailoverLagLimit作为DG可接受的最大延迟时间,备库的apply lag只有在此限制之内,才允许FSFO。
如果无法维持已配置的数据丢失保证,主库上的redo生成将停止。为了避免长时间停顿,Observer或者目标备库可能会在第一次记录到无法发生FSFO之后,允许主库继续生成redo。此时若主库故障,将无法发生FSFO。
默认值为30秒,最小值为10秒。
DGMGRL> edit configuration set property faststartfailoverlaglimit=50;
step 6.设置FastStartFailoverPmyShutdown属性
默认为true,表示在FastStartFailoverThreshold属性指定的秒数过去之后,使用ABORT选项关闭主数据库。
如果不希望由于 主库redo生成停止 或 主库与Observer和目标备库 失联时间超过FastStartFailoverThreshold指定时间而在触发FSFO后关闭主库,需要将其设置为假。
DGMGRL> edit configuration set property FastStartFailoverThreshold=false;
step 7.设置自动恢复数据库属性
如果FastStartFailoverAutoReinstate属性设置为TRUE,在原主库故障修复后,会自动尝试将其恢复为新主库的备库。
DGMGRL> EDIT CONFIGURATION SET PROPERTY FASTSTARTFAILOVERAUTOREINSTATE=TRUE;
Property "faststartfailoverautoreinstate" updated
step 9.启用fast_start故障转移
DGMGRL> enable fast_start failover; Enabled in Zero Data Loss Mode. DGMGRL>
step 10.查看FSFO配置
DGMGRL> show fast_start failover
Fast-Start Failover: Enabled in Zero Data Loss Mode
Protection Mode: MaxAvailability
Lag Limit: 0 seconds
Threshold: 30 seconds
Active Target: standby02
Potential Targets: "standby02"
standby02 valid
Observer: (none)
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
DGMGRL>
配置Observer(可选)
ObserverConnectIdentifier用于指定Observer应如何连接并监视主备库。如果希望Observer使用不同于发送重做数据连接串的连接标识符,请为主库和目标备库设置此属性(可选)。
DGMGRL> start observer
[W000 2024-05-08T09:25:45.030+08:00] FSFO target standby is standby01
Observer 'primary' started
[W000 2024-05-08T09:25:45.169+08:00] Observer trace level is set to USER
[W000 2024-05-08T09:25:45.169+08:00] Try to connect to the primary.
[W000 2024-05-08T09:25:45.169+08:00] Try to connect to the primary primary.
[W000 2024-05-08T09:25:45.238+08:00] The standby standby01 is ready to be a FSFO target
[W000 2024-05-08T09:25:45.238+08:00] Connection to the primary restored!
[W000 2024-05-08T09:25:47.238+08:00] Disconnecting from database primary.
通过登录到观察者计算机并运行DGMGRL,最多启动三个观察者。以具有SYSDG或SYSDBA权限的用户身份连接到配置,然后发出Start observer命令。请注意,在发出命令后将不会得到DGMGRL提示。
DGMGRL> CONNECT sysdg@North_Sales.example.com;
Password: password
Connected to "North_Sales"
Connected as SYSDG.
DGMGRL> START OBSERVER observer1 IN BACKGROUND
> FILE IS /net/sales/dat/oracle/broker/fsfo.dat
> LOGFILE IS /net/sales/dat/oracle/broker/observer.log
> CONNECT IDENTIFIER IS North_Sales
Submitted command "START OBSERVER" using connect identifier "North_Sales"
出于安全考虑,Oracle建议使用此命令格式;没有可见的凭据。这种做法可防止系统上的其他用户使用实用程序(例如UNIX ps实用程序)来显示连接凭据。它还防止明文密码在用户的终端上可见。
从脚本启动观察程序时,Oracle建议您使用支持“connect/”的方法,这样数据库连接凭据就不必嵌入脚本中。如果您选择使用客户端Oracle钱包作为安全的外部密码存储,请确保添加主数据库和快速启动故障切换目标备用数据库的凭据。添加每个数据库的凭据时指定的数据库连接字符串必须与ObserverConnectIdentifer或DGConnectIdentifier数据库属性匹配。
当启动多个观察程序时,启用快速启动故障切换后,一个观察程序是主观察程序,其余观察程序是备份观察程序。只有主观察者才能与Data Guard代理协调快速启动故障切换。如果主数据库和目标备用数据库保持连接,但与主观察器的连接丢失,则代理会尝试指定一个备份观察器作为新的主观察者。
测试
需要启动 OBServer
step 1.主库异常关机
step 2.查看 OBServer 日志
DGMGRL> start observer
[W000 2024-05-08T03:35:44.358-04:00] FSFO target standby is standby02
Observer 'standby03' started
[W000 2024-05-08T03:35:44.778-04:00] Observer trace level is set to USER
[W000 2024-05-08T03:35:44.778-04:00] Try to connect to the primary.
[W000 2024-05-08T03:35:44.778-04:00] Try to connect to the primary primary.
[W000 2024-05-08T03:35:44.837-04:00] The standby standby02 is ready to be a FSFO target
[W000 2024-05-08T03:35:45.837-04:00] Connection to the primary restored!
[W000 2024-05-08T03:35:50.846-04:00] Disconnecting from database primary.
[W000 2024-05-08T03:36:53.996-04:00] Primary database cannot be reached.
[W000 2024-05-08T03:36:53.996-04:00] Fast-Start Failover threshold has not exceeded. Retry for the next 30 seconds
[W000 2024-05-08T03:36:54.996-04:00] Try to connect to the primary.
[W000 2024-05-08T03:37:25.004-04:00] Primary database cannot be reached.
[W000 2024-05-08T03:37:25.004-04:00] Fast-Start Failover threshold has expired.
[W000 2024-05-08T03:37:25.004-04:00] Try to connect to the standby.
[W000 2024-05-08T03:37:25.004-04:00] Making a last connection attempt to primary database before proceeding with Fast-Start Failover.
[W000 2024-05-08T03:37:25.004-04:00] Check if the standby is ready for failover.
[S002 2024-05-08T03:37:25.012-04:00] Fast-Start Failover started...
2024-05-08T03:37:25.012-04:00
Initiating Fast-Start Failover to database "standby02"...
[S002 2024-05-08T03:37:25.012-04:00] Initiating Fast-start Failover.
Performing failover NOW, please wait...
ORA-12543: TNS:destination host unreachable
Unable to connect to database using primary
ORA-12170: TNS:Connect timeout occurred
Unable to connect to database using primary
Failover succeeded, new primary is "standby02"
2024-05-08T03:38:22.701-04:00
[S002 2024-05-08T03:38:22.701-04:00] Fast-Start Failover finished...
[W000 2024-05-08T03:38:22.701-04:00] Failover succeeded. Restart pinging.
[W000 2024-05-08T03:38:22.729-04:00] Primary database has changed to standby02.
[W000 2024-05-08T03:38:22.734-04:00] Try to connect to the primary.
[W000 2024-05-08T03:38:22.734-04:00] Try to connect to the primary standby02.
[W000 2024-05-08T03:38:22.876-04:00] The standby primary needs to be reinstated
[W000 2024-05-08T03:38:22.877-04:00] Try to connect to the new standby primary.
[W000 2024-05-08T03:38:23.877-04:00] Connection to the primary restored!
ORA-12543: TNS:destination host unreachable
Unable to connect to database using primary
[W000 2024-05-08T03:38:26.885-04:00] Reinstating database standby01.
2024-05-08T03:38:26.885-04:00
Initiating reinstatement for database "standby01"...
Reinstating database "standby01", please wait...
Error: ORA-16653: failed to reinstate database
Failed.
Reinstatement of database "standby01" failed
2024-05-08T03:38:28.974-04:00
[W000 2024-05-08T03:38:30.972-04:00] Reinstating database standby03.
2024-05-08T03:38:30.972-04:00
Initiating reinstatement for database "standby03"...
Reinstating database "standby03", please wait...
Error: ORA-16653: failed to reinstate database
step 3.查看当前配置
DGMGRL> SHOW CONFIGURATION
Configuration - xiangxun
Protection Mode: MaxAvailability
Members:
primary - Primary database
Error: ORA-12543: TNS:destination host unreachable
standby02 - (*) Physical standby database
standby01 - Physical standby database
standby03 - Physical standby database
Fast-Start Failover: Enabled in Zero Data Loss Mode
Configuration Status:
ERROR (status updated 0 seconds ago)
DGMGRL>
step 4.查看备库 standby02 的加色
SQL> select open_mode,protection_mode,database_role,switchover_status from v$database;
OPEN_MODE PROTECTION_MODE DATABASE_ROLE SWITCHOVER_STATUS
---------------------------------------- ---------------------------------------- -------------------------------- ----------------------------------------
READ WRITE MAXIMUM AVAILABILITY PRIMARY NOT ALLOWED
SQL>
当前 standby02 已经提升为 主库
step 5.当主库故障恢复后,启动数据库和监听
[oracle@primary ~]$ sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Wed May 8 15:46:04 2024
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
Connected to an idle instance.
SYS@orcl> startup
ORACLE instance started.
Total System Global Area 2432695144 bytes
Fixed Size 8899432 bytes
Variable Size 536870912 bytes
Database Buffers 1879048192 bytes
Redo Buffers 7876608 bytes
Database mounted.
ORA-16649: possible failover to another database prevents this database from being opened
SYS@orcl>
step 6.查看 OBServer 日志
2024-05-08T03:47:23.207-04:00
Initiating reinstatement for database "primary"...
Reinstating database "primary", please wait...
[W000 2024-05-08T03:48:00.285-04:00] Try to connect to the primary standby02.
[W000 2024-05-08T03:48:01.285-04:00] Connection to the primary restored!
[W000 2024-05-08T03:48:03.291-04:00] Reinstating database standby01.
2024-05-08T03:48:03.291-04:00
Initiating reinstatement for database "standby01"...
Reinstating database "standby01", please wait...
[W000 2024-05-08T03:48:13.316-04:00] The standby primary is ready to be a FSFO target
Reinstatement of database "primary" succeeded
2024-05-08T03:49:47.563-04:00
[W000 2024-05-08T03:49:48.516-04:00] Successfully reinstated database primary.
Error: ORA-16653: failed to reinstate database
Failed.
Reinstatement of database "standby01" failed
2024-05-08T03:49:54.481-04:00
[W000 2024-05-08T03:49:55.530-04:00] Reinstating database standby03.
2024-05-08T03:49:55.530-04:00
Initiating reinstatement for database "standby03"...
Reinstating database "standby03", please wait...
Error: ORA-16653: failed to reinstate database
Failed.
Reinstatement of database "standby03" failed
2024-05-08T03:50:02.022-04:00
[W000 2024-05-08T03:50:03.020-04:00] Disconnecting from database standby02.
当主库物理故障恢复后,从新进行切换回主库。
相关视图
SELECT fs_failover_status,
fs_failover_current_target,
fs_failover_threshold,
fs_failover_observer_present,
fs_failover_observer_host
FROM v$database;
FQ
FQ 1:
DGMGRL> SHOW CONFIGURATION
Configuration - xiangxun
Protection Mode: MaxPerformance
Members:
primary - Primary database
standby01 - Physical standby database
standby02 - Physical standby database
standby03 - Physical standby database
Warning: ORA-16857: member disconnected from redo source for longer than specified threshold
Fast-Start Failover: Disabled
Configuration Status:
WARNING (status updated 54 seconds ago)
DGMGRL>
Data Guard代理还具有可配置的数据库属性,当传输或应用滞后超过用户定义的值时,这些属性可用于生成警告:
当数据库的应用滞后超过属性指定的值时,ApplyLagThreshold可配置数据库属性会为逻辑或物理备用生成警告状态。属性值以秒为单位表示。如果值为0秒,则在存在应用滞后时不会生成任何警告。作为最佳实践,Oracle建议将ApplyLagThreshold设置为至少15分钟。
当数据库的传输滞后超过属性指定的值时,TransportLagThreshold可配置数据库属性可用于为逻辑、物理或快照备用生成警告状态。属性值以秒为单位表示。如果值为0秒,则在存在传输滞后时不会生成任何警告。作为最佳做法,Oracle建议将TransportLagThreshold设置为至少15分钟。
DGMGRL> edit database standby03 set property ApplyLagThreshold=900;
Property "applylagthreshold" updated
DGMGRL> show database verbose standby03
Database - standby03
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 5 minutes 25 seconds (computed 37 seconds ago)
Apply Lag: 5 minutes 25 seconds (computed 37 seconds ago)
Average Apply Rate: 1.00 KByte/s
Active Apply Rate: 360.00 KByte/s
Maximum Apply Rate: 360.00 KByte/s
Real Time Query: OFF
Instance(s):
orcl
Database Warning(s):
ORA-16855: transport lag has exceeded specified threshold
ORA-16857: member disconnected from redo source for longer than specified threshold
Properties:
DGConnectIdentifier = 'standby03'
ObserverConnectIdentifier = ''
FastStartFailoverTarget = 'primary'
PreferredObserverHosts = ''
LogShipping = 'ON'
RedoRoutes = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyLagThreshold = '900'
TransportLagThreshold = '30'
TransportDisconnectedThreshold = '30'
ApplyParallel = 'AUTO'
ApplyInstances = '0'
StandbyFileManagement = ''
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '0'
LogArchiveMinSucceedDest = '0'
DataGuardSyncLatency = '0'
LogArchiveTrace = '0'
LogArchiveFormat = ''
DbFileNameConvert = ''
LogFileNameConvert = ''
ArchiveLocation = ''
AlternateLocation = ''
StandbyArchiveLocation = ''
StandbyAlternateLocation = ''
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
LogXptStatus = '(monitor)'
SendQEntries = '(monitor)'
RecvQEntries = '(monitor)'
HostName = 'standby03'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=standby03)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=standby03_DGMGRL)(INSTANCE_NAME=orcl)(SERVER=DEDICATED)))'
TopWaitEvents = '(monitor)'
SidName = '(monitor)'
Log file locations:
Alert log : /u01/app/oracle/diag/rdbms/standby03/orcl/trace/alert_orcl.log
Data Guard Broker log : /u01/app/oracle/diag/rdbms/standby03/orcl/trace/drcorcl.log
Database Status:
WARNING
DGMGRL>
FQ 2:
DGMGRL> show database verbose standby03
Database - standby03
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 6 minutes 25 seconds (computed 34 seconds ago)
Apply Lag: 6 minutes 25 seconds (computed 34 seconds ago)
Average Apply Rate: 1.00 KByte/s
Active Apply Rate: 360.00 KByte/s
Maximum Apply Rate: 360.00 KByte/s
Real Time Query: OFF
Instance(s):
orcl
Database Warning(s):
ORA-16855: transport lag has exceeded specified threshold
ORA-16857: member disconnected from redo source for longer than specified threshold
Properties:
DGConnectIdentifier = 'standby03'
ObserverConnectIdentifier = ''
FastStartFailoverTarget = 'primary'
PreferredObserverHosts = ''
LogShipping = 'ON'
RedoRoutes = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyLagThreshold = '900'
TransportLagThreshold = '30'
TransportDisconnectedThreshold = '30'
ApplyParallel = 'AUTO'
ApplyInstances = '0'
StandbyFileManagement = ''
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '0'
LogArchiveMinSucceedDest = '0'
DataGuardSyncLatency = '0'
LogArchiveTrace = '0'
LogArchiveFormat = ''
DbFileNameConvert = ''
LogFileNameConvert = ''
ArchiveLocation = ''
AlternateLocation = ''
StandbyArchiveLocation = ''
StandbyAlternateLocation = ''
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
LogXptStatus = '(monitor)'
SendQEntries = '(monitor)'
RecvQEntries = '(monitor)'
HostName = 'standby03'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=standby03)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=standby03_DGMGRL)(INSTANCE_NAME=orcl)(SERVER=DEDICATED)))'
TopWaitEvents = '(monitor)'
SidName = '(monitor)'
Log file locations:
Alert log : /u01/app/oracle/diag/rdbms/standby03/orcl/trace/alert_orcl.log
Data Guard Broker log : /u01/app/oracle/diag/rdbms/standby03/orcl/trace/drcorcl.log
Database Status:
WARNING
DGMGRL>
上面输出中ORA-16855的错误是正常的,可以看下要求的delay是多少:
SYS@orcl> show parameter log_archive_dest_4
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
log_archive_dest_4 string service="standby03", ASYNC NOA
FFIRM delay=0 optional compres
sion=disable max_failure=0 reo
pen=300 db_unique_name="standb
y03" net_timeout=30, valid_for
=(online_logfile,all_roles)
SYS@orcl>
在 DGMGRL 中查看
DGMGRL> show database verbose "standby03" DelayMins
DelayMins = '0'
DGMGRL>
可以看到,要求的为 0,而实际的滞后为1分钟:
SYS@orcl> SELECT * FROM v$dataguard_stats WHERE name LIKE '%lag%';
SOURCE_DBID SOURCE_DB_UNIQUE_NAME NAME
----------- ---------------------------------------------------------------- ----------------------------------------------------------------
VALUE UNIT
-------------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------
TIME_COMPUTED DATUM_TIME CON_ID
------------------------------------------------------------ ------------------------------------------------------------ ----------
1696144244 PRIMARY transport lag
+00 00:00:00 day(2) to second(0) interval
05/08/2024 13:51:19 05/08/2024 13:51:18 0
1696144244 PRIMARY apply lag
+00 00:00:00 day(2) to second(0) interval
05/08/2024 13:51:19 05/08/2024 13:51:18 0
SYS@orcl>
修改 DelayMins 为 120 秒
DGMGRL> edit database standby03 set property DelayMins=120;




