02.Install Oracle 19C on Linux, use DGBroker to create a backup with one primary and three backups

原创 ByteHouse 2024-05-08

341

1.配置 dg_broker

https://blog.csdn.net/qq_39466006/article/details/132018521
架构：以一主三备为例

IP	hostname	db_unique_name	DG Broker role	switchover role	failover role
192.168.10.91	primary	primary	主库	从库	从库
192.168.10.92	standby-01	standby-01	从库	主库	从库
192.168.10.93	standby-02	standby-02	从库	从库	从库
192.168.10.94	standby-03	standby-03	从库	从库	从库

使用以下方法可以添加多个备库，具体配置的差异由你来确定

2.安装oracle数据库软件

3.备数据库配置

3.1.拷贝参数文件

scp /u01/app/oracle/product/19.3.0/dbhome_1/dbs/spfileorcl.ora oracle@192.168.10.94:/u01/app/oracle/product/19.3.0/dbhome_1/dbs

3.2.拷贝密码文件

scp /u01/app/oracle/product/19.3.0/dbhome_1/dbs/orapworcl oracle@192.168.10.94:/u01/app/oracle/product/19.3.0/dbhome_1/dbs

3.3.创建相应的文件目录

根据上面修改的参数文件，为备库创建相应的文件目录

mkdir -p /u01/app/oracle/archivelog
mkdir -p /u01/app/oracle/fast_recovery_area/orcl
mkdir -p /u01/app/oracle/admin/orcl/adump
mkdir -p /u01/app/oracle/oradata/orcl

chmod 775 /u01 -R

3.4.修改监听配置文件

# listener.ora Network Configuration File: /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = standby-database)(PORT = 1521))
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
    )
  )

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = standby02)
      (SID_NAME = orcl)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
    )
  (SID_DESC =
      (GLOBAL_DBNAME = standby02_DGMGRL)
      (ORACLE_HOME = /u01/app/oracle/product/19.3.0/dbhome_1)
      (SID_NAME = orcl)
      (ENVS="TNS_ADMIN=/u01/app/oracle/product/19.3.0/dbhome_1/network/admin")
    )

  )

ADR_BASE_LISTENER = /u01/app/oracle

3.5.修改TNS配置文件

# tnsnames.ora Network Configuration File: /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

LISTENER_ORCL =
  (ADDRESS = (PROTOCOL = TCP)(HOST = standby-database)(PORT = 1521))


STANDBY01 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = standby01)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = standby01)
     （UR = A）
    )
  )
STANDBY02 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = standby02)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = standby02)
     （UR = A）
    )
  )
STANDBY03 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = standby03)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = standby03)
     （UR = A）
    )
  )

PRIMARY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = primary)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = primary)
     （UR = A）
    )
  )

创建认证配置文件

cat > /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/sqlnet.ora <<EOF
# sqlnet.ora Network Configuration File: /u01/app/oracle/product/19.3.0/dbhome_1/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.

NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)

ADR_BASE = /u01/app/oracle
EOF

3.6.修改 db_unique_name

alter system set db_unique_name='standby02' scope=spfile;

重新启动数据库到 nomount 状态，然后再启动监听服务

lsnrctl start

3.7.验证监听和TNS配置

主库:

sqlplus sys/admin123@primary as sysdba

sqlplus sys/admin123@standby02 as sysdba

备库:

sqlplus sys/admin123@primary as sysdba

sqlplus sys/admin123@standby02 as sysdba

注意：该步骤一定要在主备库上都能通过才能执行下面步骤

3.8.恢复数据库

# 连接数据库
rman target sys/admin123@primary auxiliary sys/admin123@standby02 nocatalog

# 备份数据
duplicate target database for standby from active database nofilenamecheck;

4.dg_broker 添加备库

step 1.查看 DG_BROKER_CONFIG_FILE文件

dg_broker_config_file1以及dg_broker_config_file2对应路径必须先存在(不存在需要先创建，否则启动dg_broker_start提示ORA-16604:
Data Guard broker configuration file inaccessible)

SQL> show parameter broker

SYS@orcl> show parameter broker

NAME                     TYPE            VALUE
------------------------------------ ---------------------- ------------------------------
connection_brokers             string            ((TYPE=DEDICATED)(BROKERS=1)),
                                 ((TYPE=EMON)(BROKERS=1))
dg_broker_config_file1             string            /u01/app/oracle/product/19.3.0
                                /dbhome_1/dbs/dr1orcl.dat
dg_broker_config_file2             string            /u01/app/oracle/product/19.3.0
                                /dbhome_1/dbs/dr2orcl.dat
dg_broker_start              boolean            FALSE
use_dedicated_broker             boolean            FALSE
SYS@orcl>

可以用默认的路径，也可以自己指定（四个节点均需配置）。如果是RAC环境，把这个文件把到共享存储或者ASM中。

step 2.重置 LOG_ARCHIVE_DEST_n 信息
三个节点均需配置，这步开始，已有主从同步会中断。

ALTER SYSTEM set LOG_ARCHIVE_DEST_2='';
ALTER SYSTEM set LOG_ARCHIVE_DEST_3='';

使用 reset 需要重启。

FQ :

DGMGRL> add database 'standby03' as connect identifier is standby03;
Error: ORA-16698: member has a LOG_ARCHIVE_DEST_n parameter with SERVICE attribute set

Failed.

step 3.开启 dg_broker_start 为TRUE,默认是FALSE(主备库)

SQL> alter system set dg_broker_start=true sid='*';

step 4.增加备库：

DGMGRL> add database 'standby02' as connect identifier is standby02;
Database "standby02" added
DGMGRL>

查看配置：

DGMGRL> show configuration;

Configuration - xiangxun

  Protection Mode: MaxPerformance
  Members:
  primary   - Primary database
    standby01 - Physical standby database 
    standby02 - Physical standby database (disabled)
      ORA-16905: The member was not enabled yet.

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 20 seconds ago)

DGMGRL>

step 5.启用当前配置:

DGMGRL> enable Database standby02;
Enabled.
DGMGRL>

查看配置详细信息:

DGMGRL> show configuration;

Configuration - xiangxun

  Protection Mode: MaxPerformance
  Members:
  primary   - Primary database
    standby01 - Physical standby database 
    standby02 - Physical standby database 
      Error: ORA-16810: multiple errors or warnings detected for the member

Fast-Start Failover:  Disabled

Configuration Status:
ERROR   (status updated 56 seconds ago)

DGMGRL>

查看配置详细信息:

DGMGRL> show configuration verbose;

Configuration - xiangxun

  Protection Mode: MaxPerformance
  Members:
  primary   - Primary database
    standby01 - Physical standby database 
    standby02 - Physical standby database 

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'orcl_CFG'

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS

DGMGRL>

获取信息中包含任何告警都需要进行处理。

step 6.通过dgmgrl查看主备实例详细

前面使用单引号增加database,后面查询一律要使用。

备库信息：

DGMGRL> show database verbose standby02

Database - standby02

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 3.00 KByte/s
  Active Apply Rate:  0 Byte/s
  Maximum Apply Rate: 0 Byte/s
  Real Time Query:    OFF
  Instance(s):
    orcl

  Properties:
    DGConnectIdentifier             = 'standby02'
    ObserverConnectIdentifier       = ''
    FastStartFailoverTarget         = ''
    PreferredObserverHosts          = ''
    LogShipping                     = 'ON'
    RedoRoutes                      = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '30'
    TransportLagThreshold           = '30'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    ApplyInstances                  = '0'
    StandbyFileManagement           = ''
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '0'
    LogArchiveMinSucceedDest        = '0'
    DataGuardSyncLatency            = '0'
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = ''
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    ArchiveLocation                 = ''
    AlternateLocation               = ''
    StandbyArchiveLocation          = ''
    StandbyAlternateLocation        = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    LogXptStatus                    = '(monitor)'
    SendQEntries                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    HostName                        = 'standby02'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=standby02)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=standby02_DGMGRL)(INSTANCE_NAME=orcl)(SERVER=DEDICATED)))'
    TopWaitEvents                   = '(monitor)'
    SidName                         = '(monitor)'

  Log file locations:
    Alert log               : /u01/app/oracle/diag/rdbms/standby02/orcl/trace/alert_orcl.log
    Data Guard Broker log   : /u01/app/oracle/diag/rdbms/standby02/orcl/trace/drcorcl.log

Database Status:
SUCCESS

DGMGRL>

当前备库状态是节点1实时应用日志，且备库同步都正常。

5.备库重新启用数据库闪回

验证是否开启闪回

SYS@orcl> select flashback_on from v$database;

FLASHBACK_ON
------------------------------------
NO

SYS@orcl> show parameter db_recovery_file_dest

NAME                     TYPE            VALUE
------------------------------------ ---------------------- ------------------------------
db_recovery_file_dest             string
db_recovery_file_dest_size         big integer        0
SYS@orcl>

数据库已经开启flashback，那么下面步骤可忽略。

如上显示，该数据库未开启flashback，可按下面方法开启。

创建文件夹

[oracle@primary-database oracle]$ mkdir /u01/app/oracle/fast_recovery_area

SYS@orcl> alter system set db_recovery_file_dest_size='5G';

System altered.

SYS@orcl> alter system set db_recovery_file_dest='/u01/app/oracle/fast_recovery_area';

System altered.

SYS@orcl> show parameter db_recovery_file_dest

NAME                     TYPE            VALUE
------------------------------------ ---------------------- ------------------------------
db_recovery_file_dest             string            /u01/app/oracle/fast_recovery_
                                area
db_recovery_file_dest_size         big integer        5G
SYS@orcl>

开启闪回

SYS@orcl> alter database flashback on;

Database altered.

SYS@orcl> select flashback_on from v$database;

FLASHBACK_ON
------------------------------------
YES

SYS@orcl>

FQ 1：未配置闪回目录

SQL> alter database flashback on;
alter database flashback on
*
ERROR at line 1:
ORA-38706: Cannot turn on FLASHBACK DATABASE logging.
ORA-38709: Recovery Area is not enabled.


SQL> show parameter db_

FQ 2：日志应用未停止

SYS@orcl> alter database flashback on;
alter database flashback on
*
ERROR at line 1:
ORA-01153: an incompatible media recovery is active


SYS@orcl>  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Database altered.

SYS@orcl>

6.Fast-Start Fail Over

fast-start failover(FSFO)功能是当primary数据库Crash后，它会自动地快速执行切换standby数据库为primary数据库操作。
那么什么时候会触发fast-start failover呢？
当数据库以正常模式(shutdown immediate/normal/transactional)关闭时，系统不会触发fast-start failover。使用非正常关闭方式 shutdown abort 关闭primary时会触发fast-start failover。

这里引入一个新的进程observer，observer是用来监控primary数据库是否可用的，当主库不可用时，会执行快速切换standby数据库为primary数据库。

只有maximum availability mode or maximum performance mode才能启用fast-start failover模式。在maximum availability模式下面，在切换时可以保证无数据丢失，在maximum performance mode下面，会有数据丢失，丢失多少数据由FastStartFailoverLagLimit这个参数来配置。

step 1.查看配置属性

DGMGRL> show configuration verbose;

Configuration - xiangxun

  Protection Mode: MaxPerformance
  Members:
  primary   - Primary database
    standby01 - Physical standby database 
    standby02 - Physical standby database 
    standby03 - Physical standby database 

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'orcl_CFG'

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS

DGMGRL>

修改数据库状态

DGMGRL> show database standby03

Database - standby03

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 351 seconds ago)
  Apply Lag:          0 seconds (computed 351 seconds ago)
  Average Apply Rate: 17.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    orcl

  Database Warning(s):
    ORA-16857: member disconnected from redo source for longer than specified threshold

Database Status:
WARNING

DGMGRL>

Intended State 有以下几种：

TRANSPORT-ON
TRANSPORT-OFF
APPLY-ON
APPLY-OFF

数据库角色	状态	描述
Primary	TRANSPORT-ON	设置重做传输服务，以便在打开主库以进行读/写访问时将重做数据传输到备库或远程同步实例。如果这是Oracle RAC数据库，则所有以读/写模式打开的实例都将运行重做传输服务。首次启用时，这是主库的默认状态。
Primary	TRANSPORT-OFF	重做传输服务在主库上停止。如果这是Oracle RAC数据库，则重做传输服务不在任何实例上运行。
Physical Standby	APPLY-ON	备库启用Redo Apply。如果备库是Oracle RAC数据库，则代理将仅在一个称为Apply实例的备库实例上启动Redo Apply。如果此实例失败，则代理将自动选择另一个已安装或打开为只读的实例。然后该新实例将成为Apply实例。从Oracle Database 12c第2版（12.2.0.1）开始，可以将Redo Apply设置为在每个活动的运行中的物理备用实例上运行。如果已经设置数据库在多个实例上运行重做应用，则可以使用DG Broker属性ApplyInstances来限制Oracle RAC物理备库上重做应用所涉及的实例数量。
Physical Standby	APPLY-OFF	备库停用Redo Apply 如果这是Oracle RAC数据库，则在将数据库状态更改为APPLY-ON之前，没有实例运行Apply Services。
Logical Standby	APPLY-ON	打开逻辑备库并打开逻辑备库保护后，将在逻辑备库上启动SQL Apply。如果这是一个Oracle RAC数据库，则SQL Apply正在一个实例（apply实例）上运行。如果此实例失败，则代理将自动选择另一个打开的实例。这个新实例将成为apply实例。首次启用时这是逻辑备库的默认状态。
Logical Standby	APPLY-OFF	SQL Apply已停止。逻辑备库保护功能已打开。如果这是Oracle RAC数据库，则在将状态更改为APPLY-ON之前，不会运行SQL Apply的实例。

主库状态转换

当将主库转换为TRANSPORT-ON状态时，broker将使用配置成员的与重做传输相关的属性以及主库上的RedoRoutes属性来设置重做传输服务。
重做传输服务的设置是通过在主数据库上设置LOG_ARCHIVE_DEST_n和LOG_ARCHIVE_DEST_STATE_n初始化参数，以及在所有数据库（主库或备库）和远程同步实例上设置LOG_ARCHIVE_CONFIG初始化参数来完成的。
当将主库转换为TRANSPORT-OFF状态时，对应的log_archive_dest_state_n将置为RESET状态。
当将主库转换为TRANSPORT-ON状态时，对应的log_archive_dest_state_n将置为ENABLE状态。

# TRANSPORT-OFF
DGMGRL> edit database 'orcl' set state='TRANSPORT-OFF';
2021-03-01T17:42:39.159982+08:00
ALTER SYSTEM SET log_archive_dest_state_2='RESET' SCOPE=BOTH;

# TRANSPORT-ON
DGMGRL> edit database 'orcl' set state='TRANSPORT-ON';
2021-03-01T17:42:58.926310+08:00
ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH;

物理备库状态转换

在将物理备库转换为APPLY-ON状态时，broker将使用与Redo Apply相关的属性指定的选项启动Redo Apply。如果备库是Oracle RAC数据库，则代理将在一个称为Apply实例的备用实例上启动Redo Apply。
从Oracle Database 12c第2版（12.2.0.1）开始，可以将Redo Apply设置为在多个活动运行的物理备用实例上运行。（此功能要求备库具有Oracle Active Data Guard选项的许可证。）如果已经设置数据库在多个实例上运行Redo Apply，则可以使用Data Guard Broker属性ApplyInstances限制数量。
当物理备库转换为APPLY-OFF状态时，broker停止Redo Apply。

# APPLY-OFF
DGMGRL> edit database 'orcladg' set state='APPLY-OFF';
Succeeded.
2021-03-02T09:46:48.776536+08:00
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
2021-03-02T09:46:48.777132+08:00
MRP0: Background Media Recovery cancelled with status 16037
2021-03-02T09:46:48.889176+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcladg/orcladg/trace/orcladg_pr00_2632.trc:
ORA-16037: user requested cancel of managed recovery operation
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 7993612
2021-03-02T09:46:49.040882+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcladg/orcladg/trace/orcladg_pr00_2632.trc:
ORA-16037: user requested cancel of managed recovery operation
2021-03-02T09:46:49.142033+08:00
MRP0: Background Media Recovery process shutdown (orcladg)
2021-03-02T09:46:49.777718+08:00
Managed Standby Recovery Canceled (orcladg)
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL

# APPLY-ON
DGMGRL> edit database 'orcladg' set state='APPLY-ON';
Succeeded.
2021-03-02T09:47:46.674256+08:00
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT  NODELAY
2021-03-02T09:47:46.674819+08:00
Attempt to start background Managed Standby Recovery process (orcladg)
Starting background process MRP0
2021-03-02T09:47:46.691256+08:00
MRP0 started with pid=56, OS id=2899 
2021-03-02T09:47:46.692433+08:00
MRP0: Background Managed Standby Recovery process started (orcladg)
2021-03-02T09:47:51.718042+08:00
 Started logmerger process
2021-03-02T09:47:51.737502+08:00
Managed Standby Recovery starting Real Time Apply
2021-03-02T09:47:51.788006+08:00
Parallel Media Recovery started with 2 slaves
2021-03-02T09:47:52.201813+08:00
Media Recovery Waiting for thread 1 sequence 235 (in transit)
2021-03-02T09:47:52.202522+08:00
Recovery of Online Redo Log: Thread 1 Group 5 Seq 235 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/orcladg/redo05.log
2021-03-02T09:47:52.699581+08:00
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT  NODELAY

设置故障转移目标

step 1.只能为当前主库设置一个目标转移备库，若只有一个备库，此步骤可忽略。

DGMGRL> enable fast_start failover 
Enabled in Potential Data Loss Mode.
DGMGRL>

启用 Fast-Start Fail Over ，默认第一个备库未优先切换为从库。

DGMGRL> show configuration

Configuration - xiangxun

  Protection Mode: MaxPerformance
  Members:
  primary   - Primary database
    Warning: ORA-16819: fast-start failover observer not started

    standby01 - (*) Physical standby database 
      Warning: ORA-16819: fast-start failover observer not started

    standby02 - Physical standby database 
    standby03 - Physical standby database 
      Warning: ORA-16809: multiple warnings detected for the member

Fast-Start Failover: Enabled in Potential Data Loss Mode

Configuration Status:
WARNING   (status updated 56 seconds ago)

DGMGRL>

FQ 1：未开启闪回数据库功能

DGMGRL> enable fast_start failover;
Warning: ORA-16827: Flashback Database is disabled

Enabled in Potential Data Loss Mode.

step 2.修改目标转移备库，需要先禁用FSFO，改完后再启用

DGMGRL> DISABLE FAST_START FAILOVER
Disabled.
DGMGRL>

修改转移备库为Standby03

DGMGRL> EDIT DATABASE primary SET PROPERTY FASTSTARTFAILOVERTARGET=standby03;
Property "faststartfailovertarget" updated
DGMGRL> EDIT DATABASE standby03 SET PROPERTY FASTSTARTFAILOVERTARGET=primary;
Property "faststartfailovertarget" updated
DGMGRL>

step 3.设置保护模式与LogXptMode
LOGXPTMODE属性在最大可用性模式下应为SYNC，在最大性能模式下应为ASYNC，主备库LOGXPTMODE设置必须相同

DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Error: ORA-16627: operation disallowed since no member would remain to support protection mode

Failed.
DGMGRL>

最大保护和最高可以的前提条件是日志传输模式为SYNC，所以我们要先修改日志传输模式。
因为考虑到以后要进行故障转移，所以需要将所有的数据库的LogXptMode都设置为SYNC。

DGMGRL> EDIT DATABASE standby03 SET PROPERTY LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> EDIT DATABASE standby02 SET PROPERTY LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> EDIT DATABASE standby01 SET PROPERTY LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> EDIT DATABASE primary SET PROPERTY LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Succeeded.
DGMGRL>

step 4. 设置FSFO阈值
如果主库与 Observer和目标备库失联时间均超过FastStartFailoverThreshold属性配置阈值（单位为秒），将启动FSFO。

FastStartFailoverThreshold表示，Observer和目标备库在检测到主库不可用后、希望等多少秒才启动FSFO。再此期间，它们会尝试重连主库。

默认值为30秒，最小6秒。

DGMGRL> edit configuration set property faststartfailoverthreshold=300;

step 5. 设置最大可接受延迟时间

FastStartFailoverLagLimit表示，自动故障转移允许的最大数据丢失量（单位为秒），仅在最高性能保护模式时才可使用。

FastStartFailoverLagLimit作为DG可接受的最大延迟时间，备库的apply lag只有在此限制之内，才允许FSFO。

如果无法维持已配置的数据丢失保证，主库上的redo生成将停止。为了避免长时间停顿，Observer或者目标备库可能会在第一次记录到无法发生FSFO之后，允许主库继续生成redo。此时若主库故障，将无法发生FSFO。

默认值为30秒，最小值为10秒。

DGMGRL> edit configuration set property faststartfailoverlaglimit=50;

step 6.设置FastStartFailoverPmyShutdown属性

默认为true，表示在FastStartFailoverThreshold属性指定的秒数过去之后，使用ABORT选项关闭主数据库。

如果不希望由于主库redo生成停止或主库与Observer和目标备库失联时间超过FastStartFailoverThreshold指定时间而在触发FSFO后关闭主库，需要将其设置为假。

DGMGRL> edit configuration set property FastStartFailoverThreshold=false;

step 7.设置自动恢复数据库属性
如果FastStartFailoverAutoReinstate属性设置为TRUE，在原主库故障修复后，会自动尝试将其恢复为新主库的备库。

DGMGRL> EDIT CONFIGURATION SET PROPERTY FASTSTARTFAILOVERAUTOREINSTATE=TRUE;

Property "faststartfailoverautoreinstate" updated

step 9.启用fast_start故障转移

DGMGRL> enable fast_start failover;
Enabled in Zero Data Loss Mode.
DGMGRL>

step 10.查看FSFO配置

DGMGRL> show fast_start failover

Fast-Start Failover: Enabled in Zero Data Loss Mode

  Protection Mode:    MaxAvailability
  Lag Limit:          0 seconds

  Threshold:          30 seconds
  Active Target:      standby02
  Potential Targets:  "standby02"
    standby02  valid
  Observer:           (none)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

DGMGRL>

配置Observer（可选）

ObserverConnectIdentifier用于指定Observer应如何连接并监视主备库。如果希望Observer使用不同于发送重做数据连接串的连接标识符，请为主库和目标备库设置此属性（可选）。

DGMGRL> start observer
[W000 2024-05-08T09:25:45.030+08:00] FSFO target standby is standby01
Observer 'primary' started
[W000 2024-05-08T09:25:45.169+08:00] Observer trace level is set to USER
[W000 2024-05-08T09:25:45.169+08:00] Try to connect to the primary.
[W000 2024-05-08T09:25:45.169+08:00] Try to connect to the primary primary.
[W000 2024-05-08T09:25:45.238+08:00] The standby standby01 is ready to be a FSFO target
[W000 2024-05-08T09:25:45.238+08:00] Connection to the primary restored!
[W000 2024-05-08T09:25:47.238+08:00] Disconnecting from database primary.

通过登录到观察者计算机并运行DGMGRL，最多启动三个观察者。以具有SYSDG或SYSDBA权限的用户身份连接到配置，然后发出Start observer命令。请注意，在发出命令后将不会得到DGMGRL提示。

DGMGRL> CONNECT sysdg@North_Sales.example.com;
Password: password
Connected to "North_Sales"
Connected as SYSDG.

DGMGRL> START OBSERVER observer1 IN BACKGROUND
> FILE IS /net/sales/dat/oracle/broker/fsfo.dat
> LOGFILE IS /net/sales/dat/oracle/broker/observer.log
> CONNECT IDENTIFIER IS North_Sales
Submitted command "START OBSERVER" using connect identifier "North_Sales"

出于安全考虑，Oracle建议使用此命令格式；没有可见的凭据。这种做法可防止系统上的其他用户使用实用程序（例如UNIX ps实用程序）来显示连接凭据。它还防止明文密码在用户的终端上可见。

从脚本启动观察程序时，Oracle建议您使用支持“connect/”的方法，这样数据库连接凭据就不必嵌入脚本中。如果您选择使用客户端Oracle钱包作为安全的外部密码存储，请确保添加主数据库和快速启动故障切换目标备用数据库的凭据。添加每个数据库的凭据时指定的数据库连接字符串必须与ObserverConnectIdentifer或DGConnectIdentifier数据库属性匹配。

当启动多个观察程序时，启用快速启动故障切换后，一个观察程序是主观察程序，其余观察程序是备份观察程序。只有主观察者才能与Data Guard代理协调快速启动故障切换。如果主数据库和目标备用数据库保持连接，但与主观察器的连接丢失，则代理会尝试指定一个备份观察器作为新的主观察者。

测试

需要启动 OBServer

step 1.主库异常关机
step 2.查看 OBServer 日志

DGMGRL> start observer
[W000 2024-05-08T03:35:44.358-04:00] FSFO target standby is standby02
Observer 'standby03' started
[W000 2024-05-08T03:35:44.778-04:00] Observer trace level is set to USER
[W000 2024-05-08T03:35:44.778-04:00] Try to connect to the primary.
[W000 2024-05-08T03:35:44.778-04:00] Try to connect to the primary primary.
[W000 2024-05-08T03:35:44.837-04:00] The standby standby02 is ready to be a FSFO target
[W000 2024-05-08T03:35:45.837-04:00] Connection to the primary restored!
[W000 2024-05-08T03:35:50.846-04:00] Disconnecting from database primary.
[W000 2024-05-08T03:36:53.996-04:00] Primary database cannot be reached.
[W000 2024-05-08T03:36:53.996-04:00] Fast-Start Failover threshold has not exceeded. Retry for the next 30 seconds
[W000 2024-05-08T03:36:54.996-04:00] Try to connect to the primary.
[W000 2024-05-08T03:37:25.004-04:00] Primary database cannot be reached.
[W000 2024-05-08T03:37:25.004-04:00] Fast-Start Failover threshold has expired.
[W000 2024-05-08T03:37:25.004-04:00] Try to connect to the standby.
[W000 2024-05-08T03:37:25.004-04:00] Making a last connection attempt to primary database before proceeding with Fast-Start Failover.
[W000 2024-05-08T03:37:25.004-04:00] Check if the standby is ready for failover.
[S002 2024-05-08T03:37:25.012-04:00] Fast-Start Failover started...

2024-05-08T03:37:25.012-04:00
Initiating Fast-Start Failover to database "standby02"...
[S002 2024-05-08T03:37:25.012-04:00] Initiating Fast-start Failover.
Performing failover NOW, please wait...
ORA-12543: TNS:destination host unreachable

Unable to connect to database using primary
ORA-12170: TNS:Connect timeout occurred

Unable to connect to database using primary
Failover succeeded, new primary is "standby02"
2024-05-08T03:38:22.701-04:00
[S002 2024-05-08T03:38:22.701-04:00] Fast-Start Failover finished...
[W000 2024-05-08T03:38:22.701-04:00] Failover succeeded. Restart pinging.
[W000 2024-05-08T03:38:22.729-04:00] Primary database has changed to standby02.
[W000 2024-05-08T03:38:22.734-04:00] Try to connect to the primary.
[W000 2024-05-08T03:38:22.734-04:00] Try to connect to the primary standby02.
[W000 2024-05-08T03:38:22.876-04:00] The standby primary needs to be reinstated
[W000 2024-05-08T03:38:22.877-04:00] Try to connect to the new standby primary.
[W000 2024-05-08T03:38:23.877-04:00] Connection to the primary restored!
ORA-12543: TNS:destination host unreachable

Unable to connect to database using primary
[W000 2024-05-08T03:38:26.885-04:00] Reinstating database standby01.

2024-05-08T03:38:26.885-04:00
Initiating reinstatement for database "standby01"...
Reinstating database "standby01", please wait...
Error: ORA-16653: failed to reinstate database

Failed.
Reinstatement of database "standby01" failed
2024-05-08T03:38:28.974-04:00
[W000 2024-05-08T03:38:30.972-04:00] Reinstating database standby03.

2024-05-08T03:38:30.972-04:00
Initiating reinstatement for database "standby03"...
Reinstating database "standby03", please wait...
Error: ORA-16653: failed to reinstate database

step 3.查看当前配置

DGMGRL> SHOW CONFIGURATION

Configuration - xiangxun

  Protection Mode: MaxAvailability
  Members:
  primary   - Primary database
    Error: ORA-12543: TNS:destination host unreachable

    standby02 - (*) Physical standby database 
    standby01 - Physical standby database 
    standby03 - Physical standby database 

Fast-Start Failover: Enabled in Zero Data Loss Mode

Configuration Status:
ERROR   (status updated 0 seconds ago)

DGMGRL>

step 4.查看备库 standby02 的加色

SQL> select open_mode,protection_mode,database_role,switchover_status from v$database;

OPEN_MODE				 PROTECTION_MODE			  DATABASE_ROLE 		   SWITCHOVER_STATUS
---------------------------------------- ---------------------------------------- -------------------------------- ----------------------------------------
READ WRITE				 MAXIMUM AVAILABILITY			  PRIMARY			   NOT ALLOWED

SQL>

当前 standby02 已经提升为主库

step 5.当主库故障恢复后，启动数据库和监听

[oracle@primary ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed May 8 15:46:04 2024
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SYS@orcl> startup
ORACLE instance started.

Total System Global Area 2432695144 bytes
Fixed Size		    8899432 bytes
Variable Size		  536870912 bytes
Database Buffers	 1879048192 bytes
Redo Buffers		    7876608 bytes
Database mounted.
ORA-16649: possible failover to another database prevents this database from being opened


SYS@orcl>

step 6.查看 OBServer 日志

2024-05-08T03:47:23.207-04:00
Initiating reinstatement for database "primary"...
Reinstating database "primary", please wait...
[W000 2024-05-08T03:48:00.285-04:00] Try to connect to the primary standby02.
[W000 2024-05-08T03:48:01.285-04:00] Connection to the primary restored!
[W000 2024-05-08T03:48:03.291-04:00] Reinstating database standby01.

2024-05-08T03:48:03.291-04:00
Initiating reinstatement for database "standby01"...
Reinstating database "standby01", please wait...
[W000 2024-05-08T03:48:13.316-04:00] The standby primary is ready to be a FSFO target
Reinstatement of database "primary" succeeded
2024-05-08T03:49:47.563-04:00
[W000 2024-05-08T03:49:48.516-04:00] Successfully reinstated database primary.
Error: ORA-16653: failed to reinstate database

Failed.
Reinstatement of database "standby01" failed
2024-05-08T03:49:54.481-04:00
[W000 2024-05-08T03:49:55.530-04:00] Reinstating database standby03.

2024-05-08T03:49:55.530-04:00
Initiating reinstatement for database "standby03"...
Reinstating database "standby03", please wait...
Error: ORA-16653: failed to reinstate database

Failed.
Reinstatement of database "standby03" failed
2024-05-08T03:50:02.022-04:00
[W000 2024-05-08T03:50:03.020-04:00] Disconnecting from database standby02.

当主库物理故障恢复后，从新进行切换回主库。

FQ

FQ 1：

DGMGRL> SHOW CONFIGURATION

Configuration - xiangxun

  Protection Mode: MaxPerformance
  Members:
  primary   - Primary database
    standby01 - Physical standby database 
    standby02 - Physical standby database 
    standby03 - Physical standby database 
      Warning: ORA-16857: member disconnected from redo source for longer than specified threshold

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 54 seconds ago)

DGMGRL>

Data Guard代理还具有可配置的数据库属性，当传输或应用滞后超过用户定义的值时，这些属性可用于生成警告：

当数据库的应用滞后超过属性指定的值时，ApplyLagThreshold可配置数据库属性会为逻辑或物理备用生成警告状态。属性值以秒为单位表示。如果值为0秒，则在存在应用滞后时不会生成任何警告。作为最佳实践，Oracle建议将ApplyLagThreshold设置为至少15分钟。
当数据库的传输滞后超过属性指定的值时，TransportLagThreshold可配置数据库属性可用于为逻辑、物理或快照备用生成警告状态。属性值以秒为单位表示。如果值为0秒，则在存在传输滞后时不会生成任何警告。作为最佳做法，Oracle建议将TransportLagThreshold设置为至少15分钟。

DGMGRL> edit database standby03 set property ApplyLagThreshold=900;
Property "applylagthreshold" updated
DGMGRL> show database verbose standby03

Database - standby03

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      5 minutes 25 seconds (computed 37 seconds ago)
  Apply Lag:          5 minutes 25 seconds (computed 37 seconds ago)
  Average Apply Rate: 1.00 KByte/s
  Active Apply Rate:  360.00 KByte/s
  Maximum Apply Rate: 360.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    orcl

  Database Warning(s):
    ORA-16855: transport lag has exceeded specified threshold
    ORA-16857: member disconnected from redo source for longer than specified threshold

  Properties:
    DGConnectIdentifier             = 'standby03'
    ObserverConnectIdentifier       = ''
    FastStartFailoverTarget         = 'primary'
    PreferredObserverHosts          = ''
    LogShipping                     = 'ON'
    RedoRoutes                      = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '900'
    TransportLagThreshold           = '30'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    ApplyInstances                  = '0'
    StandbyFileManagement           = ''
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '0'
    LogArchiveMinSucceedDest        = '0'
    DataGuardSyncLatency            = '0'
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = ''
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    ArchiveLocation                 = ''
    AlternateLocation               = ''
    StandbyArchiveLocation          = ''
    StandbyAlternateLocation        = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    LogXptStatus                    = '(monitor)'
    SendQEntries                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    HostName                        = 'standby03'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=standby03)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=standby03_DGMGRL)(INSTANCE_NAME=orcl)(SERVER=DEDICATED)))'
    TopWaitEvents                   = '(monitor)'
    SidName                         = '(monitor)'

  Log file locations:
    Alert log               : /u01/app/oracle/diag/rdbms/standby03/orcl/trace/alert_orcl.log
    Data Guard Broker log   : /u01/app/oracle/diag/rdbms/standby03/orcl/trace/drcorcl.log

Database Status:
WARNING

DGMGRL>

FQ 2：

DGMGRL> show database verbose standby03

Database - standby03

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      6 minutes 25 seconds (computed 34 seconds ago)
  Apply Lag:          6 minutes 25 seconds (computed 34 seconds ago)
  Average Apply Rate: 1.00 KByte/s
  Active Apply Rate:  360.00 KByte/s
  Maximum Apply Rate: 360.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    orcl

  Database Warning(s):
    ORA-16855: transport lag has exceeded specified threshold
    ORA-16857: member disconnected from redo source for longer than specified threshold

  Properties:
    DGConnectIdentifier             = 'standby03'
    ObserverConnectIdentifier       = ''
    FastStartFailoverTarget         = 'primary'
    PreferredObserverHosts          = ''
    LogShipping                     = 'ON'
    RedoRoutes                      = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '900'
    TransportLagThreshold           = '30'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    ApplyInstances                  = '0'
    StandbyFileManagement           = ''
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '0'
    LogArchiveMinSucceedDest        = '0'
    DataGuardSyncLatency            = '0'
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = ''
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    ArchiveLocation                 = ''
    AlternateLocation               = ''
    StandbyArchiveLocation          = ''
    StandbyAlternateLocation        = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    LogXptStatus                    = '(monitor)'
    SendQEntries                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    HostName                        = 'standby03'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=standby03)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=standby03_DGMGRL)(INSTANCE_NAME=orcl)(SERVER=DEDICATED)))'
    TopWaitEvents                   = '(monitor)'
    SidName                         = '(monitor)'

  Log file locations:
    Alert log               : /u01/app/oracle/diag/rdbms/standby03/orcl/trace/alert_orcl.log
    Data Guard Broker log   : /u01/app/oracle/diag/rdbms/standby03/orcl/trace/drcorcl.log

Database Status:
WARNING

DGMGRL>

上面输出中ORA-16855的错误是正常的，可以看下要求的delay是多少：

SYS@orcl> show parameter log_archive_dest_4  

NAME				     TYPE		    VALUE
------------------------------------ ---------------------- ------------------------------
log_archive_dest_4		     string		    service="standby03", ASYNC NOA
							    FFIRM delay=0 optional compres
							    sion=disable max_failure=0 reo
							    pen=300 db_unique_name="standb
							    y03" net_timeout=30, valid_for
							    =(online_logfile,all_roles)
SYS@orcl>

在 DGMGRL 中查看

DGMGRL> show database verbose "standby03" DelayMins
  DelayMins = '0'
DGMGRL>

可以看到，要求的为 0，而实际的滞后为1分钟：

SYS@orcl> SELECT * FROM v$dataguard_stats WHERE name LIKE '%lag%';

SOURCE_DBID SOURCE_DB_UNIQUE_NAME					     NAME
----------- ---------------------------------------------------------------- ----------------------------------------------------------------
VALUE																 UNIT
-------------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------
TIME_COMPUTED						     DATUM_TIME 						      CON_ID
------------------------------------------------------------ ------------------------------------------------------------ ----------
 1696144244 PRIMARY							     transport lag
+00 00:00:00															 day(2) to second(0) interval
05/08/2024 13:51:19					     05/08/2024 13:51:18						   0

 1696144244 PRIMARY							     apply lag
+00 00:00:00															 day(2) to second(0) interval
05/08/2024 13:51:19					     05/08/2024 13:51:18						   0


SYS@orcl>

修改 DelayMins 为 120 秒

DGMGRL> edit database standby03 set property DelayMins=120;

墨力计划 dgbroker

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

文章被以下合辑收录

02.Data Guard（共14篇）

Data Guard 相关