暂无图片
暂无图片
11
暂无图片
暂无图片
暂无图片

Oracle ADG 同步状态与延迟巡检SQL SOP

原创 孙莹 2026-06-08
501

Oracle ADG 同步状态与延迟巡检SQL SOP

适用范围:Oracle Data Guard / Active Data Guard 日常巡检、故障判断、恢复后验证。

适用版本:Oracle 11g / 12c / 18c / 19c 物理备库。

一、文档目标

本文档用于检查 Oracle ADG 是否同步正常、是否存在传输延迟、应用延迟、归档缺口,以及主库到备库的日志传输目标是否异常。

巡检最终应输出以下结论之一:

结论 含义
正常 无 archive gap,transport lag=0 或秒级,apply lag=0 或秒级,MRP/RFS 正常
有传输延迟 主库 redo 未及时传到备库,transport lag 较大
有应用延迟 redo 已到备库,但 MRP 应用慢,apply lag 较大
有归档缺口 v$archive_gap 有记录,备库缺少中间归档
MRP 未启动 没有 MRP0,或备库只读但未应用
主库传输异常 主库 v$archive_dest / v$archive_dest_status 有 ERROR

二、总体判断逻辑

ADG 巡检不能只看一个视图,要从以下几个维度综合判断:

检查维度 关键视图 正常状态
备库角色 v$database DATABASE_ROLE=PHYSICAL STANDBY
备库打开模式 v$database MOUNTEDREAD ONLY WITH APPLY
日志接收 v$managed_standby / v$dataguard_process RFS,且对应主库 LGWR/ARCH 连接
日志应用 v$managed_standby / v$dataguard_process MRP0,状态为 APPLYING_LOGWAIT_FOR_LOG
传输延迟 v$dataguard_stats transport lag 为 0 或秒级
应用延迟 v$dataguard_stats apply lag 为 0 或秒级
归档缺口 v$archive_gap no rows selected
sequence 差距 v$archived_log 每个 thread#seq_lag=0 或短时间可追平
主库传输目标 v$archive_dest / v$archive_dest_status STATUS=VALIDERROR 为空

注意:Active Data Guard 实时应用场景下,v$archived_log.applied 可能略滞后于 standby redo log 的实时应用位置,因此需要结合 v$managed_standbyv$dataguard_statsv$archive_gap 一起判断。

三、备库侧巡检 SQL

巡检前 SQL*Plus 格式设置,建议每次巡检前先执行:

set lines 300 pages 1000 set long 20000 set trimspool on set tab off col database_role for a20 col open_mode for a25 col protection_mode for a25 col switchover_status for a25 col process for a20 col status for a20 col client_process for a20 col thread# for 999 col sequence# for 999999999 col block# for 999999999 col name for a30 col value for a30 col unit for a25 col time_computed for a30 col datum_time for a30 col dest_name for a20 col type for a10 col database_mode for a20 col recovery_mode for a20 col synchronization_status for a20 col synchronized for a10 col gap_status for a20 col db_unique_name for a20 col dest_name for a30 col destination for a80 col error for a100

以下 SQL 优先在备库执行。

3.1 检查备库角色和打开状态

select database_role, open_mode, protection_mode, switchover_status from v$database;

正常参考:

DATABASE_ROLE OPEN_MODE PROTECTION_MODE SWITCHOVER_STATUS -------------------- ------------------------- ------------------------- ------------------------- PHYSICAL STANDBY MOUNTED MAXIMUM PERFORMANCE NOT ALLOWED

或者:

PHYSICAL STANDBY READ ONLY WITH APPLY MAXIMUM PERFORMANCE NOT ALLOWED

判断标准:

字段 正常/异常判断
DATABASE_ROLE 正常应为 PHYSICAL STANDBY
OPEN_MODE=MOUNTED 备库挂载状态,可以正常应用日志
OPEN_MODE=READ ONLY WITH APPLY ADG 可读且实时应用,正常
OPEN_MODE=READ ONLY 只读打开但未应用,需要检查 MRP 是否启动
OPEN_MODE=READ WRITE 不是正常物理备库状态,需要确认是否已 failover/switchover

3.2 检查 RFS / MRP 进程状态

select process, status, thread#, sequence#, block#, client_process from v$managed_standby order by process, thread#;

正常参考:

PROCESS STATUS THREAD# SEQUENCE# BLOCK# CLIENT_PROCESS ------------------ ------------------------ ------- ---------- ---------- ---------------- MRP0 APPLYING_LOG 2 62398 27749 N/A RFS IDLE 1 96120 239800 LGWR RFS IDLE 2 62398 27754 LGWR

判断标准:

进程/状态 含义 判断
RFS 存在 备库正在接收主库 redo 正常
CLIENT_PROCESS=LGWR 主库 LGWR 直接传输 redo 正常
CLIENT_PROCESS=Archival 归档方式传输 正常
MRP0 APPLYING_LOG 备库正在应用日志 正常
MRP0 WAIT_FOR_LOG 已追平,等待新日志 正常
没有 MRP0 日志应用未启动 异常
MRP0 WAIT_FOR_GAP 缺少归档日志 异常
MRP0 的 sequence# / block# 长时间不变 可能卡住或 I/O 慢 需排查

注意:RFS IDLE 不一定异常,可能表示当前没有新的 redo 或已经接收完当前 redo。

3.3 检查 ADG 传输延迟和应用延迟

select name, value, unit, time_computed, datum_time from v$dataguard_stats where name in ('transport lag', 'apply lag', 'apply finish time', 'estimated startup time');

正常参考:

NAME VALUE TIME_COMPUTED DATUM_TIME ------------------------------ ------------------------------ ------------------------------ ------------------------------ transport lag +00 00:00:00 05/15/2026 21:10:06 05/15/2026 21:10:06 apply lag +00 00:00:00 05/15/2026 21:10:06 05/15/2026 21:10:06 apply finish time +00 00:00:00.000 05/15/2026 21:10:06

判断标准:

指标 含义 正常判断
transport lag 主库 redo 传到备库的延迟 0 或秒级
apply lag 备库应用 redo 的延迟 0 或秒级
apply finish time 预计追平还需要多久 0 或很短
datum_time 统计数据采样时间 应接近当前时间

异常示例:

apply lag +92 07:52:39

说明备库应用延迟已经超过 92 天,属于严重不同步。

注意:如果存在 archive gap,transport lagapply finish time 可能不准确,应优先以 v$archive_gap 为准。

3.4 检查是否存在归档缺口

select * from v$archive_gap;

正常结果:

no rows selected

异常示例:

THREAD# LOW_SEQUENCE# HIGH_SEQUENCE# CON_ID ------- ------------- -------------- ---------- 1 88228 94296 1 2 56221 60831 1

含义:

Thread 缺失范围 说明
thread 1 88228 ~ 94296 缺少 thread 1 的中间归档
thread 2 56221 ~ 60831 缺少 thread 2 的中间归档

只要 v$archive_gap 有记录,就不能认为 ADG 同步正常。

3.5 检查每个 thread 的接收和应用差距

select thread#, max(sequence#) as max_received_seq, max(case when applied in ('YES','IN-MEMORY') then sequence# end) as max_applied_seq, max(sequence#) - max(case when applied in ('YES','IN-MEMORY') then sequence# end) as seq_lag from v$archived_log where resetlogs_change# = (select resetlogs_change# from v$database) group by thread# order by thread#;

正常参考:

THREAD# MAX_RECEIVED_SEQ MAX_APPLIED_SEQ SEQ_LAG ------- ---------------- --------------- ---------- 1 96119 96119 0 2 62397 62397 0

判断标准:

seq_lag 判断
0 已追平
1~2 可能是实时应用/归档标记滞后,需要结合 v$managed_standby 判断
持续增长 应用延迟或归档未连续应用
某个 thread 差距大 RAC 某个线程存在传输或应用问题

3.6 检查最近应用到的时间点

select thread#, max(sequence#) as last_applied_seq, to_char(max(next_time), 'yyyy-mm-dd hh24:mi:ss') as last_applied_time from v$archived_log where applied in ('YES','IN-MEMORY') and resetlogs_change# = (select resetlogs_change# from v$database) group by thread# order by thread#;

输出示例:

THREAD# LAST_APPLIED_SEQ LAST_APPLIED_TIME ------- ---------------- -------------------- 1 96119 2026-05-15 21:09:44 2 62397 2026-05-15 21:09:44

用途:

  • 确认备库数据大致应用到哪个业务时间点;
  • apply lag 异常时,用于判断落后时间范围;
  • RAC 环境下分别确认每个 thread 的应用时间。

3.7 查看最近归档日志应用明细

select thread#, sequence#, applied, to_char(first_time, 'yyyy-mm-dd hh24:mi:ss') as first_time, to_char(next_time, 'yyyy-mm-dd hh24:mi:ss') as next_time, to_char(completion_time, 'yyyy-mm-dd hh24:mi:ss') as completion_time from ( select a.*, row_number() over(partition by thread# order by sequence# desc) rn from v$archived_log a where resetlogs_change# = (select resetlogs_change# from v$database) ) where rn <= 10 order by thread#, sequence# desc;

用途:

  • 查看每个 thread 最近 10 个归档日志是否已应用;
  • 检查 APPLIED 是否为 YESIN-MEMORY
  • 辅助判断某个 thread 是否落后。

四、主库侧巡检 SQL

巡检前 SQL*Plus 格式设置,建议每次巡检前先执行:

set lines 300 pages 1000 set long 20000 set trimspool on set tab off col database_role for a20 col open_mode for a25 col protection_mode for a25 col switchover_status for a25 col process for a20 col status for a20 col client_process for a20 col thread# for 999 col sequence# for 999999999 col block# for 999999999 col name for a30 col value for a30 col unit for a25 col time_computed for a30 col datum_time for a30 col dest_name for a20 col type for a10 col database_mode for a20 col recovery_mode for a20 col synchronization_status for a20 col synchronized for a10 col gap_status for a20 col db_unique_name for a20 col dest_name for a30 col destination for a80 col error for a100

以下 SQL 在主库执行,用于确认主库到备库的日志传输目标是否正常。

4.1 检查主库归档目标状态

select dest_id, dest_name, status, type, database_mode, recovery_mode, db_unique_name, synchronization_status, synchronized, gap_status, error from v$archive_dest_status where status <> 'INACTIVE' and type = 'PHYSICAL' order by dest_id;

正常参考:

DEST_ID STATUS TARGET DATABASE_MODE RECOVERY_MODE ERROR ------- ------- ------- ---------------- ------------------------- ----- 2 VALID STANDBY OPEN_READ-ONLY MANAGED REAL TIME APPLY

判断标准:

字段 正常判断
STATUS VALID
TARGET STANDBY
RECOVERY_MODE MANAGED REAL TIME APPLYMANAGED
ERROR 为空

5.2 检查主库归档目标配置和错误信息

select dest_id, dest_name, status, destination, error from v$archive_dest where status <> 'INACTIVE' and destination is not null order by dest_id;

正常参考:

DEST_ID DEST_NAME STATUS DESTINATION ERROR ------- --------------------- -------- ----------------- ----- 1 LOG_ARCHIVE_DEST_1 VALID USE_DB_RECOVERY_FILE_DEST 2 LOG_ARCHIVE_DEST_2 VALID standby_tns

如果 ERROR 不为空,需要优先处理主库日志传输问题。

5.3 主库查看当前归档 sequence

select thread#, max(sequence#) as current_archived_seq from v$archived_log where resetlogs_change# = (select resetlogs_change# from v$database) group by thread# order by thread#;

用途:

  • 和备库 max_received_seq 对比;
  • 判断主库已经生成到哪个归档 sequence;
  • RAC 环境下必须分别看每个 thread#

5.4 主库查看当前 online redo sequence

select thread#, group#, sequence#, status, archived from v$log order by thread#, group#;

用途:

  • 查看主库当前正在写哪个 online redo;
  • 和备库 RFS/MRP 当前 sequence 对比;
  • 判断实时应用是否接近当前日志。

五、一键汇总巡检 SQL

以下 SQL 建议保存为 adg_check.sql,在备库执行。

set lines 300 pages 1000 set trimspool on set tab off prompt ====================================================================== prompt 1. DATABASE ROLE / OPEN MODE prompt ====================================================================== col database_role for a20 col open_mode for a25 col protection_mode for a25 col switchover_status for a25 select database_role, open_mode, protection_mode, switchover_status from v$database; prompt ====================================================================== prompt 2. MANAGED STANDBY PROCESS prompt ====================================================================== col process for a18 col status for a24 col client_process for a16 select process, status, thread#, sequence#, block#, client_process from v$managed_standby order by process, thread#; prompt ====================================================================== prompt 3. DATAGUARD LAG prompt ====================================================================== col name for a30 col value for a30 col unit for a25 col time_computed for a30 col datum_time for a30 select name, value, unit, time_computed, datum_time from v$dataguard_stats where name in ('transport lag', 'apply lag', 'apply finish time', 'estimated startup time'); prompt ====================================================================== prompt 4. ARCHIVE GAP prompt ====================================================================== select * from v$archive_gap; prompt ====================================================================== prompt 5. RECEIVED/APPLIED SEQUENCE LAG BY THREAD prompt ====================================================================== col max_received_seq for 999999999 col max_applied_seq for 999999999 col seq_lag for 999999999 select thread#, max(sequence#) as max_received_seq, max(case when applied in ('YES','IN-MEMORY') then sequence# end) as max_applied_seq, max(sequence#) - max(case when applied in ('YES','IN-MEMORY') then sequence# end) as seq_lag from v$archived_log where resetlogs_change# = (select resetlogs_change# from v$database) group by thread# order by thread#; prompt ====================================================================== prompt 6. LAST APPLIED TIME BY THREAD prompt ====================================================================== col last_applied_time for a20 select thread#, max(sequence#) as last_applied_seq, to_char(max(next_time), 'yyyy-mm-dd hh24:mi:ss') as last_applied_time from v$archived_log where applied in ('YES','IN-MEMORY') and resetlogs_change# = (select resetlogs_change# from v$database) group by thread# order by thread#;

执行方式:

sqlplus / as sysdba @adg_check.sql

如需保存日志:

spool /tmp/adg_check_&&_DATE.log @adg_check.sql spool off

六、异常场景判断与处理 SOP

6.1 场景一:v$archive_gap 有记录

异常表现:

select * from v$archive_gap;

输出类似:

THREAD# LOW_SEQUENCE# HIGH_SEQUENCE# ------- ------------- -------------- 1 88228 94296 2 56221 60831

判断:

备库缺少中间归档,MRP 无法连续恢复到最新位置。

处理路径:

  1. 在主库确认缺失归档是否还存在;
  2. 如果归档存在,复制到备库并注册;
  3. 如果归档不存在或缺口太大,使用 RMAN 增量推进备库;
  4. 重启 MRP;
  5. 重新验证 v$archive_gapapply lagseq_lag

主库查询缺失归档是否存在:

select thread#, sequence#, name, deleted, status, to_char(first_time, 'yyyy-mm-dd hh24:mi:ss') first_time, to_char(next_time, 'yyyy-mm-dd hh24:mi:ss') next_time from v$archived_log where resetlogs_change# = (select resetlogs_change# from v$database) and ( (thread# = 1 and sequence# between 88228 and 94296) or (thread# = 2 and sequence# between 56221 and 60831) ) order by thread#, sequence#;

备库注册归档:

alter database recover managed standby database cancel; alter database register physical logfile '/u01/arch_gap/thread1/1_88228_xxx.arc'; alter database register physical logfile '/u01/arch_gap/thread2/2_56221_xxx.arc'; alter database recover managed standby database using current logfile disconnect from session;

批量生成注册脚本:

find /u01/arch_gap -type f | sort | awk '{print "alter database register physical logfile '\''" $0 "'\'';"}' > /tmp/register_arch.sql sqlplus / as sysdba @/tmp/register_arch.sql

6.2 场景二:归档缺口太大或归档已丢失

适用情况:

  • 缺失归档数量很多;
  • 主库 FRA 已清理;
  • RMAN 备份中恢复归档成本过高;
  • 手工补归档时间太长。

推荐处理:使用 RMAN 增量推进备库。

方式 A:18c/19c 从主库服务恢复备库

备库执行:

rman target /
recover standby database from service '主库TNS服务名';

执行前确认备库能连接主库:

tnsping 主库TNS服务名 sqlplus sys/密码@主库TNS服务名 as sysdba

方式 B:传统 RMAN incremental from SCN

备库查询最低 SCN:

select min(checkpoint_change#) as standby_min_scn from v$datafile_header;

主库基于该 SCN 做增量备份:

run { allocate channel c1 device type disk; allocate channel c2 device type disk; backup incremental from scn <standby_min_scn> database format '/u01/standby_incr/standby_incr_%U.bkp'; release channel c1; release channel c2; } backup current controlfile for standby format '/u01/standby_incr/standby_control_%U.bkp';

拷贝到备库:

scp /u01/standby_incr/* oracle@备库主机:/u01/standby_incr/

备库恢复:

rman target /
startup force nomount; restore standby controlfile from '/u01/standby_incr/standby_control_xxx.bkp'; alter database mount; catalog start with '/u01/standby_incr/'; recover database noredo;

恢复完成后启动 MRP:

alter database recover managed standby database using current logfile disconnect from session;

6.3 场景三:没有 MRP0

异常表现:

v$managed_standby 中没有 MRP0

处理:

alter database recover managed standby database using current logfile disconnect from session;

如果不使用实时应用:

alter database recover managed standby database disconnect from session;

再次检查:

select process, status, thread#, sequence#, block#, client_process from v$managed_standby order by process, thread#;

正常应看到:

MRP0 APPLYING_LOG

或:

MRP0 WAIT_FOR_LOG

6.4 场景四:主库传输目标报错

主库检查:

select dest_id, dest_name, status, destination, error from v$archive_dest where target = 'STANDBY' order by dest_id;

如果 ERROR 不为空,按以下方向排查:

问题方向 检查内容
网络 主备监听、端口、主机名解析、防火墙
TNS tnspingtnsnames.ora、service_name
密码文件 主备 SYS 密码、orapw 文件是否一致
归档目标 log_archive_dest_n 配置是否正确
备库状态 备库实例是否启动到 mount/open 状态
FRA/磁盘空间 主库或备库归档目录是否满

常用测试:

tnsping standby_tns sqlplus sys/密码@standby_tns as sysdba

6.5 场景五:传输正常但应用慢

现象:

  • RFS 正常;
  • transport lag 不大;
  • apply lag 持续增长;
  • max_received_seq > max_applied_seq

处理方向:

  1. 检查备库 I/O 性能;
  2. 检查备库 CPU / 内存压力;
  3. 检查是否有大量 redo 需要应用;
  4. 检查 standby redo log 是否合理;
  5. 检查 alert 日志是否有 MRP 报错。

操作系统检查:

top vmstat 1 10 iostat -x 1 10

重点关注:

指标 异常判断
%util 接近 100%,磁盘忙
await 明显升高,I/O 响应慢
%iowait 长时间较高,系统等待 I/O
MRP 进程 CPU 长时间高,说明正在追日志;长时间无变化则可能卡住

七、恢复后验证 SOP

ADG 异常处理完成后,必须连续验证以下项目。

7.1 验证无归档缺口

select * from v$archive_gap;

正常:

no rows selected

7.2 验证延迟归零

select name, value, time_computed, datum_time from v$dataguard_stats where name in ('transport lag','apply lag','apply finish time');

正常:

transport lag +00 00:00:00 apply lag +00 00:00:00 apply finish time +00 00:00:00.000

7.3 验证每个 thread 追平

select thread#, max(sequence#) as max_received_seq, max(case when applied in ('YES','IN-MEMORY') then sequence# end) as max_applied_seq, max(sequence#) - max(case when applied in ('YES','IN-MEMORY') then sequence# end) as seq_lag from v$archived_log where resetlogs_change# = (select resetlogs_change# from v$database) group by thread# order by thread#;

正常:

THREAD# MAX_RECEIVED_SEQ MAX_APPLIED_SEQ SEQ_LAG ------- ---------------- --------------- ------- 1 96119 96119 0 2 62397 62397 0

7.4 验证 MRP/RFS 状态

select process, status, thread#, sequence#, block#, client_process from v$managed_standby order by process, thread#;

正常:

MRP0 APPLYING_LOG 或 WAIT_FOR_LOG RFS IDLE / RECEIVING,CLIENT_PROCESS 为 LGWR 或 Archival

7.5 建议观察周期

只有持续满足以下条件,才能认为恢复稳定:

  1. v$archive_gap = no rows selected
  2. transport lag = 0 或秒级
  3. apply lag = 0 或秒级
  4. 每个 thread# 的 seq_lag = 0 或短时间可追平
  5. MRP0/RFS 状态正常
  6. 主库传输目标无 ERROR

八、常用命令速查

8.1 启动实时应用

alter database recover managed standby database using current logfile disconnect from session;

8.2 停止日志应用

alter database recover managed standby database cancel;

8.3 启动非实时应用

alter database recover managed standby database disconnect from session;

8.4 主库强制归档当前日志

alter system archive log current;

8.5 主库切换日志

alter system switch logfile;

8.6 备库注册单个归档

alter database register physical logfile '/path/to/archive.arc';

8.7 查看 alert 日志

adrci show homes set home diag/rdbms/<db_unique_name>/<instance_name> show alert -tail 200

或:

tail -200f $ORACLE_BASE/diag/rdbms/*/*/trace/alert_*.log
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论