1.故障概述
故障时间:2025-06-11T07:52:48 至 2025-06-11T07:57:48
故障实例:hzjj2
影响范围:数据库归档进程异常停止,可能导致数据无法正常归档,影响数据库可用性及容灾能力
2.故障描述
数据库告警日志显示以下错误信息:
<msg time='2025-06-11T07:52:48.560+08:00' org_id='oracle' comp_id='rdbms' client_id='' type='UNKNOWN' level='16' host_id='hzjj-02' host_addr='10.176.8.161' module='' pid='56689'>
<txt>ARCH: Archival stopped, error occurred. Will continue retrying</txt>
</msg>
<msg time='2025-06-11T07:52:48.560+08:00' org_id='oracle' comp_id='rdbms' client_id='' type='UNKNOWN' level='16' host_id='hzjj-02' host_addr='10.176.8.161' module='' pid='56689'>
<txt>ORACLE Instance hzjj2 - Archival Error</txt>
</msg>
<msg time='2025-06-11T07:52:48.560+08:00' org_id='oracle' comp_id='rdbms' client_id='' type='UNKNOWN' level='16' host_id='hzjj-02' host_addr='10.176.8.161' module='' pid='56689'>
<txt>ORA-16014: log 10 sequence# 55026 not archived, no available destinations
ORA-00312: online log 10 thread 2: '+REDO/hzjj/onlinelog/group_10.264.1162588875'</txt>
</msg>
<msg time='2025-06-11T07:57:48.604+08:00' org_id='oracle' comp_id='rdbms' client_id='' type='UNKNOWN' level='16' host_id='hzjj-02' host_addr='10.176.8.161' module='' pid='56693'>
<txt>ARCH: Archival stopped, error occurred. Will continue retrying</txt>
</msg>
<msg time='2025-06-11T07:57:48.604+08:00' org_id='oracle' comp_id='rdbms' client_id='' type='UNKNOWN' level='16' host_id='hzjj-02' host_addr='10.176.8.161' module='' pid='56693'>
<txt>ORACLE Instance hzjj2 - Archival Error</txt>
</msg>
<msg time='2025-06-11T07:57:48.604+08:00' org_id='oracle' comp_id='rdbms' client_id='' type='UNKNOWN' level='16' host_id='hzjj-02' host_addr='10.176.8.161' module='' pid='56693'>
<txt>ORA-16038: log 10 sequence# 55026 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 10 thread 2: '+REDO/hzjj/onlinelog/group_10.264.1162588875'</txt>
</msg>
3.故障分析
1.错误代码解析
ORA-16014:归档日志序列 55026 未归档,无可用目标位置
ORA-16038:无法归档日志序列 55026
ORA-19504:创建归档文件失败
ORA-00312:指向线程 2 的在线日志组 10
2.磁盘组空间检查
故障发生时磁盘组空间情况:
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 4291533 2812212 0 2812212 0 N DATA/
MOUNTED EXTERN N 512 4096 1048576 4768370 677 0 677 0 N FRA/
MOUNTED NORMAL N 512 4096 1048576 2859 1933 953 490 0 Y OCR/
MOUNTED HIGH N 512 4096 1048576 238415 210403 95366 38345 0 N REDO/
3.关键问题定位
- FRA 磁盘组空间不足:FRA 磁盘组 Free_MB 仅 677MB,Usable_file_MB 仅 677MB,无法满足新归档日志的写入需求
- 归档日志清理不及时:大量历史归档日志未被清理,占用了 FRA 磁盘组的存储空间
- 归档路径配置问题:归档日志可能被配置存储在 FRA 磁盘组,而该磁盘组空间已耗尽
关于磁盘组空间剩余参数:
- Free_MB
磁盘组中不考虑冗余机制的原始可用空间大小,直接从 V$ASM_DISKGROUP 视图获取。 - Usable_file_MB
考虑镜像策略(如 Oracle ASM 的冗余级别)后,可用于创建新文件的实际可用空间,同样从 V$ASM_DISKGROUP 视图获取。
| 指标 | Free_MB | Usable_file_MB |
|---|---|---|
| 是否考虑冗余 | 否(原始空间) | 是(根据冗余规则扣除镜像空间) |
| 数据来源 | V$ASM_DISKGROUP.FREE_MB | V$ASM_DISKGROUP.USABLE_FILE_MB |
| 业务价值 | 衡量物理磁盘剩余容量 | 衡量可实际用于文件存储的容量 |
| 典型场景 | 磁盘扩容规划 | 新建数据库文件、表空间时的容量评估 |
查询所有磁盘组的Free_MB和Usable_file_MB
SELECT
NAME,
FREE_MB "原始可用空间(MB)",
USABLE_FILE_MB "实际可用文件空间(MB)",
CASE
WHEN REDUNDANCY = 'EXTERN' THEN '外部冗余(无镜像)'
WHEN REDUNDANCY = 'NORMAL' THEN '正常冗余(2倍镜像)'
WHEN REDUNDANCY = 'HIGH' THEN '高冗余(3倍镜像)'
ELSE REDUNDANCY
END "冗余策略"
FROM V$ASM_DISKGROUP;
通过这两个指标的结合分析,可更准确地评估 ASM 磁盘组的空间使用情况,为数据库存储规划、扩容决策提供依据。
4.故障处理
1.清理归档日志
执行以下命令清理过期归档日志:
-- 清理1天前已完成归档且已应用的日志
delete noprompt archivelog all completed before 'sysdate - 1';
-- 按线程和序列号范围清理(示例范围)
delete archivelog sequence between 127 and 180 thread 1;
-- 动态生成按线程清理命令
-- 因归档日志清理不及时,造成归档日志不连续。从V$ARCHIVED_LOG视图中筛选出满足以下条件的归档日志:
-- first_time <= systimestamp-1:归档日志的创建时间超过 1 天
-- applied='YES':归档日志已被应用到备库(如果是 Data Guard 环境)
-- deleted='NO':归档日志尚未被删除
select 'delete noprompt archivelog until sequence '||max(sequence#)||' thread '||thread#||';'
from v$archived_log
where first_time <= systimestamp-1
and applied='YES'
and deleted='NO'
group by thread#;
按线程号 (thread#) 分组,并找出每个线程中满足条件的最大序列号 (sequence#)
生成对应的DELETE ARCHIVELOG命令,格式为:
delete noprompt archivelog until sequence [最大序列号] thread [线程号];
2.清理后磁盘组空间
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 4291533 2812212 0 2812212 0 N DATA/
MOUNTED EXTERN N 512 4096 1048576 4768370 1355515 0 1355515 0 N FRA/
MOUNTED NORMAL N 512 4096 1048576 2859 1933 953 490 0 Y OCR/
MOUNTED HIGH N 512 4096 1048576 238415 210403 95366 38345 0 N REDO/
3.验证归档恢复正常
- 归档进程恢复正常运行,不再报错
- 新产生的归档日志能够正常写入 FRA 磁盘组
- 数据库告警日志中不再出现归档错误信息
5.根因分析
本次故障的根本原因是 FRA 磁盘组空间不足,导致归档日志无法正常写入。具体表现为:
- FRA 磁盘组 Usable_file_MB 仅 677MB,低于归档日志生成速度
- 归档日志清理策略未有效执行,历史归档日志积累过多
- 未及时监控磁盘组空间使用情况,未能提前发现并处理空间不足问题
6.预防措施
1.完善磁盘空间监控:
- 设置 FRA 磁盘组空间使用率阈值告警(建议 80%)
- 定期检查各磁盘组的 Free_MB 和 Usable_file_MB
- 优化归档日志管理:
- 制定合理的归档日志保留策略,根据备份频率调整保留时间
- 配置自动清理脚本,定期清理过期归档日志
- 考虑将归档日志存储在独立的磁盘组,避免与 FRA 竞争空间
- 定期维护与检查:
- 每周检查归档日志清理情况,确保归档日志序列连续
- 每月分析磁盘组空间使用趋势,提前规划扩容
- 定期测试归档和恢复流程,确保归档功能正常
7.参考文档
- Negative Value Of Usable File MB Space In DiskGroup (Doc ID 2338336.1)
- ASMCMD Disk Group Management Commands (oracle.com)
- Oracle Database Backup and Recovery Basics (官方文档)
Fix
Check the free space at the archive destination and after making sufficient space at archive destination, re-try to open the database and it will open and no need to issue any other command like clear log, switch log.
清理归档日志
delete noprompt archivelog all completed before 'sysdate - 1';
or
delete archivelog sequence between 127 and 180 thread 1;
select 'delete noprompt archivelog until sequence '||max(sequence#)||' thread '||thread#||';' from v$archived_log where first_time <=systimestamp-1 and applied='YES' and deleted='NO' group by thread#;




