丢失非活动日志组的故障恢复
如果数据库丢失的是非活动(INACTIVE)日志组,由于非活动日志组已经完成检查点,数据库不会发生数据损失,此时只需要通过Clear重建该日志组即可恢复。
首先删除一个非活动日志组,模拟一次故障损失:
SQL> ! rm /opt/oracle/oradata/eygle/redo02.log
如果数据库日志切换,使用到该日志组,则数据库可能马上崩溃:
SQL> alter system switch logfile; alter system switch logfile * ERROR at line 1: ORA-03113: end-of-file on communication channel
我们可以从警告日志中获得部分详细信息:
Wed May 10 11:06:49 2006 Errors in file /opt/oracle/admin/eygle/bdump/eygle_lgwr_31539.trc: ORA-00313: open failed for members of log group 2 of thread 1 ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/eygle/redo02.log' ORA-27037: unable to obtain file status Linux Error: 2: No such file or directory Additional information: 3
此时启动数据库,数据库会提示日志丢失:
SQL> startup ORACLE instance started. Total System Global Area 252777592 bytes Fixed Size 451704 bytes Variable Size 134217728 bytes Database Buffers 117440512 bytes Redo Buffers 667648 bytes Database mounted. ORA-00313: open failed for members of log group 2 of thread 1 ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/eygle/redo02.log'
此时在Mount状态我们可以查看各日志组及日志文件的状态:
SQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# ------ ---------- --------- ---------- ---------- --- ---------------- ------------- 1 1 8 10485760 1 NO INACTIVE 485702 2 1 0 10485760 1 NO UNUSED 0 3 1 9 10485760 1 NO INVALIDATED 485719 SQL> select * from v$logfile; GROUP# STATUS TYPE MEMBER ---------- ------- ------- ---------------------------------------- 1 ONLINE /opt/oracle/oradata/eygle/redo01.log 2 ONLINE /opt/oracle/oradata/eygle/redo02.log 3 STALE ONLINE /opt/oracle/oradata/eygle/redo03.log
我们注意到,由于日志组2 已经损失,在日志切换过程中,数据库Crash,所以日志组3的状态变为INVALIDATED,日志文件redo03.log的状态变为STALE(STALE通常出现在上一次操作失败之后,在下一次成功操作后状态会恢复正常)。
我们清除该日志组后即可启动数据库:
SQL> alter database clear logfile group 2; Database altered. SQL> alter database open; Database altered.
注意,如果数据库处于归档模式下,并且该日志组未完成归档则需要使用如下命令强制清除(关于此种情况,请参考本章后面相关诊断案例):
alter database clear unarchived logfile group 2;
打开数据库之后,状态为STALE的日志文件,在下次正常写入后,状态即可恢复正常:
SQL> select * from v$log; GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# ------ ---------- --------- ---------- ---------- --- ---------------- ------------- 1 1 8 10485760 1 NO INACTIVE 485702 2 1 10 10485760 1 NO CURRENT 505721 3 1 9 10485760 1 NO INACTIVE 485719 SQL> select * from v$logfile; GROUP# STATUS TYPE MEMBER ------ ------- ------- ---------------------------------------- 1 ONLINE /opt/oracle/oradata/eygle/redo01.log 2 ONLINE /opt/oracle/oradata/eygle/redo02.log 3 STALE ONLINE /opt/oracle/oradata/eygle/redo03.log SQL> alter system switch logfile; System altered. SQL> alter system switch logfile; System altered. SQL> select * from v$logfile; GROUP# STATUS TYPE MEMBER ------ ------- ------- ---------------------------------------- 1 ONLINE /opt/oracle/oradata/eygle/redo01.log 2 ONLINE /opt/oracle/oradata/eygle/redo02.log 3 ONLINE /opt/oracle/oradata/eygle/redo03.log
丢失活动或当前日志文件的恢复
我们知道Oracle通过日志文件保证提交成功的数据不丢失。可是我们注意,在故障中,我们可能损失了当前的(CURRENT)日志文件。
这又分为两种情况:
1. 如果数据库是正常关闭的
由于关闭数据库前,Oracle会执行全面检查点,当前日志在实例恢复中可以不再需要。在Oracle8i中我们可以通过与4.9.1中描述类似方法进行解决,收录简要测试步骤如下(非归档模式下测试过程):
SQL> select * from v$version where rownum <2; BANNER ---------------------------------------------------------------- Oracle8i Enterprise Edition Release 8.1.7.4.0 - 64bit Production SQL> select * from v$logfile; GROUP# STATUS MEMBER ---------- ------- ---------------------------------------- 1 /opt/oracle/oradata/testora8/redo01.log 2 /opt/oracle/oradata/testora8/redo02.log 3 /opt/oracle/oradata/testora8/redo03.log SQL> shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. SQL> ! mv /opt/oracle/oradata/testora8/redo* /tmp SQL> startup ORACLE instance started. Total System Global Area 338390716 bytes Fixed Size 102076 bytes Variable Size 133308416 bytes Database Buffers 204800000 bytes Redo Buffers 180224 bytes Database mounted. ORA-00313: open failed for members of log group 1 of thread 1 ORA-00312: online log 1 thread 1: '/opt/oracle/oradata/testora8/redo01.log' SQL> alter database clear logfile group 1; Database altered. SQL> alter database clear logfile group 2; Database altered. SQL> alter database clear logfile group 3; Database altered. SQL> alter database open; Database altered.
在Oracle9i中,可能无法对当前日志进行Clear,需要通过Until Cancel恢复后,Resetlogs打开,以下是一个简单的测试过程:
SQL> ! rm /opt/oracle/oradata/eygle/redo0* SQL> startup ORACLE instance started. Total System Global Area 252777592 bytes Fixed Size 451704 bytes Variable Size 134217728 bytes Database Buffers 117440512 bytes Redo Buffers 667648 bytes Database mounted. ORA-00313: open failed for members of log group 1 of thread 1 ORA-00312: online log 1 thread 1: '/opt/oracle/oradata/eygle/redo01.log' SQL> alter database clear logfile group 1; Database altered. SQL> alter database clear unarchived logfile group 2; alter database clear unarchived logfile group 2 * ERROR at line 1: ORA-00313: open failed for members of log group 2 of thread 1 ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/eygle/redo02.log' ORA-27037: unable to obtain file status Linux Error: 2: No such file or directory Additional information: 3 SQL> recover database until cancel; Media recovery complete. SQL> alter database open resetlogs; Database altered.
2. 在损失当前日志时,如果数据库是异常关闭的
那么Oracle在进行实例恢复时必须要求当前日志,否则Oracle将无法保证提交成功的数据不丢失(也就意味着,Oracle会丢失数据),在这种情况下,Oracle数据库将无法启动。
对于这种情况,我们通常需要从备份中恢复数据文件,通过应用归档日志文件向前推演,直到最后一个完好的日志文件,然后可以通过resetlogs启动数据库完成恢复。丢失的数据就是损坏的日志文件中的数据。
这种恢复方式与以上介绍的方法类似,不再赘述。
可是不幸的是,很多数据库是从不备份的,那么在面对这种情况时,Oracle提供给我们一种内部手段可以用于强制性数据库打开,忽略一致性等问题,在打开数据库之后,Oracle建议导出(exp)数据,然后重建数据库,再导入(imp)数据,完成灾难恢复。
在继续之前,我愿意提及一下我经常强调的DBA四大守则之一:
备份重于一切
要知道系统总是要崩溃的,没有有效的备份只是等哪一天死而已。
如果说有什么事可以让DBA在深夜惊醒的话,那就是“没有备份”。
如果回忆一下,我们在第三章曾经提到过Oracle有一类具有特殊作用的隐含参数,其中一个参数是:_allow_resetlogs_corruption。
我们看一下这个参数的说明:
SQL> select ksppinm,ksppdesc from x$ksppi 2 where ksppinm like '%resetlogs_%'; KSPPINM KSPPDESC ------------------------------ -------------------------------------------------- _allow_resetlogs_corruption allow resetlogs even if it will cause corruption
该参数的含义是,允许在破坏一致性的情况下强制重置日志,打开数据库。_ALLOW_RESETLOGS_CORRUPTION将使用所有数据文件最旧的SCN打开数据库,所以通常需要保证SYSTEM表空间拥有最旧的SCN。
在强制打开数据库之后,可能因为各种原因伴随出现ORA-600错误,有些可以依据常规途径解决。我们看一下下面的一个案例。
在数据库启动时出现错误,提示日志文件损坏:
SQL> startup ; ORACLE instance started. Total System Global Area 97588504 bytes Fixed Size 451864 bytes Variable Size 33554432 bytes Database Buffers 62914560 bytes Redo Buffers 667648 bytes Database mounted. ORA-00354: corrupt redo log block header ORA-00353: log corruption near block 3 change 897612314 time 10/19/2005 14:19:34 ORA-00312: online log 3 thread 1: '/opt/oracle/oradata/conner/redo03.log'
在Mount状态,可以查询v$log视图,发现此处损坏的是active的日志文件:
SQL> select * from v$log; GROUP#THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM ----- ------- ---------- ---------- ---------- --- ------ ------------- --------- 1 1 159 10485760 1 NO INACTIVE 897592312 19-OCT-05 2 1 158 10485760 1 NO INACTIVE 897572310 19-OCT-05 3 1 160 10485760 1 NO ACTIVE 897612314 19-OCT-05 4 1 161 1048576 1 NO CURRENT 897612440 19-OCT-05
由于Active日志未完成检查点,在恢复中需要用到,丢失Active日志和Current日志情况类似,如果没有备份,我们只好使用隐含参数_allow_resetlogs_corruption强制启动数据库,设置此参数之后,在数据库Open过程中,Oracle会跳过某些一致性检查,从而使数据库可能跳过不一致状态,Open打开:
SQL> alter system set "_allow_resetlogs_corruption"=true scope=spfile;
System altered.
SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 897612315 generated at 10/19/2005 16:54:18 needed for thread 1
ORA-00289: suggestion : /opt/oracle/oradata/conner/archive/1_160.dbf
ORA-00280: change 897612315 for thread 1 is in sequence #160
Specify log: {=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/opt/oracle/oradata/conner/system01.dbf'
ORA-01112: media recovery not started
SQL> alter database open resetlogs;
Database altered.
SQL> shutdown immediate;
SQL> startup
ORACLE instance started.
Total System Global Area 97588504 bytes
Fixed Size 451864 bytes
Variable Size 33554432 bytes
Database Buffers 62914560 bytes
Redo Buffers 667648 bytes
Database mounted.
Database opened.幸运的时候数据库就可以成功Open,如果不幸可能会遇到一系列的Ora-600错误(最常见的是2662错误)此时就需要使用多种手段继续进行调整恢复。
如果注意观察alert日志,我们可能会发现类似以下日志:
Fri Jun 10 16:30:25 2005 alter database open resetlogs Fri Jun 10 16:30:25 2005 RESETLOGS is being done without consistancy checks. This may result in a corrupted database. The database should be recreated. RESETLOGS after incomplete recovery UNTIL CHANGE 240677200 Resetting resetlogs activation ID 3171937922 (0xbd0fee82)
不一致恢复最后恢复到的Change号是:240677200
Oracle告诉我们,强制resetlogs跳过了一致性检查,可能导致数据库损坏,数据库应当重建。而且此方法应该在Oracle技术支持的指导之下进行,否则Oracle将不对采用此类方式进行恢复的数据库进行支持。
注意:DBA需要时刻铭记的一个工作习惯是,在重要操作或故障处理前,保留现场。也就是说在进行以上类似恢复等工作前,我们应当对数据库进行冷备份,这样在恢复尝试失败后,也仍然可以回退到之前的状态。




