问题描述
下面测试来至于一个网友,由于重建控制文件少写UNDO表空间信息,最后使用resetlogs方式打开数据库,出现了MISSING00005的数据文件。其实undo表空间出现数据文件丢失很好处理的,切换一个新的UNDO表空间后,可以直接删除源旧的UNDO表空间,如果删除不掉的时候,增加几个参数或者是修改一个UNDO$就可以解决。但是网友环境不能正常删除,由于当时在处理其它事情,没有远程,不知道具体什么原因。实在删除不掉,我们也可以构建一个数据文件出来,就可以处理轻松的处理ORA-01178。
1,测试版本
www.htz.pw > select * from v$version where rownum<3; BANNER —————————————————————- Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 – 64bi PL/SQL Release 10.2.0.5.0 – Production www.htz.pw > !lsb_release -a LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch Distributor ID: RedHatEnterpriseServer Description: Red Hat Enterprise Linux Server release 4.8 (Tikanga) Release: 4.8 Codename: Tikanga
2,故障现象
www.htz.pw > alter database create datafile 5 as ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’; alter database create datafile 5 as ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’ * ERROR at line 1: ORA-01178: file 5 created before last CREATE CONTROLFILE, cannot recreate ORA-01110: data file 5: ‘/oracle/app/oracle/product/10.2.0/db_1024/dbs/MISSING00005’
这里出现了ORA-01178的错误
专家解答
3,查看file$获取基本信息
如果数据库不能正常open的时候,我们可以使用bbed去查看file$的内容,或者是使用odu等工具来实现
www.htz.pw > select FILE#,STATUS$,BLOCKS,TS#,RELFILE#,MAXEXTEND,INC,CRSCNWRP,CRSCNBAS,OWNERINSTANCE from file$ 2 ; FILE# STATUS$ BLOCKS TS# RELFILE# MAXEXTEND INC CRSCNWRP CRSCNBAS OWNERINSTANCE ———- ———- ———- ———- ———- ———- ———- ———- ———- —————————— 1 2 38400 0 1 4194302 1280 0 5 2 2 1280 6 2 0 0 0 2250643 3 2 15360 2 3 4194302 1280 0 6678 4 2 640 4 4 4194302 160 0 10685 5 2 1280 1 5 0 0 0 2310987
这里我选择文件2来构建文件5,这里我们需要注意的几个地方是,文件块的个数,scn等信息,其实这里我们也可以新创建一个大小相等的文件来实现
4,bbed修改内容
需要修改的内容见下面
BBED> p kcvfh struct kcvfh, 860 bytes @0 @0 struct kcvfhbfh, 20 bytes @0 @0 ub1 type_kcbh @0 0x0b @0 0x0b ub1 frmt_kcbh @1 0xa2 @1 0xa2 ub1 spare1_kcbh @2 0x00 @2 0x00 ub1 spare2_kcbh @3 0x00 @3 0x00 ub4 rdba_kcbh @4 0x01800001 @4 0x01c00001 relative database block address(个人感觉这个值由64*kccfhfno+0001) ub4 bas_kcbh @8 0x00000000 @8 0x00000000 ub2 wrp_kcbh @12 0x0000 @12 0x0000 ub1 seq_kcbh @14 0x01 @14 0x01 ub1 flg_kcbh @15 0x04 (KCBHFCKV) @15 0x04 (KCBHFCKV) ub2 chkval_kcbh @16 0x90e2 @16 0x907e ub2 spare3_kcbh @18 0x0000 @18 0x0000 struct kcvfhhdr, 76 bytes @20 @20 ub4 kccfhswv @20 0x00000000 @20 0x00000000 ub4 kccfhcvn @24 0x0b200000 @24 0x0b200000 ub4 kccfhdbi @28 0xd0278802 @28 0xd0278802 text kccfhdbn[0] @32 O @32 O text kccfhdbn[1] @33 R @33 R text kccfhdbn[2] @34 C @34 C text kccfhdbn[3] @35 L @35 L text kccfhdbn[4] @36 1 @36 1 text kccfhdbn[5] @37 1 @37 1 text kccfhdbn[6] @38 2 @38 2 text kccfhdbn[7] @39 3 @39 3 ub4 kccfhcsq @40 0x00001711 @40 0x00001715 ub4 kccfhfsz @44 0x00000500 @44 0x00000500 文件的大小,其实是块的个数(数据文件大小(ls -l)除块的大小再减去1),可以直接dump控制文件 s_blkz kccfhbsz @48 0x00 @48 0x00 ub2 kccfhfno @52 0x0006 @52 0x0007 file number,这里是绝对文件号,可以通过底层表得到 ,也可以在控制文件中得到 ub2 kccfhtyp @54 0x0003 @54 0x0003 file type,03代表数据文件,01,代表控制文件,02代表redo log ,04,backup controlfile 5 backup file,6 temporary db file ub4 kccfhacid @56 0x00000000 @56 0x00000000 ub4 kccfhcks @60 0x00000000 @60 0x00000000 text kccfhtag[0] @64 @64 text kccfhtag[1] @65 @65 text kccfhtag[2] @66 @66 text kccfhtag[3] @67 @67 text kccfhtag[4] @68 @68 text kccfhtag[5] @69 @69 text kccfhtag[6] @70 @70 text kccfhtag[7] @71 @71 text kccfhtag[8] @72 @72 text kccfhtag[9] @73 @73 text kccfhtag[10] @74 @74 text kccfhtag[11] @75 @75 text kccfhtag[12] @76 @76 text kccfhtag[13] @77 @77 text kccfhtag[14] @78 @78 text kccfhtag[15] @79 @79 text kccfhtag[16] @80 @80 text kccfhtag[17] @81 @81 text kccfhtag[18] @82 @82 text kccfhtag[19] @83 @83 text kccfhtag[20] @84 @84 text kccfhtag[21] @85 @85 text kccfhtag[22] @86 @86 text kccfhtag[23] @87 @87 text kccfhtag[24] @88 @88 text kccfhtag[25] @89 @89 text kccfhtag[26] @90 @90 text kccfhtag[27] @91 @91 text kccfhtag[28] @92 @92 text kccfhtag[29] @93 @93 text kccfhtag[30] @94 @94 text kccfhtag[31] @95 @95 ub4 kcvfhrdb @96 0x00000000 @96 0x00000000 struct kcvfhcrs, 8 bytes @100 @100 Datafile creation change# ub4 kscnbas @100 0x003011e2 @100 0x003012ed creation at SCN base在file$.crscnbas,控制文件中也是有的。 ub2 kscnwrp @104 0x0000 @104 0x0000 creation at SCN wrap 在file$.crscnwrp,控制文件也是有的 ub4 kcvfhcrt @108 0x32a8ae70 @108 0x32a8ae81 Datafile creation timestamp,控制文件也可以得到,不过需要计算,个人认为这里不需要修改,这里我一般是通过重建控制文件 下面三行的内容,一般会用于在offline的数据文件打开后的修复会实现,这种情况下不会使用,不过需要了解一下 ub4 kcvfhrlc @112 0x328d111c @112 0x328d111c Resetlogs timestamp也可以从控制文件中得到 struct kcvfhrls, 8 bytes @116 @116 Resetlogs change# ub4 kscnbas @116 0x00220d37 @116 0x00220d37 resetlog scnbase ub2 kscnwrp @120 0x0000 @120 0x0000 reset log scn wrap ub4 kcvfhbti @124 0x00000000 @124 0x00000000 Time the backup started, struct kcvfhbsc, 8 bytes @128 @128 System change number when backup starte ub4 kscnbas @128 0x00000000 @128 0x00000000 ub2 kscnwrp @132 0x0000 @132 0x0000 ub2 kcvfhbth @136 0x0000 @136 0x0000 ub2 kcvfhsta @138 0x0004 (KCVFHOFZ) @138 0x0004 (KCVFHOFZ) (file header status),这里的值比较多,见笔记 struct kcvfhckp, 36 bytes @484 @484 checkpoint相当的信息 struct kcvcpscn, 8 bytes @484 @484 ub4 kscnbas @484 0x003011e3 @484 0x003012ee checkpoint scn base ub2 kscnwrp @488 0x0000 @488 0x0000 checkpoint scn wrap ub4 kcvcptim @492 0x32a8ae70 @492 0x32a8ae81 checkpoint time ub2 kcvcpthr @496 0x0001 @496 0x0001 checkpoint thread union u, 12 bytes @500 @500 thread RBA struct kcvcprba, 12 bytes @500 @500 ub4 kcrbaseq @500 0x0000005e @500 0x0000005e redo sequence ub4 kcrbabno @504 0x0000020b @504 0x00000291 redo block number ub2 kcrbabof @508 0x0010 @508 0x0010 Byte offset,the byte offset into the block at which the redo record starts ub1 kcvcpetb[0] @512 0x02 @512 0x02 ub1 kcvcpetb[1] @513 0x00 @513 0x00 ub1 kcvcpetb[2] @514 0x00 @514 0x00 ub1 kcvcpetb[3] @515 0x00 @515 0x00 ub1 kcvcpetb[4] @516 0x00 @516 0x00 ub1 kcvcpetb[5] @517 0x00 @517 0x00 ub1 kcvcpetb[6] @518 0x00 @518 0x00 ub1 kcvcpetb[7] @519 0x00 @519 0x00 ub4 kcvfhcpc @140 0x00000002 @140 0x00000002 checkpoint_count ub4 kcvfhrts @144 0x00000000 @144 0x00000000 recovered at timstamp ub4 kcvfhccc @148 0x00000001 @148 0x00000001 control file checkpoint count at read before write(cpc-1) struct kcvfhbcp, 36 bytes @152 @152 Backup Checkpoint SCN struct kcvcpscn, 8 bytes @152 @152 ub4 kscnbas @152 0x00000000 @152 0x00000000 ub2 kscnwrp @156 0x0000 @156 0x0000 ub4 kcvcptim @160 0x00000000 @160 0x00000000 ub2 kcvcpthr @164 0x0000 @164 0x0000 union u, 12 bytes @168 @168 struct kcvcprba, 12 bytes @168 @168 ub4 kcrbaseq @168 0x00000000 @168 0x00000000 ub4 kcrbabno @172 0x00000000 @172 0x00000000 ub2 kcrbabof @176 0x0000 @176 0x0000 ub1 kcvcpetb[0] @180 0x00 @180 0x00 ub1 kcvcpetb[1] @181 0x00 @181 0x00 ub1 kcvcpetb[2] @182 0x00 @182 0x00 ub1 kcvcpetb[3] @183 0x00 @183 0x00 ub1 kcvcpetb[4] @184 0x00 @184 0x00 ub1 kcvcpetb[5] @185 0x00 @185 0x00 ub1 kcvcpetb[6] @186 0x00 @186 0x00 ub1 kcvcpetb[7] @187 0x00 @187 0x00 ub4 kcvfhbhz @312 0x00000000 @312 0x00000000 struct kcvfhxcd, 16 bytes @316 @316 ub4 space_kcvmxcd[0] @316 0x00000000 @316 0x00000000 ub4 space_kcvmxcd[1] @320 0x00000000 @320 0x00000000 ub4 space_kcvmxcd[2] @324 0x00000000 @324 0x00000000 ub4 space_kcvmxcd[3] @328 0x00000000 @328 0x00000000 sword kcvfhtsn @332 6 @332 6 tablespace number ub2 kcvfhtln @336 0x0003 @336 0x0003 这个代表表空间名的长度 text kcvfhtnm[0] @338 H @338 H text kcvfhtnm[1] @339 T @339 T text kcvfhtnm[2] @340 Z @340 Z text kcvfhtnm[3] @341 @341 text kcvfhtnm[4] @342 @342 text kcvfhtnm[5] @343 @343 text kcvfhtnm[6] @344 @344 text kcvfhtnm[7] @345 @345 text kcvfhtnm[8] @346 @346 text kcvfhtnm[9] @347 @347 text kcvfhtnm[10] @348 @348 text kcvfhtnm[11] @349 @349 text kcvfhtnm[12] @350 @350 text kcvfhtnm[13] @351 @351 text kcvfhtnm[14] @352 @352 text kcvfhtnm[15] @353 @353 text kcvfhtnm[16] @354 @354 text kcvfhtnm[17] @355 @355 text kcvfhtnm[18] @356 @356 text kcvfhtnm[19] @357 @357 text kcvfhtnm[20] @358 @358 text kcvfhtnm[21] @359 @359 text kcvfhtnm[22] @360 @360 text kcvfhtnm[23] @361 @361 text kcvfhtnm[24] @362 @362 text kcvfhtnm[25] @363 @363 text kcvfhtnm[26] @364 @364 text kcvfhtnm[27] @365 @365 text kcvfhtnm[28] @366 @366 text kcvfhtnm[29] @367 @367 ub4 kcvfhrfn @368 0x00000006 @368 0x00000007 relative file number struct kcvfhrfs, 8 bytes @372 @372 The SCN at which the recovery of this file will be complete ub4 kscnbas @372 0x00000000 @372 0x00000000 ub2 kscnwrp @376 0x0000 @376 0x0000 ub4 kcvfhrft @380 0x00000000 @380 0x00000000 struct kcvfhafs, 8 bytes @384 @384 absolute fuzzy scn, 即Minimum PITR SCN ub4 kscnbas @384 0x00000000 @384 0x00000000 ub2 kscnwrp @388 0x0000 @388 0x0000 ub4 kcvfhbbc @392 0x00000000 @392 0x00000000 ub4 kcvfhncb @396 0x00000000 @396 0x00000000 ub4 kcvfhmcb @400 0x00000000 @400 0x00000000 ub4 kcvfhlcb @404 0x00000000 @404 0x00000000 ub4 kcvfhbcs @408 0x00000000 @408 0x00000000 ub2 kcvfhofb @412 0x0000 @412 0x0000 ub2 kcvfhnfb @414 0x0000 @414 0x0000 ub4 kcvfhprc @416 0x328d0f49 @416 0x328d0f49 prev reset logs count struct kcvfhprs, 8 bytes @420 @420 prev reset logs SCN ub4 kscnbas @420 0x00220d33 @420 0x00220d33 ub2 kscnwrp @424 0x0000 @424 0x0000 struct kcvfhprfs, 8 bytes @428 @428 ub4 kscnbas @428 0x00000000 @428 0x00000000 ub2 kscnwrp @432 0x0000 @432 0x0000 ub4 kcvfhtrt @444 0x00000000 @444 0x00000000
5,rename数据文件
数据库在mount阶段,rename数据文件
www.htz.pw > alter database rename file ‘/oracle/app/oracle/product/10.2.0/db_1024/dbs/MISSING00005’ to ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’; Database altered.
6 重建控制文件
因为bbed修改后的数据文件头的里面的时间跟控制文件中不一致,所以需要重建一次控制文件
www.htz.pw > @/tmp/control.ctl ORA-01081: cannot start already-running ORACLE – shut it down first Control file created.
记住此时的控制文件中,应该包括所有的数据文件,包括之前undotbs01.dbf的信息
7,启动数据库报ORA-01177错误
www.htz.pw > recover database using backup controlfile until cancel; ORA-00279: change 2353299 generated at 07/19/2014 22:21:38 needed for thread 1 ORA-00289: suggestion : /oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_2_ %u_.arc ORA-00280: change 2353299 for thread 1 is in sequence #2 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} cancel ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’ ORA-01112: media recovery not started www.htz.pw > alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection force
alert日志中出现下面的错误
but not in the controlfile. Adding to controlfile. Sat Jul 19 22:23:21 2014 Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_31222.trc: ORA-01177: data file does not match dictionary – probably old incarnation ORA-01110: data file 5: ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’ Error 1177 happened during db open, shutting down database USER: terminating instance due to error 1177 Instance terminated by USER, pid = 31222 ORA-1092 signalled during: alter database open resetlogs…
这里其实file$的scn值与数据文件头中的scn已经一致,但还是报错,于是手动同时修改file$与数据文件头中的值。
见ORA-01177 probably old incarnation
8 数据库正常启动
www.htz.pw > @/tmp/control.ctl ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 2082432 bytes Variable Size 100665728 bytes Database Buffers 54525952 bytes Redo Buffers 10498048 bytes Control file created. Media recovery complete. Database altered. www.htz.pw > select open_mode from v$database; OPEN_MODE ———- READ WRITE
其实这里数据库后台会一直把数据文件中的坏块错误,如果数据文件所有的归档日志存在,可以recover datafile恢复正常,如果不存在,只能马上drop掉相应的表空间。