暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

ora-01173错误的分享

原创 wangx 2024-12-02
79

ora-01173错误的分享


背景

近期,做了一个数据恢复,由于磁盘阵列做的raid5,3块硬盘亮灯损坏,导致rac宕机,硬件商将存储拿去重新将raid5上线拉回来,起库做恢复遇到一系列的错误。

1、asm报错:ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [2], [1843], [3748], [1854], [3748], [], [], [], [], []

2、抽取数据文件报错:amdu和其它工具将文件弄出来的时候。

3、数据库层面恢复报错:部分文件是几个月之前。

**当我解决了前面2层问题后,rac能起来,但查看数据库的文件头,有5个不对,是几个月前的。结合asm的报错,怀疑是最后一块盘的离线。

恢复数据库

1、获取回滚段,起库

ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 1
ORA-01173: data dictionary indicates missing data file from system tablespace
Process ID: 149332
Session ID: 2623 Serial number: 1

2、分析

--
oradebug setmypid
oradebug event 10046 trace name context forever,level 12;
oradebug tracefile_name
alter database open resetlogs;


#######/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_14768.trc##########
WAIT #140543587171488: nam='db file sequential read' ela= 2 file#=1 block#=128 blocks=1 obj#=0 tim=1728478848746060 --读取了file 1 block 128
=====================
PARSING IN CURSOR #140543583416216 len=142 dep=1 uid=0 oct=3 lid=0 tim=1728478848746193 hv=361892850 ad='15e8f9338' sqlid='7bd391hat42zk'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #140543583416216:c=107,e=108,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=3,plh=0,tim=1728478848746193
BINDS #140543583416216:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=7fd2da5d6b50 bln=22 avl=02 flg=05
value=1 --获取的us#=1, (0为 SYSTEM,1是第一个回滚段,在3号文件的128开始的位置,ts#=2 指向的是undo表空间的,通过其它库查询,)
EXEC #140543583416216:c=187,e=187,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=3,plh=906473769,tim=1728478848746399
WAIT #140543583416216: nam='db file sequential read' ela= 5 file#=1 block#=321 blocks=1 obj#=34 tim=1728478848746414
WAIT #140543583416216: nam='db file sequential read' ela= 2 file#=1 block#=225 blocks=1 obj#=15 tim=1728478848746423
FETCH #140543583416216:c=22,e=23,p=2,cr=2,cu=0,mis=0,r=1,dep=1,og=3,plh=906473769,tim=1728478848746428
STAT #140543583416216 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=21 us)'
STAT #140543583416216 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=14 us)'
CLOSE #140543583416216:c=2,e=1,dep=1,type=0,tim=1728478848746441
=====================
PARSING IN CURSOR #140543583416216 len=288 dep=1 uid=0 oct=3 lid=0 tim=1728478848746595 hv=1456614789 ad='15e8f87a8' sqlid='57guu81bd4bc5'
select name,online$,contents$,undofile#,undoblock#,blocksize,dflmaxext,dflinit,dflincr,dflextpct,dflminext, dflminlen, owner#,scnwrp,scnbas, NVL(pitrscnwrp, 0), NVL(pitrscnbas, 0), dflogging, bitmapped, inc#, flags, plugged, NVL(spare1,0), NVL(spare2,0), affstrength from ts$ where ts#=:1
--上面这个就是去获取undo表空间的信息(ts#=2的为 UNDOTBS1,但是创建了控制文件后,又没有该undo的文件信息,所以报错)
--从上面可以得知,第一步是获取回滚段信息,然后得到使用的那些undo回滚段,再去TS#扫描undo回滚段的信息,进一步获取undo表空间的状态,如果获取不到,就出错
END OF STMT
PARSE #140543583416216:c=140,e=139,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1728478848746594
BINDS #140543583416216:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=7fd2da5d6b50 bln=22 avl=02 flg=05
value=2
EXEC #140543583416216:c=272,e=272,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=2397009925,tim=1728478848746891
WAIT #140543583416216: nam='db file sequential read' ela= 3 file#=1 block#=185 blocks=1 obj#=7 tim=1728478848746917
WAIT #140543583416216: nam='db file sequential read' ela= 1 file#=1 block#=179 blocks=1 obj#=16 tim=1728478848746925 --file#=1 block#=179出现的问题
FETCH #140543583416216:c=32,e=33,p=2,cr=2,cu=0,mis=0,r=1,dep=1,og=4,plh=2397009925,tim=1728478848746931
STAT #140543583416216 id=1 cnt=1 pid=0 pos=1 obj=16 op='TABLE ACCESS CLUSTER TS$ (cr=2 pr=2 pw=0 time=31 us)'
STAT #140543583416216 id=2 cnt=1 pid=1 pos=1 obj=7 op='INDEX UNIQUE SCAN I_TS# (cr=1 pr=1 pw=0 time=23 us)'
CLOSE #140543583416216:c=10,e=10,dep=1,type=0,tim=1728478848746945
ORA-01173: data dictionary indicates missing data file from system tablespace
ORA-01173: data dictionary indicates missing data file from system tablespace

*** 2024-10-09 21:00:48.747
USER (ospid: 14768): terminating the instance due to error 1173


*._corrupted_rollback_segments=true
*._offline_rollback_segments=true
*._offline_rollback_segments=(_SYSSMU1_1880814008$,_SYSSMU2_237578013$,_SYSSMU3_3036648903$,_SYSSMU4_2554663710$,_SYSSMU5_2656872754$,_SYSSMU6_2560781624$,_SYSSMU7_362778077$,_SYSSMU8_3123962168$,_SYSSMU9_902607226$,_SYSSMU10_600653064$,_SYSSMU11_1908542605$,_SYSSMU12_681659328$,_SYSSMU13_2355401967$,_SYSSMU14_1898001506$,_SYSSMU15_2247781413$,_SYSSMU16_589156960$,_SYSSMU17_2366401286$,_SYSSMU18_1287317225$,_SYSSMU19_3322902554$,_SYSSMU20_2714089088$)
*._corrupted_rollback_segments=(_SYSSMU1_1880814008$,_SYSSMU2_237578013$,_SYSSMU3_3036648903$,_SYSSMU4_2554663710$,_SYSSMU5_2656872754$,_SYSSMU6_2560781624$,_SYSSMU7_362778077$,_SYSSMU8_3123962168$,_SYSSMU9_902607226$,_SYSSMU10_600653064$,_SYSSMU11_1908542605$,_SYSSMU12_681659328$,_SYSSMU13_2355401967$,_SYSSMU14_1898001506$,_SYSSMU15_2247781413$,_SYSSMU16_589156960$,_SYSSMU17_2366401286$,_SYSSMU18_1287317225$,_SYSSMU19_3322902554$,_SYSSMU20_2714089088$)


再次执行恢复动作

oradebug setmypid
oradebug event 10046 trace name context forever,level 12;
oradebug tracefile_name
alter database open resetlogs;

/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_15564.trc
SQL> alter database open resetlogs;
Database altered. --数据库已经打开

--重要的记录下:
1)更新undo段status=5
select name,online$,contents$,undofile#,undoblock#,blocksize,dflmaxext,dflinit,dflincr,dflextpct,dflminext, dflminlen, owner#,scnwrp,scnbas, NVL(pitrscnwrp, 0), NVL(pitrscnbas, 0), dflogging, bitmapped, inc#, flags, plugged, NVL(spare1,0), NVL(spare2,0), affstrength from ts$ where ts#=:1
2)ts$读取了ts#=2 - 30 ,打开数据库看了,就是排除了SYSTEM和SYSAUX开始读取的
select blocks,NVL(ts#,-1),status$,NVL(relfile#,0),maxextend,inc, crscnwrp,crscnbas,NVL(spare1,0) from file$ where file#=:1
3)file$ 读取了file#=1 -28 ,读取了所有file$中的行


3、再次模拟

上面第一次open resetlogs看到了读取到file 1 block 179就报错了,事实上就是读取的聚簇表C_TS#中的TS$,该表涉及块为177-182
select owner,segment_name,segment_type,partition_name,tablespace_name
from dba_extents
where relative_fno=1
and 179 between block_id and (block_id+blocks-1);
--C_TS#

select * from bootstrap$;


select
x.*,
ltrim(dbms_utility.make_data_block_address(x.file#,x.block#))"10进制dba"
,'0x'||ltrim(to_char(dbms_utility.make_data_block_address(x.file#,x.block#),'xxxxxxxx'))"16进制dba",

'alter system dump datafile '||x.file# ||' block '||x.block#||';' text
from (
select rowid,
dbms_rowid.rowid_object(rowid) obj#,
dbms_rowid.rowid_relative_fno(rowid) file#,
dbms_rowid.rowid_block_number(rowid) block#,
dbms_rowid.rowid_row_number(rowid) row#
from ts$) x;

--模拟坏块看是否能够正常开库聚簇表C_TS#中的TS$,该表涉及块为177-182
BBED> corrupt
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
Block marked media corrupt.

BBED> sum apply
Check value for File 1, Block 179:
current = 0x2bf7, required = 0x2bf7

BBED> v
DBVERIFY - Verification starting
FILE = /u01/app/oracle/oradata/orcl/SYSTEM01.DBF
BLOCK = 179

Block Checking: DBA = 4194483, Block Type = KTB-managed data block
Found block already marked corrupted

DBVERIFY - Verification complete

Total Blocks Examined : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing (Index): 0
Total Blocks Empty : 0
Total Blocks Marked Corrupt : 0
Total Blocks Influx : 0
Message 531 not found; product=RDBMS; facility=BBED

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 179) --报错太明显了
ORA-01110: data file 1: '/u01/app/oracle/oradata/orcl/SYSTEM01.DBF'
Process ID: 19780
Session ID: 125 Serial number: 3




--再次模拟坏块看是否能够正常开库聚簇表C_TS#中的TS$,该表涉及块为179
BBED> d /v offset 98 count 20
File: /u01/app/oracle/oradata/orcl/SYSTEM01.DBF (1)
Block: 179 Offsets: 98 to 117 Dba:0x004000b3
-------------------------------------------------------
1e00d11e 0f1f0f1f 00000100 01000000 l ................
01000100 l ....

<16 bytes per line>

BBED> p kdbh
struct kdbh, 14 bytes @92
ub1 kdbhflag @92 0x00 (NONE)
sb1 kdbhntab @93 3
sb2 kdbhnrow @94 2
sb2 kdbhfrre @96 -1
sb2 kdbhfsbo @98 30
sb2 kdbhfseo @100 7889
sb2 kdbhavsp @102 7951
sb2 kdbhtosp @104 7951

BBED> m /x 1c offset 98
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
File: /u01/app/oracle/oradata/orcl/SYSTEM01.DBF (1)
Block: 179 Offsets: 98 to 117 Dba:0x004000b3
------------------------------------------------------------------------
1c00d11e 0f1f0f1f 00000100 01000000 01000100

<32 bytes per line>

BBED> sum apply
Check value for File 1, Block 179:
current = 0x2bfb, required = 0x2bfb

BBED> v
DBVERIFY - Verification starting
FILE = /u01/app/oracle/oradata/orcl/SYSTEM01.DBF
BLOCK = 179

Block Checking: DBA = 4194483, Block Type = KTB-managed data block
data header at 0x7f1e6eeaa25c
kdbchk: fsbo(28) wrong, (hsz 30)
Block 179 failed with check code 6129

DBVERIFY - Verification complete

Total Blocks Examined : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing (Data) : 1
Total Blocks Processed (Index): 0
Total Blocks Failing (Index): 0
Total Blocks Empty : 0
Total Blocks Marked Corrupt : 0
Total Blocks Influx : 0
Message 531 not found; product=RDBMS; facility=BBED

BBED> p kdbh
struct kdbh, 14 bytes @92
ub1 kdbhflag @92 0x00 (NONE)
sb1 kdbhntab @93 3
sb2 kdbhnrow @94 2
sb2 kdbhfrre @96 -1
sb2 kdbhfsbo @98 28 --这里改为了28

oradebug setmypid
Statement processed.
oradebug event 10046 trace name context forever,level 12;
Statement processed.
oradebug tracefile_name
/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_19860.trc
SQL> alter database open resetlogs;

Database altered.

--再次修改了kdbh部分,还是能开库;


总结

1、ORA-01173 和 undo回滚段有关,出现该错误,是因为bootstrap过程中需要加载需要的回滚段,然后获得引导。当频闭的回滚段内容不全或者缺失就会遇到该问题,解决办法也很简单,将完成的回滚段获取了写入参数文件中即可。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论