问题描述
下面的案例来至于一位兄弟的环境,10G RAC,CRS被意外删除,数据库被crash了。通过将RAC更改为单机直接启动,报下面的错误
Errors in file /oracle/10.2/admin/gzora/udump/gzora_ora_8355.trc: ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 1 ORA-01555: snapshot too old: rollback segment number 3 with name "_SYSSMU3$" too small Error 704 happened during db open, shutting down database USER: terminating instance due to error 704 Instance terminated by USER, pid = 8355 ORA-1092 signalled during: alter database open…
专家解答
1,配置10046、errorstack
oradebug setmypid oradebug event 10046 trace name context forever,level 12; oradebug event 1555 trace name errorstack level 2; recover database using backup controlfile until cancel; alter database open resetlogs;
查看报错的详细信息
1.1 查看block信息
d:\>grep -E "^Block he|^0x0" gzora_ora_8897.trc Block header dump: 0x0040006a 0x01 0x0000.046.00000029 0x00400011.0037.1d –U- 1 fsc 0x0000.02a1d124 这里发现ITL 01的状态是U,并且锁定1行的记录 查看块的dump信息,以0x0040006a就可以搜索到 BH (0x160f96ad8) file#: 1 rdba: 0x0040006a (1/106) class: 1 ba: 0x1603d8000 set: 10 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0 dbwrid: 0 obj: 15 objn: 15 tsn: 0 afn: 1 hash: [187c9e220,187c9e220] lru: [160f96c68,1868cb9f8] ckptq: [NULL] fileq: [NULL] objq: [183facdb0,183facdb0] st: XCURRENT md: NULL tch: 1 flags: LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535] buffer tsn: 0 rdba: 0x0040006a (1/106) scn: 0x0000.02a1d124 seq: 0x01 flg: 0x06 tail: 0xd1240601 frmt: 0x02 chkval: 0x21bd type: 0x06=trans data Block header dump: 0x0040006a Object id on Block? Y seg/obj: 0xf csc: 0x00.2a1d123 itc: 1 flg: O typ: 1 – DATA fsl: 0 fnx: 0x0 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0000.046.00000029 0x00400011.0037.1d –U- 1 fsc 0x0000.02a1d124 这里可以看到块的对象,csc,块号等信息 下面查看一下锁定行的记录 tab 0, row 10, @0x143d tl: 60 fb: –H-FL– lb: 0x1 cc: 17 col 0: [ 2] c1 0b col 1: [10] 5f 53 59 53 53 4d 55 31 30 24 col 2: [ 2] c1 02 col 3: [ 2] c1 03 col 4: [ 3] c2 02 36 col 5: [ 5] c4 2d 10 5d 19 col 6: [ 1] 80 col 7: [ 4] c3 03 09 0c col 8: [ 4] c3 02 1a 04 col 9: [ 1] 80 col 10: [ 2] c1 04 col 11: [ 2] c1 02 col 12: *NULL* col 13: *NULL* col 14: *NULL* col 15: *NULL* col 16: [ 2] c1 02 tab 0, row 11, @0x1831
1.2 查看SQL与会话信息
查看当前的信息信息,以CURRENT SESSION搜索 =================================================== CURRENT SESSION’S INSTANTIATION STATE ————————————- current session=0x1888972b8 KGI STATE DUMP for session=0x1888972b8 KGI STATE DUMP DONE for session=0x1888972b8 ******************** Session Cursor Dump ********************** Current cursor: 5, pgadep: 1 Open cursors(pls, sys, hwm, max): 5(0, 4, 64, 300) NULL 0 SYNTAX 0 PARSE 3 BOUND 1 FETCH 1 ROW 0 Cached frame pages(total, free): 4k(4, 2), 8k(0, 0), 16k(0, 0), 32k(0, 0) pgactx: 0x185fffd20 ctxcbk: (nil) ctxqbc: (nil) ctxrws: 0x185ffd940 Explain plan: Dumping stream from 0 to 78 ———————————- 0000: 143 216 224 255 133 1 0 0 0 143 119 1 0 38 40 0 0 3 1 2 ……….w..&(….. 0020: 18 3 2 14 0 0 0 54 7 3 5 12 8 106 108 109 224 152 228 164 …….6…..jlm…. 0040: 107 111 3 52 3 3 2 3 1 2 4 34 0 0 143 120 229 255 133 1 ko.4……."…x…. 0060: 0 0 0 134 119 2 23 13 0 0 3 1 5 36 3 2 1 142 ….w……..$…. ============ Plan Table ============ ———————————————–+———————————–+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ———————————————–+———————————–+ | 0 | SELECT STATEMENT | | | | | | | 1 | TABLE ACCESS BY INDEX ROWID | OBJ$ | | | | | | 2 | INDEX UNIQUE SCAN | I_OBJ1 | | | | | ———————————————–+———————————–+ Content of other_xml column =========================== db_version : 10.2.0.4 parse_schema : SYS plan_hash : 3317232865 Outline Data: /*+ BEGIN_OUTLINE_DATA IGNORE_OPTIM_EMBEDDED_HINTS OPTIMIZER_FEATURES_ENABLE(‘10.2.0.4’) RBO_OUTLINE OUTLINE_LEAF(@"SEL$1") INDEX_RS_ASC(@"SEL$1" "OBJ$"@"SEL$1" ("OBJ$"."OBJ#")) END_OUTLINE_DATA */
以PARSING IN CURSOR #5来搜索
以Cursor#5来搜索
—————————————- Cursor#5(0x2aaaac891988) state=FETCH curiob=0x2aaaac8a8408 curflg=7 fl2=0 par=0x2aaaac8917e8 ses=0x1888972b8 sqltxt(0x186f42730)=select ctime, mtime, stime from obj$ where obj# = :1这里可以看到SQL的内容 hash=fa0bd3f60d6ee4f2495f9af8199b75b9 parent=0x186f42378 maxchild=01 plk=0x183f7a478 ppn=n cursor instantiation=0x2aaaac8a8408 used=1411388199 child#0(0x186f41800) pcs=0x186f41f88 clk=0x183f7ae88 ci=0x186f41448 pn=0x1823bd088 ctx=0x185fffd20 kgsccflg=0 llk[0x2aaaac8a8410,0x2aaaac8a8410] idx=0 xscflg=e01414f6 fl2=45000401 fl3=4022210c fl4=100 Bind bytecodes Opcode = 1 Unoptimized Offsi = 48, Offsi = 0 kkscoacd Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=2aaaac8a7fc0 bln=22 avl=02 flg=05 value=20这里可以看到绑定变量的值 Frames pfr 0x2aaaac8a8388 siz=2920 efr 0x2aaaac8a80a8 siz=2720 Cursor frame dump enxt: 4.0×00000010 enxt: 3.0x000000c0 enxt: 2.0×00000020 enxt: 1.0x000009c0 pnxt: 2.0×00000008 pnxt: 1.0x000000c0 kxscphp 0x2aaaac89a1a8 siz=1992 inu=1160 nps=1112 kxscdfhp 0x2aaaac8a0808 siz=984 inu=88 nps=0 kxscbhp 0x2aaaac8a0628 siz=984 inu=168 nps=48 kxscwhp 0x2aaaac8a0718 siz=4056 inu=360 nps=0
其实到这里我们还没有用到10046的数据,下面再看看10046的trace文件
===================== PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1378308789118580 hv=429618617 ad=’86f42730′ select ctime, mtime, stime from obj$ where obj# = :1 END OF STMT PARSE #5:c=0,e=282,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1378308789118577 BINDS #5: kkscoacd Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=2aaaac8a7fc0 bln=22 avl=02 flg=05 value=20 EXEC #5:c=999,e=477,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1378308789119125 WAIT #5: nam=’db file sequential read’ ela= 1826 file#=1 block#=218 blocks=1 obj#=-1 tim=1378308789121036 WAIT #5: nam=’db file sequential read’ ela= 3930 file#=1 block#=219 blocks=1 obj#=-1 tim=1378308789125031 WAIT #5: nam=’db file sequential read’ ela= 4146 file#=1 block#=122 blocks=1 obj#=-1 tim=1378308789129225 ===================== PARSING IN CURSOR #6 len=142 dep=2 uid=0 oct=3 lid=0 tim=1378308789129622 hv=361892850 ad=’85ffc6d8′ select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1 END OF STMT PARSE #6:c=1000,e=339,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,tim=1378308789129620 BINDS #6: kkscoacd Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=2aaaacb2dbd0 bln=22 avl=02 flg=05 value=8 EXEC #6:c=0,e=505,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,tim=1378308789130223 WAIT #6: nam=’db file sequential read’ ela= 3262 file#=1 block#=202 blocks=1 obj#=-1 tim=1378308789133526 WAIT #6: nam=’db file sequential read’ ela= 1110 file#=1 block#=106 blocks=1 obj#=-1 tim=1378308789134677 FETCH #6:c=0,e=4472,p=2,cr=2,cu=0,mis=0,r=1,dep=2,og=3,tim=1378308789134719 STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op=’TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=4466 us)’ STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op=’INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=3304 us)’ WAIT #5: nam=’db file sequential read’ ela= 2661 file#=2 block#=1152 blocks=1 obj#=-1 tim=1378308789137472 这里可以看到在出故障的SQL下面还有很多SQL语法正常执行了,所以10046最后一条语句并不是报错的SQL语句
2 故障原因
通过上面的trace信息,大概知道原因,在访问obj$表的时,需要访问undo$表,undo$表中发现一条itl为U,在做延迟块清处过程中,在UNDO段中没有找到此ITL的信息,触发ORA-01555的报错。
3,故障处理
SQL> select checkpoint_change# from v$datafile_header; CHECKPOINT_CHANGE# —————— 44279281 44279281 44279281 44279281 44279281 SQL> recover database using backup controlfile until cancel; ORA-00279: change 44279281 generated at 09/22/2014 20:27:54 needed for thread 1 ORA-00289: suggestion : +ORADATA ORA-00280: change 44279281 for thread 1 is in sequence #1 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} cancel ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: ‘+ORADATA/gzora/datafile/system.258.757644911’ ORA-01112: media recovery not started 这里手动增加SCN的值就可以。 SQL> oradebug setmypid Statement processed. SQL> oradebug dumpvar sga kcsgscn_ kcslf kcsgscn_ [0600122B0, 0600122E0) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 60011F90 00000000 SQL> select to_char(54279281,’xxxxxxxxxxxxxxxx’) from dual; TO_CHAR(54279281, —————– 33c3c71 SQL> oradebug poke 0x0600122B0 4 0x33c3c71; BEFORE: [0600122B0, 0600122B4) = 00000000 AFTER: [0600122B0, 0600122B4) = 033C3C71 SQL> alter database open resetlogs; Database altered. SQL> create undo tablespace undotbs2 datafile ‘+oradata’ size 10g; Tablespace created. 切换到新的UNDO表空间
远程的工作完成,剩下的一些收尾的工作由现成工程师完成。
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。