暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Oracle异常恢复实战第5讲 - Oracle的open过程_笔记

四九年入国军 2024-09-09
142
  感兴趣的还是记得去看原帖子,笔记有删减,原作者微信公众号:
在这节课中,我们仍然通过各种trace手段来跟踪相关进程,然后通过实验过程来深入了解open的过程.;首先我们做一些准备工作。启动几个不同的Session窗口。 --Session 1 SQL> startup mount ORACLE instance started. Total System Global Area 2505338880 bytes Fixed Size 2255832 bytes Variable Size 620758056 bytes Database Buffers 1862270976 bytes Redo Buffers 20054016 bytes Database mounted. SQL> !ps -ef|grep -v grep|grep ora_ oracle 75874 1 0 17:01 ? 00:00:00 ora_pmon_orcl oracle 75876 1 0 17:01 ? 00:00:00 ora_psp0_orcl oracle 75879 1 0 17:01 ? 00:00:00 ora_vktm_orcl oracle 75883 1 0 17:01 ? 00:00:00 ora_gen0_orcl oracle 75885 1 0 17:01 ? 00:00:00 ora_diag_orcl oracle 75887 1 0 17:01 ? 00:00:00 ora_dbrm_orcl oracle 75889 1 0 17:01 ? 00:00:00 ora_dia0_orcl oracle 75891 1 1 17:01 ? 00:00:00 ora_mman_orcl oracle 75893 1 0 17:01 ? 00:00:00 ora_dbw0_orcl oracle 75895 1 0 17:01 ? 00:00:00 ora_lgwr_orcl oracle 75897 1 0 17:01 ? 00:00:00 ora_ckpt_orcl oracle 75899 1 0 17:01 ? 00:00:00 ora_smon_orcl oracle 75901 1 0 17:01 ? 00:00:00 ora_reco_orcl oracle 75903 1 0 17:01 ? 00:00:00 ora_mmon_orcl oracle 75905 1 0 17:01 ? 00:00:00 ora_mmnl_orcl oracle 75907 1 0 17:01 ? 00:00:00 ora_d000_orcl oracle 75909 1 0 17:01 ? 00:00:00 ora_s000_orcl --session 2:监控smon进程 strace -fr -o /tmp/75899.log -p 75899 --session 3:监控dbwr进程 strace -fr -o /tmp/75893.log -p 75893 --Session 4 SQL> oradebug setmypid Statement processed. SQL> oradebug event 10046 trace name context forever,level 12; Statement processed. SQL> alter database open; Database altered. SQL> oradebug close_trace Statement processed. SQL> oradebug tracefile_name /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_75927.trc SQL> 大家可以看到,我们在open数据库之前打开了数据库10046 trace跟踪,同时也进行了strace跟踪;其中strace跟踪我们重点观察smon和dbw进程。为什么要跟踪smon进程? 因为大家都知道数据库open的时候,smon进程要进行实例恢复,它需要做很多事儿。当然单纯的strace其实并不能完全理解smon进程的作用,这里我们不多讲,后面会讲其他的event来跟踪。 --smon trc:/tmp/75899.log cd /proc/75899/fd ls -l lrwx------ 1 oracle oinstall 64 9 9 17:03 256 -> /oradata/orcl/system01.dbf lrwx------ 1 oracle oinstall 64 9 9 17:03 257 -> /oradata/orcl/sysaux01.dbf lrwx------ 1 oracle oinstall 64 9 9 17:03 258 -> /oradata/orcl/undotbs01.dbf lrwx------ 1 oracle oinstall 64 9 9 17:03 259 -> /oradata/orcl/users01.dbf lrwx------ 1 oracle oinstall 64 9 9 17:03 260 -> /oradata/orcl/test01.dbf lrwx------ 1 oracle oinstall 64 9 9 17:03 261 -> /oradata/orcl/temp01.dbf cat /tmp/75899.log | grep pread cat /tmp/75899.log | grep pread 75899 0.000240 pread64(261, "\36\242\0\0\3\0@\0\305y\16\0\0\0\0\f\253\277\0\0\1\0\0\0\200\0\0\0\1\0\0\0"..., 8192, 24576) = 8192 75899 0.000156 pread64(256, "\20\242\0\0\300\0@\0\"\247\16\0\0\0\1\4\351\373\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 1572864) = 8192 75899 0.000285 pread64(258, "\2\242\0\0\363 \300\0\373\353\22\0\0\0\1\4\316\302\0\0\1\0\34\0\255\2\0\0\246\0\"\""..., 8192, 69099520) = 8192 75899 0.000476 pread64(257, "\35\242\0\0\2\0\200\0\366\352\22\0\0\0\2\4\23A\0\0\2\0\0\0\10\0\0\0\0\t\1\0"..., 8192, 16384) = 8192 从smon进程的strace 来看,正常情况下,先后读取了临时表空间、系统表空间、undo表空间、sysaux表空间。接下来看下dbw进程的strace log: --dbw0 trc lrwx------ 1 oracle oinstall 64 9 9 17:07 263 -> /oradata/orcl/temp01.dbf lrwx------ 1 oracle oinstall 64 9 9 17:07 262 -> /oradata/orcl/test01.dbf lrwx------ 1 oracle oinstall 64 9 9 17:07 261 -> /oradata/orcl/users01.dbf lrwx------ 1 oracle oinstall 64 9 9 17:07 260 -> /oradata/orcl/undotbs01.dbf lrwx------ 1 oracle oinstall 64 9 9 17:07 259 -> /oradata/orcl/sysaux01.dbf lrwx------ 1 oracle oinstall 64 9 9 17:07 258 -> /oradata/orcl/system01.dbf lrwx------ 1 oracle oinstall 64 9 9 17:07 257 -> /oradata/orcl/control02.ctl lrwx------ 1 oracle oinstall 64 9 9 17:07 256 -> /oradata/orcl/control01.ctl cat /tmp/75893.log | grep read 75893 0.444386 pread64(256, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4/\0\0\0\0\0\0\0\0\4 \v@T\260e"..., 16384, 16384) = 16384 75893 0.000137 pread64(257, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4/\0\0\0\0\0\0\0\0\4 \v@T\260e"..., 16384, 16384) = 16384 75893 0.000104 pread64(256, "\25\302\0\0\17\0\0\0\270\5\0\0\377\377\1\4\345\355\0\0\0\2\0\0\0\0\0\0\0\0\0\301"..., 16384, 245760) = 16384 75893 0.000116 pread64(258, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0g\210\0\0\0 \0\0\0r\1\0}|{z"..., 8192, 0) = 8192 75893 0.000106 pread64(258, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0g\210\0\0\0 \0\0\0r\1\0}|{z"..., 8192, 0) = 8192 75893 0.000097 pread64(259, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0g\363\0\0\0 \0\0\0\t\1\0}|{z"..., 8192, 0) = 8192 75893 0.000110 pread64(259, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0g\363\0\0\0 \0\0\0\t\1\0}|{z"..., 8192, 0) = 8192 75893 0.000105 pread64(260, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0f\331\0\0\0 \0\0\0#\0\0}|{z"..., 8192, 0) = 8192 75893 0.000103 pread64(260, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0f\331\0\0\0 \0\0\0#\0\0}|{z"..., 8192, 0) = 8192 75893 0.000102 pread64(261, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0\346\370\0\0\0 \0\0\200\2\0\0}|{z"..., 8192, 0) = 8192 75893 0.000101 pread64(261, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0\346\370\0\0\0 \0\0\200\2\0\0}|{z"..., 8192, 0) = 8192 75893 0.000102 pread64(262, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0f\310\0\0\0 \0\0\0002\0\0}|{z"..., 8192, 0) = 8192 75893 0.000108 pread64(262, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0f\310\0\0\0 \0\0\0002\0\0}|{z"..., 8192, 0) = 8192 75893 0.000115 pread64(256, "\25\302\0\0\265\0\0\0z\2\0\0\377\377\1\4}\227\0\0\0\n\0\0\0 \0\0\36\0\10\0"..., 16384, 2965504) = 16384 75893 0.000103 pread64(263, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0\346\364\0\0\0 \0\0\200\16\0\0}|{z"..., 8192, 0) = 8192 75893 0.000101 pread64(263, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0\346\364\0\0\0 \0\0\200\16\0\0}|{z"..., 8192, 0) = 8192 75893 0.002218 pread64(256, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4-\0\0\0\0\0\0\0\0\4 \v@T\260e"..., 16384, 16384) = 16384 75893 0.000172 pread64(257, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4-\0\0\0\0\0\0\0\0\4 \v@T\260e"..., 16384, 16384) = 16384 75893 0.000110 pread64(256, "\25\302\0\0\17\0\0\0\272\5\0\0\377\377\1\4\345\345\0\0\0\2\0\0\0\0\0\0\0\0\0\301"..., 16384, 245760) = 16384 75893 0.000117 pread64(256, "\25\302\0\0\21\0\0\0\272\5\0\0\377\377\1\4\375\246\0\0\0\0\0\0\0\0\0\0\300\10;F"..., 16384, 278528) = 16384 75893 0.000116 pread64(256, "\25\302\0\0\27\0\0\0\261\5\0\0\377\377\1\4\234]\0\0\0\0\0\0\0 \0\0\16\0\0\0"..., 16384, 376832) = 16384 75893 0.000155 pread64(258, "\v\242\0\0\1\0@\0\0\0\0\0\0\0\1\4\323[\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192 75893 0.000254 pread64(259, "\v\242\0\0\1\0\200\0\0\0\0\0\0\0\1\4\24e\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192 75893 0.000213 pread64(260, "\v\242\0\0\1\0\300\0\0\0\0\0\0\0\1\4\264k\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192 75893 0.000140 pread64(261, "\v\242\0\0\1\0\0\1\0\0\0\0\0\0\1\4Q\t\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192 75893 0.000121 pread64(262, "\v\242\0\0\1\0@\1\0\0\0\0\0\0\1\4C\313\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192 75893 0.000185 pread64(256, "\25\302\0\0\265\0\0\0z\2\0\0\377\377\1\4}\227\0\0\0\n\0\0\0 \0\0\36\0\10\0"..., 16384, 2965504) = 16384 75893 0.000341 pread64(263, "\v\242\0\0\1\0@\0\0\0\0\0\0\0\1\0041\332\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192 从上面strace跟踪smon和dbwr进程,我们可以得到什么结论呢 1 在open过程中smon进程会读取temp、system、undo、sysaux表空间 2)  在open过程中dbwr进程会读取controlfile和数据文件 接下来我们看看更为重要的10046 trace log,需要注意的是,我们在进行数据恢复过程中,经常使用10046 trace,因此大家务必要熟练使用10046 event。 WAIT #46912550775824: nam='rdbms ipc reply' ela= 1020831 from_process=11 timeout=1800 p3=0 obj#=-1 tim=1725872581816334 WAIT #46912550775824: nam='control file sequential read' ela= 7 file#=0 block#=1 blocks=1 obj#=-1 tim=1725872581816510 WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=15 blocks=1 obj#=-1 tim=1725872581816522 WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=17 blocks=1 obj#=-1 tim=1725872581816529 WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=282 blocks=1 obj#=-1 tim=1725872581816536 WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=1 blocks=1 obj#=-1 tim=1725872581816547 WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=15 blocks=1 obj#=-1 tim=1725872581816553 WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=17 blocks=1 obj#=-1 tim=1725872581816558 WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=282 blocks=1 obj#=-1 tim=1725872581816564 WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=1 blocks=1 obj#=-1 tim=1725872581816576 WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=15 blocks=1 obj#=-1 tim=1725872581816586 WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=17 blocks=1 obj#=-1 tim=1725872581816596 WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=282 blocks=1 obj#=-1 tim=1725872581816606 WAIT #46912550775824: nam='log file sync' ela= 190 buffer#=18888 sync scn=1240070 p3=0 obj#=-1 tim=1725872581816885 WAIT #46912550775824: nam='control file sequential read' ela= 4 file#=0 block#=1 blocks=1 obj#=-1 tim=1725872581816921 WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=15 blocks=1 obj#=-1 tim=1725872581816941 WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=17 blocks=1 obj#=-1 tim=1725872581816947 WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=282 blocks=1 obj#=-1 tim=1725872581816954 WAIT #46912550775824: nam='instance state change' ela= 430 layer=2 value=1 waited=1 obj#=-1 tim=1725872581817413 WAIT #46912550775824: nam='db file sequential read' ela= 16 file#=1 block#=520 blocks=1 obj#=-1 tim=1725872581817649 从10046 trace,我们可以看到在读取controlfile的头部block,然后再读取其他的block。在倒数第7行,大家注意,有个sync scn=1240070,说明在open的时候会去比较scn。 另外在最后一行,不知道大家是否注意到file 1 block 520  ?这个其实是比较特殊的。 select owner,segment_name,HEADER_FILE,HEADER_BLOCK from dba_Segments where segment_name='BOOTSTRAP$'; OWNER SEGMENT_NAME HEADER_FILE HEADER_BLOCK -------------------- -------------------- ----------- ------------ SYS BOOTSTRAP$ 1 520 是的。file 1 block 520. 这个就是我们的 BOOTSTRAP$. 看名字,我相信大家都知道,这个是很关键,很特殊的一个对象。 也就是说数据库在open的时候,首先需要初始化数据字典,那么这些数据字典信息从哪里来呢?是的。就是从boostrap$中读取并创建。 从后面的trace内容,我们也可以发现,会接着创建bootstrap$对象,如下: PARSING IN CURSOR #46912550767600 len=188 dep=1 uid=0 oct=1 lid=0 tim=1725872581818241 hv=4006182593 ad='f4871170' sqlid='32r4f1brckzq1' create table bootstrap$ ( END OF STMT PARSE #46912550767600:c=493,e=493,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1725872581818240 EXEC #46912550767600:c=137,e=138,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=1725872581818419 CLOSE #46912550767600:c=1,e=2,dep=1,type=0,tim=1725872581818455 ===================== PARSING IN CURSOR #46912550767600 len=55 dep=1 uid=0 oct=3 lid=0 tim=1725872581818728 hv=2111436465 ad='f486f7d8' sqlid='6apq2rjyxmxpj' select line#, sql_text from bootstrap$ where obj# != :1 END OF STMT PARSE #46912550767600:c=267,e=267,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1725872581818727 BINDS #46912550767600: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=2aaaadec87a8 bln=22 avl=02 flg=05 value=59 EXEC #46912550767600:c=551,e=550,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=867914364,tim=1725872581819318 WAIT #46912550767600: nam='db file sequential read' ela= 15 file#=1 block#=520 blocks=1 obj#=59 tim=1725872581819418 从上面的 value=59,我们可以知道bootstrap$核心对象个数是59个。也就是后面会创建59个bootstrap 核心对象 到最后,我们似乎仍然没有通过strace和10046 得出oracle在open的过程中,如何去对比scn以及checkpoint cnt值的。这里我们换个思路,使用bbed来改一改,观察一下。 下面我们仍然通过实验来进行探索:    首先我们需要知道,checkpoint count存在什么地方(后面我们会讲数据文件头结构): BBED> set dba 2,1 DBA 0x00800001 (8388609 2,1) BBED> p kcvfhcpc ub4 kcvfhcpc @140 0x000000a4 --表示data checkpoint count BBED> p kcvfhccc ub4 kcvfhccc @148 0x000000a3 --表示controlfile checkpoint count BBED> 这里大家记住这2个偏移量140和148. 我们来修改datafile checkpoint count值,观察一下Oracle在open时,会是什么情况? --##场景1 模拟datafile checkpoint count值不一致 SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down. SQL> startup mount ORACLE instance started. Total System Global Area 2505338880 bytes Fixed Size 2255832 bytes Variable Size 620758056 bytes Database Buffers 1862270976 bytes Redo Buffers 20054016 bytes Database mounted. SQL> oradebug setmypid Statement processed. SQL> alter session set events 'immediate trace name controlf level 3'; Session altered. SQL> oradebug tracefile_name /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_80055.trc SQL> --控制文件 数据文件trc DATA FILE #2: name #6: /oradata/orcl/sysaux01.dbf creation size=0 block size=8192 status=0xe head=6 tail=6 dup=1 tablespace 1, index=2 krfil=2 prev_file=0 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00 Checkpoint cnt:165 scn: 0x0000.0012f0ed 09/09/2024 17:34:11 Stop scn: 0x0000.0012f0ed 09/09/2024 17:34:11 Creation Checkpointed at scn: 0x0000.0000072a 08/24/2013 11:37:37 thread:0 rba:(0x0.0.0) 同时,我们再dump 下datafile header信息: oradebug setmypid alter session set events 'immediate trace name file_hdrs level 3'; oradebug tracefile_name /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_80469.trc --数据文件头trc: DATA FILE #2: name #6: /oradata/orcl/sysaux01.dbf creation size=0 block size=8192 status=0xe head=6 tail=6 dup=1 tablespace 1, index=2 krfil=2 prev_file=0 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00 Checkpoint cnt:165 scn: 0x0000.0012f0ed 09/09/2024 17:34:11 Stop scn: 0x0000.0012f0ed 09/09/2024 17:34:11 Creation Checkpointed at scn: 0x0000.0000072a 08/24/2013 11:37:37 thread:0 rba:(0x0.0.0) 我们可以看到,controlfile中datafile 2的checkpoint cnt值和datafile header中的值是一样的,均为165.   现在我们手工修改下datafile header中的checkpoint cnt值,然后来观察下数据库open会是什么情况? BBED> set file 2 block 1 FILE# 2 BLOCK# 1 BBED> p kcvfhcpc ub4 kcvfhcpc @140 0x000000a5 --a5 10进制是165 BBED> d /v count 64 File: /oradata/orcl/sysaux01.dbf (2) Block: 1 Offsets: 140 to 203 Dba:0x00800001 ------------------------------------------------------- a5000000 a5354446 a4000000 00000000 l .....5DF........ 00000000 00000000 00000000 00000000 l ................ 00000000 00000000 00000000 00000000 l ................ 00000000 00000000 00000000 00000000 l ................ <16 bytes per line> BBED> modify /x a4 offset 140 Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) Y File: /oradata/orcl/sysaux01.dbf (2) Block: 1 Offsets: 140 to 203 Dba:0x00800001 ------------------------------------------------------------------------ a4000000 a5354446 a4000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <32 bytes per line> BBED> sum apply Check value for File 2, Block 1: current = 0x7334, required = 0x7334 BBED> 我们这里修改了offset 140,将a5改成了a4,也就是说把该数据文件的checkpoint count值改小了1,然后来启动数据库看看情况。 SQL> alter database open; alter database open * ERROR at line 1: ORA-01113: file 2 needs media recovery ORA-01110: data file 2: '/oradata/orcl/sysaux01.dbf' 我们可以看到,此时启动数据库,Oracle会提示这个数据文件需要进行介质恢复 如果是datafile checkpoint cnt值大于controlfile中的datafile checkpoint cnt值呢? BBED> modify /x af dba 2,1 offset 140 Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) Y File: /oradata/orcl/sysaux01.dbf (2) Block: 1 Offsets: 140 to 651 Dba:0x00800001 ------------------------------------------------------------------------ af000000 a5354446 a4000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 01000000 06005359 53415558 00000000 00000000 00000000 00000000 00000000 00000000 02000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 7ac92131 01000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 edf01200 00000000 13ef4946 01000000 1c000000 ba3a0000 100051f5 02000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0d000d00 0d000100 <32 bytes per line> BBED> sum apply Check value for File 2, Block 1: current = 0x733f, required = 0x733f BBED> 我们将offset 140的a4改成af,也就是改大了一些。来看看数据库open的情况。 SQL> alter database open; Database altered. SQL> 从上面的实验,我们可以得出一个结论:    1、在open的时候,oracle会去比较controlfile中的datafile checkpoint count值与datafile header 中的checkpoint count值进行比较。    2、当datafile header中的checkpoint count值小于controlfile 中的datafile checkpoint count值时,会提示该文件需要进行恢复。  3、当datafile header中的checkpoint count值大于等于controlfile 中的datafile checkpoint count值时,可以正常open数据库。 下面我们继续来实验,通过修改scn来观察open的时候,oracle对于scn的校验机制! --##场景2  datafile checkpoint scn(start scn)不一致 controlfile中stop scn  --先启动库到mount状态 BBED> set dba 1,1 DBA 0x00400001 (4194305 1,1) BBED> p kcvfhckp struct kcvfhckp, 36 bytes @484 struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0x0012f32d ub2 kscnwrp @488 0x0000 ub4 kcvcptim @492 0x4649f47d ub2 kcvcpthr @496 0x0001 union u, 12 bytes @500 struct kcvcprba, 12 bytes @500 ub4 kcrbaseq @500 0x0000001c ub4 kcrbabno @504 0x00003c53 ub2 kcrbabof @508 0x0010 ub1 kcvcpetb[0] @512 0x02 ub1 kcvcpetb[1] @513 0x00 ub1 kcvcpetb[2] @514 0x00 ub1 kcvcpetb[3] @515 0x00 ub1 kcvcpetb[4] @516 0x00 ub1 kcvcpetb[5] @517 0x00 ub1 kcvcpetb[6] @518 0x00 ub1 kcvcpetb[7] @519 0x00 BBED> set dba 2,1 DBA 0x00800001 (8388609 2,1) BBED> p kcvfhckp struct kcvfhckp, 36 bytes @484 struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0x0012f32d ub2 kscnwrp @488 0x0000 ub4 kcvcptim @492 0x4649f47d ub2 kcvcpthr @496 0x0001 union u, 12 bytes @500 struct kcvcprba, 12 bytes @500 ub4 kcrbaseq @500 0x0000001c ub4 kcrbabno @504 0x00003c53 ub2 kcrbabof @508 0x0010 ub1 kcvcpetb[0] @512 0x02 ub1 kcvcpetb[1] @513 0x00 ub1 kcvcpetb[2] @514 0x00 ub1 kcvcpetb[3] @515 0x00 ub1 kcvcpetb[4] @516 0x00 ub1 kcvcpetb[5] @517 0x00 ub1 kcvcpetb[6] @518 0x00 ub1 kcvcpetb[7] @519 0x00 BBED> 接下来此时我们做个控制文件的dump: SQL> oradebug setmypid Statement processed. SQL> alter session set events 'immediate trace name controlf level 3'; Session altered. SQL> oradebug close_Trace Statement processed. SQL> oradebug tracefile_name /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_82846.trc SQL> 此时控制文件dump的内容,我们选择其中一段,看下system数据文件 的情况: DATA FILE RECORDS *************************************************************************** (size = 520, compat size = 520, section max = 100, section in-use = 5, last-recid= 27, old-recno = 0, last-recno = 0) (extent = 1, blkno = 11, numrecs = 100) DATA FILE #1: name #7: /oradata/orcl/system01.dbf creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1 tablespace 0, index=1 krfil=1 prev_file=0 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00 Checkpoint cnt:167 scn: 0x0000.0012f32d 09/09/2024 17:57:17 Stop scn: 0x0000.0012f32d 09/09/2024 17:57:17 Creation Checkpointed at scn: 0x0000.00000007 08/24/2013 11:37:33 thread:0 rba:(0x0.0.0) enabled threads: 00000000 00000000 00000000 00000000 00000000 00000000 --可以看到此时datafile checkpoint scn(start scn)等于controlfile中的stop scn的。 首先模拟datafile checkpoint scn比控制文件stop scn要小的情况: BBED> d /v dba 2,1 offset 484 count 10 File: /oradata/orcl/sysaux01.dbf (2) Block: 1 Offsets: 484 to 493 Dba:0x00800001 ------------------------------------------------------- 2df31200 00000000 7df4 l -.......}. --0x0012f32d <16 bytes per line> BBED> modify /x 2c dba 2,1 offset 484 Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) Y File: /oradata/orcl/sysaux01.dbf (2) Block: 1 Offsets: 484 to 995 Dba:0x00800001 ------------------------------------------------------------------------ 2cf31200 00000000 7df44946 01000000 1c000000 533c0000 100051f5 02000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0d000d00 0d000100 00000000 00000000 00000000 02008000 f21e0e00 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <32 bytes per line> BBED> sum apply Check value for File 2, Block 1: current = 0x6d4f, required = 0x6d4f BBED> set dba 2,1 DBA 0x00800001 (8388609 2,1) BBED> p kcvfhckp struct kcvfhckp, 36 bytes @484 struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0x0012f32c --原来--0x0012f32d ub2 kscnwrp @488 0x0000 ub4 kcvcptim @492 0x4649f47d ub2 kcvcpthr @496 0x0001 union u, 12 bytes @500 struct kcvcprba, 12 bytes @500 ub4 kcrbaseq @500 0x0000001c ub4 kcrbabno @504 0x00003c53 ub2 kcrbabof @508 0x0010 ub1 kcvcpetb[0] @512 0x02 ub1 kcvcpetb[1] @513 0x00 ub1 kcvcpetb[2] @514 0x00 ub1 kcvcpetb[3] @515 0x00 ub1 kcvcpetb[4] @516 0x00 ub1 kcvcpetb[5] @517 0x00 ub1 kcvcpetb[6] @518 0x00 ub1 kcvcpetb[7] @519 0x00 这里我们认为将数据文件的检查点scn改小了一点点(2d改成了2c),重试open库 SQL> alter database open; alter database open * ERROR at line 1: ORA-00600: internal error code, arguments: [3619], [1], [0], [], [], [], [], [], [], [], [], [] SQL> SQL> select file#,checkpoint_change# from v$datafile order by 1; FILE# CHECKPOINT_CHANGE# ---------- ------------------ 1 1241901 2 1241901 3 1241901 4 1241901 5 1241901 SQL> select file#,checkpoint_change# from v$datafile_header order by 1; FILE# CHECKPOINT_CHANGE# ---------- ------------------ 1 1241901 2 1241900 3 1241901 4 1241901 5 1241901 SQL> 可以看到此时直接启动数据库会报错,提示需要进行恢复。 SQL> recover database; Media recovery complete. SQL> select file#,checkpoint_change# from v$datafile_header order by 1; FILE# CHECKPOINT_CHANGE# ---------- ------------------ 1 1241901 2 1241901 3 1241901 4 1241901 5 1241901 SQL> select file#,checkpoint_change# from v$datafile order by 1; FILE# CHECKPOINT_CHANGE# ---------- ------------------ 1 1241901 2 1241901 3 1241901 4 1241901 5 1241901 SQL> alter database open; Database altered. --手工进行recover database之后,我们会发现文件头的scn增加了,跟其他文件一致了,当然这个时候肯定是可以正常open的了。 那么如果是datafile checkpoint scn(start scn)大于 controlfile中stop scn的情况呢      我这里测试是可以直接打开数据库的,没有任何影响。到最后我们可以简单的总结下数据库open的时候的过程: 1) 首先比较controlfile中的datafile checkpoint cnt和datafile header中的checkpoint cnt值是否相等。    当datafile header中的checkpoint count值小于controlfile 中的datafile checkpoint count值时,会提示该文件需要进行恢复。  当datafile header中的checkpoint count值大于controlfile 中的datafile checkpoint count值时,可以正常open数据库。 2)当checkpoint cnt值相等之后,会比较datafile header中的start scn(checkpoint scn)与controlfile中的datafile checkpoint scn,以及database(system) checkpoint scn是否一致。 当datafile header中的checkpoint scn(start scn)小于system scn或controlfile中的对应的datafile checkpoint scn时,会提示需要进行介质恢复。 当datafile header中的checkpoint scn(start scn)大于system scn或controlfile中的对应的datafile checkpoint scn时,可以正常open数据库。    3)当controlfile中datafile的stop scn为空时,会提示需要进行实例恢复,如下: --session 1 SQL> drop table scott.t1; Table dropped. SQL> shutdown abort ORACLE instance shut down. SQL> startup mount ORACLE instance started. Total System Global Area 2505338880 bytes Fixed Size 2255832 bytes Variable Size 620758056 bytes Database Buffers 1862270976 bytes Redo Buffers 20054016 bytes Database mounted. SQL> oradebug setmypid Statement processed. SQL> alter session set events 'immediate trace name controlf level 3'; Session altered. SQL> oradebug close_trace Statement processed. SQL> oradebug tracefile_name /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_86043.trc SQL> 此时的trace文件内容如下: *************************************************************************** DATABASE ENTRY *************************************************************************** (size = 316, compat size = 316, section max = 1, section in-use = 1, last-recid= 0, old-recno = 0, last-recno = 0) (extent = 1, blkno = 1, numrecs = 1) 08/29/2024 10:19:44 DB Name "ORCL" Database flags = 0x00404001 0x00001200 Controlfile Creation Timestamp 08/29/2024 10:19:44 Incmplt recovery scn: 0x0000.00000000 Resetlogs scn: 0x0000.000e2006 Resetlogs Timestamp 08/29/2024 10:19:46 Prior resetlogs scn: 0x0000.00000001 Prior resetlogs Timestamp 08/24/2013 11:37:30 Redo Version: compatible=0xb200400 #Data files = 5, #Online files = 5 Database checkpoint: Thread=1 scn: 0x0000.0012f330 Threads: #Enabled=1, #Open=1, Head=1, Tail=1 enabled threads: 01000000 00000000 00000000 00000000 00000000 00000000 .....省略........ *************************************************************************** DATA FILE RECORDS *************************************************************************** (size = 520, compat size = 520, section max = 100, section in-use = 5, last-recid= 27, old-recno = 0, last-recno = 0) (extent = 1, blkno = 11, numrecs = 100) DATA FILE #1: name #7: /oradata/orcl/system01.dbf creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1 tablespace 0, index=1 krfil=1 prev_file=0 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00 Checkpoint cnt:168 scn: 0x0000.0012f330 09/09/2024 18:11:24 Stop scn: 0xffff.ffffffff 09/09/2024 17:57:17 --stop scn 无穷大 Creation Checkpointed at scn: 0x0000.00000007 08/24/2013 11:37:33 thread:0 rba:(0x0.0.0) enabled threads: 00000000 00000000 00000000 00000000 00000000 00000000 .....省略........ 此时datafile的stop scn是为空的。这里我们可以利用oracle 10013 event来进行观察数据库实例恢复的一个基本过程。 oradebug setmypid alter session set events '10013 trace name context forever, level 1'; alter database open; oradebug tracefile_name -- /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_86595.trc 最后我们来看下trace的内容吧: *** 2024-09-09 18:28:45.987 Oradebug command 'setmypid' console output: <none> *** 2024-09-09 18:28:51.544 Successfully allocated 3 recovery slaves Using 45 overflow buffers per recovery slave Thread 1 checkpoint: logseq 28, block 15443, scn 1241904 cache-low rba: logseq 28, block 15578 on-disk rba: logseq 28, block 17191, scn 1242506 --恢复的起点 start recovery at logseq 28, block 15578, scn 0 *** 2024-09-09 18:28:51.570 Started writing zeroblks thread 1 seq 28 blocks 17191-17198 *** 2024-09-09 18:28:51.570 Completed writing zeroblks thread 1 seq 28 ==== Redo read statistics for thread 1 ==== Total physical reads (from disk and memory): 4096Kb -- Redo read_disk statistics -- Read rate (ASYNC): 806Kb in 0.02s => 39.38 Mb/sec Longest record: 8Kb, moves: 0/869 (0%) Longest LWN: 507Kb, moves: 0/25 (0%), moved: 0Mb Last redo scn: 0x0000.0012f589 (1242505) ---------------------------------------------- *** 2024-09-09 18:28:51.581 KCRA: start recovery claims for 331 data blocks *** 2024-09-09 18:28:51.586 KCRA: blocks processed = 331/331, claimed = 331, eliminated = 0 *** 2024-09-09 18:28:51.587 Recovery of Online Redo Log: Thread 1 Group 1 Seq 28 Reading mem 0 *** 2024-09-09 18:28:51.589 Completed redo application of 0.73MB *** 2024-09-09 18:28:51.612 Completed recovery checkpoint ---alert 日志: alter database open Beginning crash recovery of 1 threads parallel recovery started with 3 processes Started redo scan Completed redo scan read 806 KB redo, 331 data blocks need recovery Started redo application at Thread 1: logseq 28, block 15578 Recovery of Online Redo Log: Thread 1 Group 1 Seq 28 Reading mem 0 Mem# 0: /oradata/orcl/redo01.log Completed redo application of 0.73MB Completed crash recovery at Thread 1: logseq 28, block 17191, scn 1262506 --实例恢复的终点 331 data blocks read, 331 data blocks written, 806 redo k-bytes read Mon Sep 09 18:28:51 2024 LGWR: STARTING ARCH PROCESSES Mon Sep 09 18:28:51 2024 ARC0 started with pid=23, OS id=86612 ARC0: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES Thread 1 advanced to log sequence 29 (thread open) Mon Sep 09 18:28:52 2024 ARC1 started with pid=24, OS id=86621 Thread 1 opened at log sequence 29 Current log# 2 seq# 29 mem# 0: /oradata/orcl/redo02.log Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Mon Sep 09 18:28:52 2024 SMON: enabling cache recovery Mon Sep 09 18:28:52 2024 ARC2 started with pid=25, OS id=86623 Mon Sep 09 18:28:52 2024 ARC3 started with pid=26, OS id=86625 ARC1: Archival started ARC2: Archival started ARC1: Becoming the 'no FAL' ARCH ARC1: Becoming the 'no SRL' ARCH ARC2: Becoming the heartbeat ARCH Archived Log entry 5 added for thread 1 sequence 28 ID 0x65b05340 dest 1: [86595] Successfully onlined Undo Tablespace 2. Undo initialization finished serial:0 start:36278834 end:36278854 diff:20 (0 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is AL32UTF8 No Resource Manager plan active replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC Mon Sep 09 18:28:52 2024 QMNC started with pid=27, OS id=86629 Completed: alter database open 从上面结果也可以看出,是先进行介质恢复判断,再进行实例恢复判断。实例恢复的起点的on low rba(on-disk rba: logseq 28, block 17191, scn 1242506), 恢复的终点是redo log的最末端位置(scn 1262506)。
最后修改时间:2024-10-10 12:30:01
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论