Oracle异常恢复实战第5讲 - Oracle的open过程_笔记

四九年入国军 2024-09-09
142
  感兴趣的还是记得去看原帖子，笔记有删减，原作者微信公众号：
在这节课中，我们仍然通过各种trace手段来跟踪相关进程，然后通过实验过程来深入了解open的过程.；首先我们做一些准备工作。启动几个不同的Session窗口。


--Session 1
SQL> startup mount
ORACLE instance started.

Total System Global Area 2505338880 bytes
Fixed Size                  2255832 bytes
Variable Size             620758056 bytes
Database Buffers         1862270976 bytes
Redo Buffers               20054016 bytes
Database mounted.
SQL> !ps -ef|grep -v grep|grep ora_
oracle    75874      1  0 17:01 ?        00:00:00 ora_pmon_orcl
oracle    75876      1  0 17:01 ?        00:00:00 ora_psp0_orcl
oracle    75879      1  0 17:01 ?        00:00:00 ora_vktm_orcl
oracle    75883      1  0 17:01 ?        00:00:00 ora_gen0_orcl
oracle    75885      1  0 17:01 ?        00:00:00 ora_diag_orcl
oracle    75887      1  0 17:01 ?        00:00:00 ora_dbrm_orcl
oracle    75889      1  0 17:01 ?        00:00:00 ora_dia0_orcl
oracle    75891      1  1 17:01 ?        00:00:00 ora_mman_orcl
oracle    75893      1  0 17:01 ?        00:00:00 ora_dbw0_orcl
oracle    75895      1  0 17:01 ?        00:00:00 ora_lgwr_orcl
oracle    75897      1  0 17:01 ?        00:00:00 ora_ckpt_orcl
oracle    75899      1  0 17:01 ?        00:00:00 ora_smon_orcl
oracle    75901      1  0 17:01 ?        00:00:00 ora_reco_orcl
oracle    75903      1  0 17:01 ?        00:00:00 ora_mmon_orcl
oracle    75905      1  0 17:01 ?        00:00:00 ora_mmnl_orcl
oracle    75907      1  0 17:01 ?        00:00:00 ora_d000_orcl
oracle    75909      1  0 17:01 ?        00:00:00 ora_s000_orcl


--session 2:监控smon进程
strace -fr -o /tmp/75899.log -p 75899

--session 3：监控dbwr进程
strace -fr -o /tmp/75893.log -p 75893

--Session 4
SQL> oradebug setmypid
Statement processed.
SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.
SQL> alter database open;

Database altered.

SQL> oradebug close_trace
Statement processed.
SQL> oradebug tracefile_name
/home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_75927.trc
SQL> 

 大家可以看到，我们在open数据库之前打开了数据库10046 trace跟踪，同时也进行了strace跟踪；其中strace跟踪我们重点观察smon和dbw进程。为什么要跟踪smon进程？
  因为大家都知道数据库open的时候，smon进程要进行实例恢复，它需要做很多事儿。当然单纯的strace其实并不能完全理解smon进程的作用，这里我们不多讲，后面会讲其他的event来跟踪。
 
--smon trc:/tmp/75899.log 

cd /proc/75899/fd
ls -l 
lrwx------ 1 oracle oinstall 64 9月   9 17:03 256 -> /oradata/orcl/system01.dbf
lrwx------ 1 oracle oinstall 64 9月   9 17:03 257 -> /oradata/orcl/sysaux01.dbf
lrwx------ 1 oracle oinstall 64 9月   9 17:03 258 -> /oradata/orcl/undotbs01.dbf
lrwx------ 1 oracle oinstall 64 9月   9 17:03 259 -> /oradata/orcl/users01.dbf
lrwx------ 1 oracle oinstall 64 9月   9 17:03 260 -> /oradata/orcl/test01.dbf
lrwx------ 1 oracle oinstall 64 9月   9 17:03 261 -> /oradata/orcl/temp01.dbf


cat /tmp/75899.log  | grep pread
cat /tmp/75899.log  | grep pread
75899      0.000240 pread64(261, "\36\242\0\0\3\0@\0\305y\16\0\0\0\0\f\253\277\0\0\1\0\0\0\200\0\0\0\1\0\0\0"..., 8192, 24576) = 8192
75899      0.000156 pread64(256, "\20\242\0\0\300\0@\0\"\247\16\0\0\0\1\4\351\373\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 1572864) = 8192
75899      0.000285 pread64(258, "\2\242\0\0\363 \300\0\373\353\22\0\0\0\1\4\316\302\0\0\1\0\34\0\255\2\0\0\246\0\"\""..., 8192, 69099520) = 8192
75899      0.000476 pread64(257, "\35\242\0\0\2\0\200\0\366\352\22\0\0\0\2\4\23A\0\0\2\0\0\0\10\0\0\0\0\t\1\0"..., 8192, 16384) = 8192

从smon进程的strace 来看，正常情况下,先后读取了临时表空间、系统表空间、undo表空间、sysaux表空间。接下来看下dbw进程的strace log：

--dbw0 trc

lrwx------ 1 oracle oinstall 64 9月   9 17:07 263 -> /oradata/orcl/temp01.dbf
lrwx------ 1 oracle oinstall 64 9月   9 17:07 262 -> /oradata/orcl/test01.dbf
lrwx------ 1 oracle oinstall 64 9月   9 17:07 261 -> /oradata/orcl/users01.dbf
lrwx------ 1 oracle oinstall 64 9月   9 17:07 260 -> /oradata/orcl/undotbs01.dbf
lrwx------ 1 oracle oinstall 64 9月   9 17:07 259 -> /oradata/orcl/sysaux01.dbf
lrwx------ 1 oracle oinstall 64 9月   9 17:07 258 -> /oradata/orcl/system01.dbf
lrwx------ 1 oracle oinstall 64 9月   9 17:07 257 -> /oradata/orcl/control02.ctl
lrwx------ 1 oracle oinstall 64 9月   9 17:07 256 -> /oradata/orcl/control01.ctl


cat /tmp/75893.log | grep read
75893      0.444386 pread64(256, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4/\0\0\0\0\0\0\0\0\4 \v@T\260e"..., 16384, 16384) = 16384
75893      0.000137 pread64(257, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4/\0\0\0\0\0\0\0\0\4 \v@T\260e"..., 16384, 16384) = 16384
75893      0.000104 pread64(256, "\25\302\0\0\17\0\0\0\270\5\0\0\377\377\1\4\345\355\0\0\0\2\0\0\0\0\0\0\0\0\0\301"..., 16384, 245760) = 16384
75893      0.000116 pread64(258, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0g\210\0\0\0 \0\0\0r\1\0}|{z"..., 8192, 0) = 8192
75893      0.000106 pread64(258, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0g\210\0\0\0 \0\0\0r\1\0}|{z"..., 8192, 0) = 8192
75893      0.000097 pread64(259, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0g\363\0\0\0 \0\0\0\t\1\0}|{z"..., 8192, 0) = 8192
75893      0.000110 pread64(259, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0g\363\0\0\0 \0\0\0\t\1\0}|{z"..., 8192, 0) = 8192
75893      0.000105 pread64(260, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0f\331\0\0\0 \0\0\0#\0\0}|{z"..., 8192, 0) = 8192
75893      0.000103 pread64(260, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0f\331\0\0\0 \0\0\0#\0\0}|{z"..., 8192, 0) = 8192
75893      0.000102 pread64(261, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0\346\370\0\0\0 \0\0\200\2\0\0}|{z"..., 8192, 0) = 8192
75893      0.000101 pread64(261, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0\346\370\0\0\0 \0\0\200\2\0\0}|{z"..., 8192, 0) = 8192
75893      0.000102 pread64(262, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0f\310\0\0\0 \0\0\0002\0\0}|{z"..., 8192, 0) = 8192
75893      0.000108 pread64(262, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0f\310\0\0\0 \0\0\0002\0\0}|{z"..., 8192, 0) = 8192
75893      0.000115 pread64(256, "\25\302\0\0\265\0\0\0z\2\0\0\377\377\1\4}\227\0\0\0\n\0\0\0 \0\0\36\0\10\0"..., 16384, 2965504) = 16384
75893      0.000103 pread64(263, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0\346\364\0\0\0 \0\0\200\16\0\0}|{z"..., 8192, 0) = 8192
75893      0.000101 pread64(263, "\0\242\0\0\0\0\300\377\0\0\0\0\0\0\0\0\346\364\0\0\0 \0\0\200\16\0\0}|{z"..., 8192, 0) = 8192
75893      0.002218 pread64(256, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4-\0\0\0\0\0\0\0\0\4 \v@T\260e"..., 16384, 16384) = 16384
75893      0.000172 pread64(257, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4-\0\0\0\0\0\0\0\0\4 \v@T\260e"..., 16384, 16384) = 16384
75893      0.000110 pread64(256, "\25\302\0\0\17\0\0\0\272\5\0\0\377\377\1\4\345\345\0\0\0\2\0\0\0\0\0\0\0\0\0\301"..., 16384, 245760) = 16384
75893      0.000117 pread64(256, "\25\302\0\0\21\0\0\0\272\5\0\0\377\377\1\4\375\246\0\0\0\0\0\0\0\0\0\0\300\10;F"..., 16384, 278528) = 16384
75893      0.000116 pread64(256, "\25\302\0\0\27\0\0\0\261\5\0\0\377\377\1\4\234]\0\0\0\0\0\0\0 \0\0\16\0\0\0"..., 16384, 376832) = 16384
75893      0.000155 pread64(258, "\v\242\0\0\1\0@\0\0\0\0\0\0\0\1\4\323[\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192
75893      0.000254 pread64(259, "\v\242\0\0\1\0\200\0\0\0\0\0\0\0\1\4\24e\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192
75893      0.000213 pread64(260, "\v\242\0\0\1\0\300\0\0\0\0\0\0\0\1\4\264k\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192
75893      0.000140 pread64(261, "\v\242\0\0\1\0\0\1\0\0\0\0\0\0\1\4Q\t\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192
75893      0.000121 pread64(262, "\v\242\0\0\1\0@\1\0\0\0\0\0\0\1\4C\313\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192
75893      0.000185 pread64(256, "\25\302\0\0\265\0\0\0z\2\0\0\377\377\1\4}\227\0\0\0\n\0\0\0 \0\0\36\0\10\0"..., 16384, 2965504) = 16384
75893      0.000341 pread64(263, "\v\242\0\0\1\0@\0\0\0\0\0\0\0\1\0041\332\0\0\0\0\0\0\0\4 \v@T\260e"..., 8192, 8192) = 8192

从上面strace跟踪smon和dbwr进程，我们可以得到什么结论呢 ？
1） 在open过程中smon进程会读取temp、system、undo、sysaux表空间
2)  在open过程中dbwr进程会读取controlfile和数据文件


接下来我们看看更为重要的10046 trace log，需要注意的是，我们在进行数据恢复过程中，经常使用10046 trace，因此大家务必要熟练使用10046 event。

WAIT #46912550775824: nam='rdbms ipc reply' ela= 1020831 from_process=11 timeout=1800 p3=0 obj#=-1 tim=1725872581816334
WAIT #46912550775824: nam='control file sequential read' ela= 7 file#=0 block#=1 blocks=1 obj#=-1 tim=1725872581816510
WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=15 blocks=1 obj#=-1 tim=1725872581816522
WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=17 blocks=1 obj#=-1 tim=1725872581816529
WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=282 blocks=1 obj#=-1 tim=1725872581816536
WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=1 blocks=1 obj#=-1 tim=1725872581816547
WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=15 blocks=1 obj#=-1 tim=1725872581816553
WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=17 blocks=1 obj#=-1 tim=1725872581816558
WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=282 blocks=1 obj#=-1 tim=1725872581816564
WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=1 blocks=1 obj#=-1 tim=1725872581816576
WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=15 blocks=1 obj#=-1 tim=1725872581816586
WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=17 blocks=1 obj#=-1 tim=1725872581816596
WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=282 blocks=1 obj#=-1 tim=1725872581816606
WAIT #46912550775824: nam='log file sync' ela= 190 buffer#=18888 sync scn=1240070 p3=0 obj#=-1 tim=1725872581816885
WAIT #46912550775824: nam='control file sequential read' ela= 4 file#=0 block#=1 blocks=1 obj#=-1 tim=1725872581816921
WAIT #46912550775824: nam='control file sequential read' ela= 3 file#=0 block#=15 blocks=1 obj#=-1 tim=1725872581816941
WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=17 blocks=1 obj#=-1 tim=1725872581816947
WAIT #46912550775824: nam='control file sequential read' ela= 2 file#=0 block#=282 blocks=1 obj#=-1 tim=1725872581816954
WAIT #46912550775824: nam='instance state change' ela= 430 layer=2 value=1 waited=1 obj#=-1 tim=1725872581817413
WAIT #46912550775824: nam='db file sequential read' ela= 16 file#=1 block#=520 blocks=1 obj#=-1 tim=1725872581817649

从10046 trace，我们可以看到在读取controlfile的头部block，然后再读取其他的block。在倒数第7行，大家注意，有个sync scn=1240070，说明在open的时候会去比较scn。
另外在最后一行，不知道大家是否注意到file 1 block 520  ？这个其实是比较特殊的。


select owner,segment_name,HEADER_FILE,HEADER_BLOCK   from dba_Segments where segment_name='BOOTSTRAP$';
OWNER                SEGMENT_NAME         HEADER_FILE HEADER_BLOCK
-------------------- -------------------- ----------- ------------
SYS                  BOOTSTRAP$                     1          520

  是的。file 1 block 520. 这个就是我们的 BOOTSTRAP$. 看名字，我相信大家都知道，这个是很关键，很特殊的一个对象。
 也就是说数据库在open的时候，首先需要初始化数据字典，那么这些数据字典信息从哪里来呢？是的。就是从boostrap$中读取并创建。
 
 从后面的trace内容，我们也可以发现，会接着创建bootstrap$对象,如下：
 
PARSING IN CURSOR #46912550767600 len=188 dep=1 uid=0 oct=1 lid=0 tim=1725872581818241 hv=4006182593 ad='f4871170' sqlid='32r4f1brckzq1'
create table bootstrap$ (
END OF STMT
PARSE #46912550767600:c=493,e=493,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1725872581818240
EXEC #46912550767600:c=137,e=138,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=1725872581818419
CLOSE #46912550767600:c=1,e=2,dep=1,type=0,tim=1725872581818455
=====================
PARSING IN CURSOR #46912550767600 len=55 dep=1 uid=0 oct=3 lid=0 tim=1725872581818728 hv=2111436465 ad='f486f7d8' sqlid='6apq2rjyxmxpj'
select line#, sql_text from bootstrap$ where obj# != :1
END OF STMT
PARSE #46912550767600:c=267,e=267,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1725872581818727
BINDS #46912550767600:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=2aaaadec87a8  bln=22  avl=02  flg=05
  value=59
EXEC #46912550767600:c=551,e=550,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=867914364,tim=1725872581819318
WAIT #46912550767600: nam='db file sequential read' ela= 15 file#=1 block#=520 blocks=1 obj#=59 tim=1725872581819418


从上面的 value=59，我们可以知道bootstrap$核心对象个数是59个。也就是后面会创建59个bootstrap 核心对象

到最后，我们似乎仍然没有通过strace和10046 得出oracle在open的过程中，如何去对比scn以及checkpoint cnt值的。这里我们换个思路，使用bbed来改一改，观察一下。


下面我们仍然通过实验来进行探索：   
首先我们需要知道，checkpoint count存在什么地方（后面我们会讲数据文件头结构）：


BBED> set dba  2,1
        DBA             0x00800001 (8388609 2,1)

BBED> p kcvfhcpc
ub4 kcvfhcpc                                @140      0x000000a4       --表示data checkpoint count

BBED> p kcvfhccc
ub4 kcvfhccc                                @148      0x000000a3       --表示controlfile checkpoint count

BBED> 


 这里大家记住这2个偏移量140和148.
 我们来修改datafile checkpoint count值，观察一下Oracle在open时，会是什么情况？
 
--##场景1 模拟datafile checkpoint count值不一致
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 2505338880 bytes
Fixed Size                  2255832 bytes
Variable Size             620758056 bytes
Database Buffers         1862270976 bytes
Redo Buffers               20054016 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
SQL> alter session set events 'immediate trace name controlf level 3';

Session altered.

SQL> oradebug tracefile_name
/home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_80055.trc
SQL> 


--控制文件 数据文件trc ：
DATA FILE #2:
  name #6: /oradata/orcl/sysaux01.dbf
creation size=0 block size=8192 status=0xe head=6 tail=6 dup=1
 tablespace 1, index=2 krfil=2 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:165 scn: 0x0000.0012f0ed 09/09/2024 17:34:11
 Stop scn: 0x0000.0012f0ed 09/09/2024 17:34:11
 Creation Checkpointed at scn:  0x0000.0000072a 08/24/2013 11:37:37
 thread:0 rba:(0x0.0.0)
 
 
 同时，我们再dump 下datafile header信息：
 oradebug setmypid
 alter session set events 'immediate trace name file_hdrs level 3';
 oradebug tracefile_name
 /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_80469.trc
 
--数据文件头trc：
 DATA FILE #2:
  name #6: /oradata/orcl/sysaux01.dbf
creation size=0 block size=8192 status=0xe head=6 tail=6 dup=1
 tablespace 1, index=2 krfil=2 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:165 scn: 0x0000.0012f0ed 09/09/2024 17:34:11
 Stop scn: 0x0000.0012f0ed 09/09/2024 17:34:11
 Creation Checkpointed at scn:  0x0000.0000072a 08/24/2013 11:37:37
 thread:0 rba:(0x0.0.0)
 
 
 我们可以看到，controlfile中datafile 2的checkpoint cnt值和datafile header中的值是一样的，均为165.  
  现在我们手工修改下datafile header中的checkpoint cnt值，然后来观察下数据库open会是什么情况？
 
 BBED> set file 2 block 1
        FILE#           2
        BLOCK#          1

BBED> p kcvfhcpc
ub4 kcvfhcpc                                @140      0x000000a5    --a5 10进制是165

BBED>  d /v count 64
 File: /oradata/orcl/sysaux01.dbf (2)
 Block: 1       Offsets:  140 to  203  Dba:0x00800001
-------------------------------------------------------
 a5000000 a5354446 a4000000 00000000 l .....5DF........
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................
 00000000 00000000 00000000 00000000 l ................

 <16 bytes per line>


BBED> modify /x a4 offset 140
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) Y
 File: /oradata/orcl/sysaux01.dbf (2)
 Block: 1                Offsets:  140 to  203           Dba:0x00800001
------------------------------------------------------------------------
 a4000000 a5354446 a4000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 

 <32 bytes per line>


BBED> sum apply
Check value for File 2, Block 1:
current = 0x7334, required = 0x7334

BBED> 

我们这里修改了offset 140，将a5改成了a4，也就是说把该数据文件的checkpoint count值改小了1，然后来启动数据库看看情况。
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 2 needs media recovery
ORA-01110: data file 2: '/oradata/orcl/sysaux01.dbf'



我们可以看到，此时启动数据库，Oracle会提示这个数据文件需要进行介质恢复

如果是datafile checkpoint cnt值大于controlfile中的datafile checkpoint cnt值呢？

BBED> modify /x af dba 2,1 offset 140
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) Y
 File: /oradata/orcl/sysaux01.dbf (2)
 Block: 1                Offsets:  140 to  651           Dba:0x00800001
------------------------------------------------------------------------
 af000000 a5354446 a4000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 01000000 06005359 53415558 00000000 00000000 00000000 00000000 00000000 
 00000000 02000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 7ac92131 01000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 edf01200 00000000 
 13ef4946 01000000 1c000000 ba3a0000 100051f5 02000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 0d000d00 0d000100 

 <32 bytes per line>

BBED> sum apply
Check value for File 2, Block 1:
current = 0x733f, required = 0x733f

BBED> 


我们将offset 140的a4改成af，也就是改大了一些。来看看数据库open的情况。

SQL> alter database open;

Database altered.

SQL> 


从上面的实验，我们可以得出一个结论：   
1、在open的时候，oracle会去比较controlfile中的datafile checkpoint count值与datafile header 中的checkpoint count值进行比较。   
2、当datafile header中的checkpoint count值小于controlfile 中的datafile checkpoint count值时，会提示该文件需要进行恢复。  
3、当datafile header中的checkpoint count值大于等于controlfile 中的datafile checkpoint count值时，可以正常open数据库。


下面我们继续来实验，通过修改scn来观察open的时候，oracle对于scn的校验机制！

--##场景2  datafile checkpoint scn(start scn）不一致 controlfile中stop scn  
--先启动库到mount状态
BBED> set dba 1,1
        DBA             0x00400001 (4194305 1,1)

BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x0012f32d
      ub2 kscnwrp                           @488      0x0000
   ub4 kcvcptim                             @492      0x4649f47d
   ub2 kcvcpthr                             @496      0x0001
   union u, 12 bytes                        @500     
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x0000001c
         ub4 kcrbabno                       @504      0x00003c53
         ub2 kcrbabof                       @508      0x0010
   ub1 kcvcpetb[0]                          @512      0x02
   ub1 kcvcpetb[1]                          @513      0x00
   ub1 kcvcpetb[2]                          @514      0x00
   ub1 kcvcpetb[3]                          @515      0x00
   ub1 kcvcpetb[4]                          @516      0x00
   ub1 kcvcpetb[5]                          @517      0x00
   ub1 kcvcpetb[6]                          @518      0x00
   ub1 kcvcpetb[7]                          @519      0x00

BBED> set dba 2,1
        DBA             0x00800001 (8388609 2,1)

BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x0012f32d
      ub2 kscnwrp                           @488      0x0000
   ub4 kcvcptim                             @492      0x4649f47d
   ub2 kcvcpthr                             @496      0x0001
   union u, 12 bytes                        @500     
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x0000001c
         ub4 kcrbabno                       @504      0x00003c53
         ub2 kcrbabof                       @508      0x0010
   ub1 kcvcpetb[0]                          @512      0x02
   ub1 kcvcpetb[1]                          @513      0x00
   ub1 kcvcpetb[2]                          @514      0x00
   ub1 kcvcpetb[3]                          @515      0x00
   ub1 kcvcpetb[4]                          @516      0x00
   ub1 kcvcpetb[5]                          @517      0x00
   ub1 kcvcpetb[6]                          @518      0x00
   ub1 kcvcpetb[7]                          @519      0x00

BBED> 

接下来此时我们做个控制文件的dump：
SQL> oradebug setmypid
Statement processed.
SQL> alter session set events 'immediate trace name controlf level 3';

Session altered.

SQL>  oradebug close_Trace
Statement processed.
SQL> oradebug tracefile_name
/home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_82846.trc
SQL> 

此时控制文件dump的内容，我们选择其中一段，看下system数据文件 的情况：

DATA FILE RECORDS
***************************************************************************
 (size = 520, compat size = 520, section max = 100, section in-use = 5,
  last-recid= 27, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 11, numrecs = 100)
DATA FILE #1:
  name #7: /oradata/orcl/system01.dbf
creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:167 scn: 0x0000.0012f32d 09/09/2024 17:57:17
 Stop scn: 0x0000.0012f32d 09/09/2024 17:57:17
 Creation Checkpointed at scn:  0x0000.00000007 08/24/2013 11:37:33
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
 
--可以看到此时datafile checkpoint scn(start scn)等于controlfile中的stop scn的。
 
 首先模拟datafile checkpoint scn比控制文件stop scn要小的情况：

 
 
BBED> d /v  dba 2,1 offset 484 count 10
 File: /oradata/orcl/sysaux01.dbf (2)
 Block: 1       Offsets:  484 to  493  Dba:0x00800001
-------------------------------------------------------
 2df31200 00000000 7df4              l -.......}.                                    --0x0012f32d

 <16 bytes per line>

BBED> modify /x 2c dba 2,1  offset 484
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) Y
 File: /oradata/orcl/sysaux01.dbf (2)
 Block: 1                Offsets:  484 to  995           Dba:0x00800001
------------------------------------------------------------------------
 2cf31200 00000000 7df44946 01000000 1c000000 533c0000 100051f5 02000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 0d000d00 0d000100 00000000 00000000 00000000 02008000 f21e0e00 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 

 <32 bytes per line>


BBED> sum apply
Check value for File 2, Block 1:
current = 0x6d4f, required = 0x6d4f

BBED> set dba 2,1
        DBA             0x00800001 (8388609 2,1)

BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x0012f32c              --原来--0x0012f32d
      ub2 kscnwrp                           @488      0x0000
   ub4 kcvcptim                             @492      0x4649f47d
   ub2 kcvcpthr                             @496      0x0001
   union u, 12 bytes                        @500     
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x0000001c
         ub4 kcrbabno                       @504      0x00003c53
         ub2 kcrbabof                       @508      0x0010
   ub1 kcvcpetb[0]                          @512      0x02
   ub1 kcvcpetb[1]                          @513      0x00
   ub1 kcvcpetb[2]                          @514      0x00
   ub1 kcvcpetb[3]                          @515      0x00
   ub1 kcvcpetb[4]                          @516      0x00
   ub1 kcvcpetb[5]                          @517      0x00
   ub1 kcvcpetb[6]                          @518      0x00
   ub1 kcvcpetb[7]                          @519      0x00


这里我们认为将数据文件的检查点scn改小了一点点（2d改成了2c），重试open库
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [3619], [1], [0], [], [], [], [],
[], [], [], [], []


SQL> 

SQL> select file#,checkpoint_change# from v$datafile order by 1;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1            1241901
         2            1241901
         3            1241901
         4            1241901
         5            1241901

SQL> select file#,checkpoint_change# from v$datafile_header order by 1;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1            1241901
         2            1241900
         3            1241901
         4            1241901
         5            1241901

SQL> 

可以看到此时直接启动数据库会报错，提示需要进行恢复。

SQL> recover  database;
Media recovery complete.
SQL>  select file#,checkpoint_change# from v$datafile_header order by 1;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1            1241901
         2            1241901
         3            1241901
         4            1241901
         5            1241901

SQL> select file#,checkpoint_change# from v$datafile order by 1;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1            1241901
         2            1241901
         3            1241901
         4            1241901
         5            1241901

SQL> alter database open;

Database altered.

--手工进行recover database之后，我们会发现文件头的scn增加了，跟其他文件一致了，当然这个时候肯定是可以正常open的了。



那么如果是datafile checkpoint scn(start scn）大于 controlfile中stop scn的情况呢 ？    

我这里测试是可以直接打开数据库的，没有任何影响。到最后我们可以简单的总结下数据库open的时候的过程：

1) 首先比较controlfile中的datafile checkpoint cnt和datafile header中的checkpoint cnt值是否相等。    
      当datafile header中的checkpoint count值小于controlfile 中的datafile checkpoint count值时，会提示该文件需要进行恢复。  
	  当datafile header中的checkpoint count值大于controlfile 中的datafile checkpoint count值时，可以正常open数据库。
2）当checkpoint cnt值相等之后，会比较datafile header中的start scn(checkpoint scn)与controlfile中的datafile checkpoint scn，以及database(system) checkpoint scn是否一致。
      当datafile header中的checkpoint scn(start scn)小于system scn或controlfile中的对应的datafile checkpoint scn时，会提示需要进行介质恢复。
      当datafile header中的checkpoint scn(start scn)大于system scn或controlfile中的对应的datafile checkpoint scn时，可以正常open数据库。   
3）当controlfile中datafile的stop scn为空时，会提示需要进行实例恢复，如下:
--session 1
SQL> drop table scott.t1;

Table dropped.

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 2505338880 bytes
Fixed Size                  2255832 bytes
Variable Size             620758056 bytes
Database Buffers         1862270976 bytes
Redo Buffers               20054016 bytes
Database mounted.


SQL> oradebug setmypid
Statement processed.
SQL> alter session set events 'immediate trace name controlf level 3';

Session altered.

SQL> oradebug close_trace
Statement processed.
SQL> oradebug tracefile_name
/home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_86043.trc
SQL> 


此时的trace文件内容如下：

***************************************************************************
DATABASE ENTRY
***************************************************************************
 (size = 316, compat size = 316, section max = 1, section in-use = 1,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 1, numrecs = 1)
 08/29/2024 10:19:44
 DB Name "ORCL"
 Database flags = 0x00404001 0x00001200
 Controlfile Creation Timestamp  08/29/2024 10:19:44
 Incmplt recovery scn: 0x0000.00000000
 Resetlogs scn: 0x0000.000e2006 Resetlogs Timestamp  08/29/2024 10:19:46
 Prior resetlogs scn: 0x0000.00000001 Prior resetlogs Timestamp  08/24/2013 11:37:30
 Redo Version: compatible=0xb200400
 #Data files = 5, #Online files = 5
 Database checkpoint: Thread=1 scn: 0x0000.0012f330
 Threads: #Enabled=1, #Open=1, Head=1, Tail=1
 enabled  threads:  01000000 00000000 00000000 00000000 00000000 00000000
 .....省略........
 
 
 
***************************************************************************
DATA FILE RECORDS
***************************************************************************
 (size = 520, compat size = 520, section max = 100, section in-use = 5,
  last-recid= 27, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 11, numrecs = 100)
DATA FILE #1:
  name #7: /oradata/orcl/system01.dbf
creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:168 scn: 0x0000.0012f330 09/09/2024 18:11:24
 Stop scn: 0xffff.ffffffff 09/09/2024 17:57:17                                  --stop scn 无穷大
 Creation Checkpointed at scn:  0x0000.00000007 08/24/2013 11:37:33
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
 .....省略........
  
 此时datafile的stop scn是为空的。这里我们可以利用oracle 10013 event来进行观察数据库实例恢复的一个基本过程。
 

oradebug setmypid
alter session set events '10013 trace name context forever, level 1';
alter database open;
oradebug tracefile_name
-- /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_86595.trc

最后我们来看下trace的内容吧：

*** 2024-09-09 18:28:45.987
Oradebug command 'setmypid' console output: <none>

*** 2024-09-09 18:28:51.544
Successfully allocated 3 recovery slaves
Using 45 overflow buffers per recovery slave
Thread 1 checkpoint: logseq 28, block 15443, scn 1241904
  cache-low rba: logseq 28, block 15578
    on-disk rba: logseq 28, block 17191, scn 1242506                 --恢复的起点
  start recovery at logseq 28, block 15578, scn 0

*** 2024-09-09 18:28:51.570
Started writing zeroblks thread 1 seq 28 blocks 17191-17198

*** 2024-09-09 18:28:51.570
Completed writing zeroblks thread 1 seq 28
==== Redo read statistics for thread 1 ====
Total physical reads (from disk and memory): 4096Kb
-- Redo read_disk statistics --
Read rate (ASYNC): 806Kb in 0.02s => 39.38 Mb/sec
Longest record: 8Kb, moves: 0/869 (0%)
Longest LWN: 507Kb, moves: 0/25 (0%), moved: 0Mb
Last redo scn: 0x0000.0012f589 (1242505)
----------------------------------------------

*** 2024-09-09 18:28:51.581
KCRA: start recovery claims for 331 data blocks

*** 2024-09-09 18:28:51.586
KCRA: blocks processed = 331/331, claimed = 331, eliminated = 0

*** 2024-09-09 18:28:51.587
Recovery of Online Redo Log: Thread 1 Group 1 Seq 28 Reading mem 0

*** 2024-09-09 18:28:51.589
Completed redo application of 0.73MB

*** 2024-09-09 18:28:51.612
Completed recovery checkpoint

---alert 日志：
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Completed redo scan
 read 806 KB redo, 331 data blocks need recovery
Started redo application at
 Thread 1: logseq 28, block 15578
Recovery of Online Redo Log: Thread 1 Group 1 Seq 28 Reading mem 0
  Mem# 0: /oradata/orcl/redo01.log
Completed redo application of 0.73MB
Completed crash recovery at
 Thread 1: logseq 28, block 17191, scn 1262506                            --实例恢复的终点
 331 data blocks read, 331 data blocks written, 806 redo k-bytes read
Mon Sep 09 18:28:51 2024
LGWR: STARTING ARCH PROCESSES
Mon Sep 09 18:28:51 2024
ARC0 started with pid=23, OS id=86612 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thread 1 advanced to log sequence 29 (thread open)
Mon Sep 09 18:28:52 2024
ARC1 started with pid=24, OS id=86621 
Thread 1 opened at log sequence 29
  Current log# 2 seq# 29 mem# 0: /oradata/orcl/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Sep 09 18:28:52 2024
SMON: enabling cache recovery
Mon Sep 09 18:28:52 2024
ARC2 started with pid=25, OS id=86623 
Mon Sep 09 18:28:52 2024
ARC3 started with pid=26, OS id=86625 
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Archived Log entry 5 added for thread 1 sequence 28 ID 0x65b05340 dest 1:
[86595] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:36278834 end:36278854 diff:20 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Mon Sep 09 18:28:52 2024
QMNC started with pid=27, OS id=86629 
Completed: alter database open






        从上面结果也可以看出，是先进行介质恢复判断，再进行实例恢复判断。实例恢复的起点的on low rba(on-disk rba: logseq 28, block 17191, scn 1242506)，
    恢复的终点是redo log的最末端位置(scn 1262506)。
oracle 墨力计划
最后修改时间：2024-10-10 12:30:01
「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」
关注作者
Oracle异常恢复实战第5讲 - Oracle的open过程_笔记

评论