说明:
在没有备份的情况下,经测试非CURRENT REDO LOG FILE丢失可以恢复,CURRENT REDO LOG FILE使用网上的方法无法恢复,初步怀疑新版本对REDO增加了更严格的校验,具体恢复方法还要咨询原厂,测试过程如下。
环境说明:
数据库版本
SQL> SELECT * FROM V$VERSION;LINEID BANNER---------- ---------------------------------1 DM Database Server 64 V82 DB Version: 0x7000c3 03134284172-20240321-222308-200934 Msg Version: 05 Gsu level(3-4) cnt: 36 Gsu level(5) cnt: 0
小版本:
SQL> SELECT BUILD_VERSION FROM V$INSTANCE;LINEID BUILD_VERSION---------- ------------------------------------1 1-3-140-2024.03.21-222308-20093-ENT
补丁版本:
SQL> SELECT ID_CODE;LINEID ID_CODE---------- -----------------------------------1 --03134284172-20240321-222308-20093
安装介质下载地址
https://www.dameng.com/list_103.html


创建测试数据
SQL> CREATE TABLE T1(ID INT);SQL> INSERT INTO T1 VALUES(1),(2),(3);SQL> COMMIT;
查询信息
查看REDO信息
SQL> SELECT GROUP_ID,FILE_ID,PATH,RLOG_SIZE FROM V$RLOGFILE;LINEID GROUP_ID FILE_ID PATH RLOG_SIZE---------- ----------- ----------- ----------------------- --------------------1 2 0 dm8/data/CJC/CJC01.log 2684354562 2 1 dm8/data/CJC/CJC02.log 268435456
SQL> SELECT CKPT_LSN,FILE_LSN,FLUSH_LSN,CUR_LSN,DB_MAGIC FROM V$RLOG;LINEID CKPT_LSN FILE_LSN FLUSH_LSN CUR_LSN DB_MAGIC---------- -------------------- -------------------- -------------------- -------------------- --------------------1 44644 44656 44656 44656 1516351486
新增数据
INSERT INTO T1 VALUES(4),(5);COMMIT;
SQL> SELECT CKPT_LSN,FILE_LSN,FLUSH_LSN,CUR_LSN,DB_MAGIC FROM V$RLOG;LINEID CKPT_LSN FILE_LSN FLUSH_LSN CUR_LSN DB_MAGIC---------- -------------------- -------------------- -------------------- -------------------- --------------------1 44644 44661 44661 44661 1516351486
V$ROG和V$RLOGFILE视图说明:
https://eco.dameng.com/document/dm/zh-cn/pm/dm8-admin-manual-appendix#6.%20%E6%97%A5%E5%BF%97%E7%AE%A1%E7%90%86
V$RLOG
显示日志的总体信息。通过该视图可以了解系统当前日志事务号 LSN 的情况、归档日志情况、检查点的执行情况等。

......
V$RLOGFILE
显示日志文件的具体信息。包括文件号、完整路径、文件的状态、文件大小等等。

查看REDO信息
SQL> host ls -lrth dm8/data/CJC/CJC*.log-rw-r--r-- 1 dmdba dinstall 256M May 11 21:45 dm8/data/CJC/CJC02.log-rw-r--r-- 1 dmdba dinstall 256M May 11 21:51 dm8/data/CJC/CJC01.log
可以看到,CJC01.log 时间较新,为CURRENT REDO;
模拟故障:
非current redo 丢失
SQL> host mv dm8/data/CJC/CJC02.log dm8/data/CJC/CJC02.log.bakSQL> shutdown abort;
启动
[dmdba@cjc-db-01 CJC]$ dmserver dm.ini
前台报错:
/dm8/data/CJC/CJC02.log not exist, can not startup
后台报错:
[dmdba@cjc-db-01 log]$ tail -10f dm_CJC_202405.log2024-05-11 22:00:21.283 [FATAL] database P0000008747 T0000000000000008747 dm8/data/CJC/CJC02.log not exist,can not startup
启动数据库,找不到CJC02.og,导致启动失败。
用现有的REDO冒充丢失的REDO
[dmdba@cjc-db-01 CJC]$ cp CJC01.log CJC02.log
启动
[dmdba@cjc-db-01 CJC]$ dmserver dm.ini
前台报错
rfil grp init log file dm8/data/CJC/CJC02.log error, code = -717
后台报错
2024-05-11 22:02:32.168 [INFO] database P0000008964 T0000000000000008964 rfil[/dm8/data/CJC/CJC01.log] ver: 7007, sta: 0, arch_lsn = 0, arch_seq = 0, clsn = 44660, clsn_fil = 0, clsn_off = 17547264, l_next_seq = 5058, g_next_seq = 5058, free = 17567744, rpkg_crc_stand = 12024-05-11 22:02:32.168 [ERROR] database P0000008964 T0000000000000008964 arch file(/dm8/data/CJC/CJC02.log) fil id check fail2024-05-11 22:02:32.168 [ERROR] database P0000008964 T0000000000000008964 rfil grp init log file /dm8/data/CJC/CJC02.log error, code = -7172024-05-11 22:02:32.168 [ERROR] database P0000008964 T0000000000000008964 rlog4_init_low->rfil_grp_alloc failed, alloc_only = 0!2024-05-11 22:02:32.168 [ERROR] database P0000008964 T0000000000000008964 hlog_upgrade failed, rlog4_sys_init return code[-126]
查看报错:fil id check fail,因为两个REDO fil id不能相同,所以报错。
arch file(/dm8/data/CJC/CJC02.log) fil id check fail
使用dmmdbf工具修改 fil id
dmmdbf工具是什么?
https://eco.dameng.com/document/dm/zh-cn/ops/troubleshooting-human
工具介绍
Dmmdf 工具:主要对 DM 相关的文件的属性进行修改。支持的修改的文件类型有:dbf、rlog、original bak、bakset meta、bakset bkp、bakset,根据文件的不同,可修改的属性也有所不同,最主要的是修改 db_magic。
查看帮助信息
[dmdba@cjc-db-01 CJC]$ dmmdf helpversion: 03134284172-20240321-222308-20093Format: ./dmmdf KEYWORD=valueExample: ./dmmdf TYPE=1 FILE=/opt/data/DAMENG/SYSTEM.DBFKeyword Explanation--------------------------------------------------------------------------------TYPE type1: for dbfbp_flag=0: Modify the normal instancebp_flag=1: Modify the BP instance of DPC2: for rlog3: for original bak4: for bakset meta5: for bakset bkp6: for bakset7: for database degradesscope=1: Only supports from 0x0007000A to 0x00070009, file is dm.ctl pathscope=2: Only supports from 0x0007000B to 0x0007000A, file is dm.ini pathscope=3: Only supports from 0x0007000C to 0x0007000B for single node database, file is dm.ini path8: for rlog degradesscope=1 Supports from 0x7006 to 0x7005scope=2 Supports from 0x7007 to 0x70069: for specified database upgrade(Only supports from V8.1.1.88(V7.1.7.88) to V8.1.1.100(V7.1.7.100)10: for updating db_magic of database, file is dm.ini path, db_magic is which update tobp_flag=0: Modify the normal instancebp_flag=1: Modify the BP instance of DPC, also need page_size to be setFILE file pathPAGE_SIZE for TYPE=10 BP_FLAG=1, the page size of the datafile, default 8192DCR_INI dmdcr.ini pathSCOPE scope for database degrades when type=7 or 8BP_FLAG whether to modify the BP instance of DPC when type=1 or type=10, default not(0)DB_MAGIC target db_magic when type=10HELP show this help info--------------------------------------------------Example: ./dmmdf TYPE=1 FILE=/opt/data/DAMENG/SYSTEM.dbf./dmmdf TYPE=2 FILE=/opt/data/DAMENG/DAMENG01.log./dmmdf TYPE=3 FILE=/opt/data/DAMENG/bak/dmdb.bak./dmmdf TYPE=4 FILE=/opt/data/DAMENG/bak/bset/nbak.meta./dmmdf TYPE=5 FILE=/opt/data/DAMENG/bak/bset/nbak.bak./dmmdf TYPE=6 FILE=/opt/data/DAMENG/bak/bset./dmmdf TYPE=7 SCOPE=1 FILE=/opt/data/DAMENG/dm.ctl./dmmdf TYPE=7 SCOPE=2 FILE=/opt/data/DAMENG/dm.ini./dmmdf TYPE=7 SCOPE=3 FILE=/opt/data/DAMENG/dm.ini./dmmdf TYPE=8 SCOPE=1 FILE=/opt/data/DAMENG/dm.ini./dmmdf TYPE=8 SCOPE=2 FILE=/opt/data/DAMENG/dm.ini./dmmdf TYPE=9 FILE=/opt/data/DAMENG/dm.ini./dmmdf TYPE=10 FILE=/opt/data/DAMENG/dm.ini DB_MAGIC=123456
修改CJC01.log,将13 fil_id = 0改成13 fil_id = 1
[dmdba@cjc-db-01 CJC]$ dmmdf TYPE=2 FILE=/dm8/data/CJC/CJC01.logdmmdf V8**********************************************************1 sig = DMRLOG2 ver = 70073 chksum = 9690062254 sta = 05 n_magic = 76 db_magic = 15163514867 len = 2684354568 free = 175677449 clsn = 4466010 clsn_fil = 011 clsn_off = 1754726412 pemnt_magic = 151088083213 fil_id = 015 next_seq = 505816 g_next_seq = 505817 arch_lsn = 018 arch_seq = 019 dbversion = 0x7000c20 min_exec_version = V8.1.1.121 min_dct_version = 422 p_db_magic = 023 n_apply_ep = 024 apply_info_lsn = 0pkg_seq_arr: (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)apply_lsn_arr: (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)25 recv_p_db_magic = 026 recv_n_apply_ep = 0recv_pkg_seq_arr: (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)recv_apply_lsn_arr: (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)27 l_term_id = 028 term_id = 029 c_seqno = 505730 c_lsn = 4466131 rpkg_crc_stand = 1**********************************************************You can only reset sta(4) or db_magic (6) or clsn (9) or clsn_fil(10) or clsn_off(11) or pemnt_magic(12) or fil_id(13) or next_seq(15) or g_next_seq(16) or p_db_magic(22) or n_apply_ep(23).Please input the num which one you want to change, q to quit: 13Input the new value: 1**********************************************************
再次启动:
[dmdba@cjc-db-01 CJC]$ dmserver dm.ini
启动成功
ndct_db_load_info finished, code:0.nsvr_process_before_open begin.nsvr_process_before_open success.SYSTEM IS READY.
登录数据库,检查
[dmdba@cjc-db-01 CJC]$ disql SYSDBA/CJC123456:5238Server[LOCALHOST:5238]:mode is normal, state is openlogin used time : 15.570(ms)disql V8SQL> SELECT * FROM T1;LINEID ID---------- -----------1 12 23 34 45 5used time: 1.535(ms). Execute id is 501.SQL> INSERT INTO T1 VALUES(6);affect rows 1used time: 1.171(ms). Execute id is 502.SQL> COMMIT;executed successfullyused time: 8.523(ms). Execute id is 503.
current redo 丢失
方法一:失败
按照之前的方法,将另一个redo复制为丢失的redo,使用dmmdf工具只修改fil_id仍然无法启动数据库,
继续修改其他几个值,还是无法启动数据库
sta(4)clsn (9) 44727clsn_fil(10)clsn_off(11) 17592320fil_id(13) 0next_seq(15) or 5068g_next_seq(16) or 5068
方法二:失败
参考https://eco.dameng.com/解决方案,其他机器上重新创建一个相同名称、相同配置的实例,干净关库后考走redo文件到原库,手动修改db_magic等,无法解决问题。
报错如下:
2024-05-11 22:26:31.493 [INFO] database P0000010953 T0000000000000010953 main rfil[/dm8/data/CJC/CJC01.log]'s sta is active2024-05-11 22:26:31.493 [ERROR] database P0000010953 T0000000000000010953 redo_rfil_grp_pwr_collect->rlog4_first_rpkg_seqno_get failed, code:-7232024-05-11 22:26:31.494 [INFO] database P0000010953 T0000000000000010953 main rfil [/dm8/data/CJC/CJC01.log]'s grp collect 0 valid pwr record, discard 0 invalid pwr record
初步怀疑新版本对REDO增加了更严格的校验,具体恢复方法还要咨询原厂,有知道解决方法的欢迎留言。
参考链接:
https://eco.dameng.com/community/article/a702397bf5cb0aaf0d018e9cb1bee166https://eco.dameng.com/community/article/1ffd251e3b2e82b08e0faf0eae9e345chttp://www.killdb.com/2021/08/27/%e8%be%be%e6%a2%a6%e6%95%b0%e6%8d%ae%e5%ba%93%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%e4%b9%8b-current-redo%e8%af%af%e5%88%a0%e9%99%a4%e6%81%a2%e5%a4%8d/
###chenjuchao 20240512###
欢迎关注我的公众号《IT小Chen》





