Oracle数据库使用DG/OGG为什么开启Force logging

原创杨卓 2021-12-15

7174

一、疑问

Oracle数据库使用DG、OGG为什么开启Force logging

二、学习测试

备注说明：
本次的文章中关于DG的描述部分基于刚入行的时候导师李会亮和谢老大处理的一个DG问题,安排进行学习测试；
OGG的测试部分是由公司同事邓梓龙、黄美斌进行的测试现象，我这边进行学习总结。

2.1 DG

2.1.1 DG报错

ORA-01578: ORACLE 数据块损坏 (文件号 4, 块号 293465)
ORA-01110: 数据文件 4: ‘xxxx’

> !oerr ora 01578
01578, 00000, "ORACLE data block corrupted (file # %s, block # %s)"
// *Cause:  The data block indicated was corrupted, mostly due to software
//          errors.
// *Action: Try to restore the segment containing the block indicated. This
//          may involve dropping the segment and recreating it. If there
//          is a trace file, report the errors in it to your ORACLE
//          representative.

问题整体描述：客户DBA发现DG报错,出现文件坏块现象！排查发现是由于主库没有开启force logging,并且开发在主库使用了nologging的方式写入数据，因此在DG备库恢复时由于使用归档日志无法进行正常recover 导致报错！提示数据块已损坏,无法正常读写数据！解决办法是主库开启force logging,重新搭建DG。

如下进行问题的模拟！！！可以重现这个问题！

2.1.2 问题模拟

1）RMAN全备

run{
allocate channel c1 type disk;
allocate channel c2 type disk;
sql'alter system switch logfile';
crosscheck archivelog all;
delete noprompt expired archivelog all;
backup incremental level 0 database format '/u01/rman/%U.bk'
 include current controlfile plus archivelog delete input
 format '/u01/rman/arch%U.bk';
delete noprompt obsolete;
}

2）数据库关闭force_logging

> select FORCE_LOGGING from v$database;
FOR
---
YES
> alter database no force logging
#确认关闭force logging模式
> select FORCE_LOGGING from v$database;
FOR
---
NO

3）创建常规表

> create table scott.c1 as select * from dba_objects where 1=2;
#insert append+nologging方式
> insert /*+ append */ into scott.c1 nologging select * from dba_objects;
> insert /*+ append */ into scott.c1 select * from dba_objects nologging;  
【测试不同nlogging位置】
> select file#,checkpoint_change#,checkpoint_time,unrecoverable_change#,
unrecoverable_time from v$datafile where file#=4;
FILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME  UNRECOVERABLE_CHANGE# UNRECOVERABLE_TIME
----- ------------------ ------------------- --------------------- ----------
4            5779637 2018-04-14 11:35:44       5778976 2018-04-14 11:28:19
#数据并无任何改变！！！--上述查询在操作之前，操作之后多次查询
【insert + append +nologging方式无法使用nologging属性,在当前场景下！】

4）修改表REDO记录为nologging模式

> alter table c1 nologging;

5)insert append table nologging

> insert /*+ append */ into scott.c1 select * from dba_objects;
87175 rows created.
> commit;
> select file#,checkpoint_change#,checkpoint_time,unrecoverable_change#,
unrecoverable_time from v$datafile where file#=4
FILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME  UNRECOVERABLE_CHANGE# UNRECOVERABLE_TIME
----- ------------------ ------------------- --------------------- ------------
    4            5779637 2018-04-14 11:35:44     5782237 2018-04-14 12:07:07
#小结：表的属性需要Nologging，无法语句级别使用，可以发现UNRECOVERABLE_CHANGE#发生了改变！

V$DATAFILE  --官方文档视图字段说明
UNRECOVERABLE_CHANGE#
    Last unrecoverable change number made to this datafile. 
If the database is in ARCHIVELOG mode, then this column is updated when
 an unrecoverable operation completes. If the database is not in ARCHIVELOG mode,
 this column does not get updated.
UNRECOVERABLE_TIME
   Timestamp of the last unrecoverable change. This column is updated only 
if the database is in ARCHIVELOG mode.

6）创建Nologging方式索引

> create index scott.c1_ind on scott.c1(object_id) nologging;
> select index_name,LOGGING from user_indexes where index_name ='C1_IND';
INDEX_NAME                     LOG
------------------------------ ---
C1_IND                         NO

7）删除数据文件

select file_id,file_name from dba_data_files where tablespace_name='USERS'；
FILE_ID FILE_NAME
------- ---------------------------------------------
      4 /u01/app/oracle/oradata/ENMO/users01.dbf
> host rm -f /u01/app/oracle/oradata/ENMO/users01.dbf
> alter database datafile 4 offline;
> alter system switch logfile;

8）restore,recover

RMAN> 
 run{
 sql 'alter database datafile 4 offline';
 restore datafile 4;
 recover datafile 4;
 sql 'alter database datafile 4 online';
    }

9）验证数据对象

> select count(*) from c1;
select count(*) from c1
                     *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 267547)
ORA-01110: data file 4: '/u01/app/oracle/oradata/ENMO/users01.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option
> select index_name,LOGGING,status from user_indexes where index_name ='C1_IND';
INDEX_NAME                     LOG STATUS
------------------------------ --- --------
C1_IND                         NO  VALID
#表都没了。。。数据都没了。。。索引还是有效的

小结：从本例的测试中，我们可以发现一个问题。
条件一、数据库在没有开启force logging的属性；
条件二、并且表的数据也不是默认的logging，修改为nologging；
条件三、加载数据使用Insert hint append方式加载数据;
条件四、RMAN数据文件备份是在append nologging数据写入之前，如果是之后rman备份会直接备份数据blocks;
满足上述四种情况下，使用rman restore恢复数据文件之后,recover database将无法恢复nologging写入的数据，从而导致数据库恢复不完整！
本例中只是测试，如果是主库真正遇到这种数据丢失需要进行数据库恢复，那么只能舍弃这些异常的表数据！从而造成RMAN有备份，但是数据还是无法恢复的场景！！！那么你的DG如何百分百恢复主库数据呢？因此Oracle搭建DG第一个要求就是开启DB级别 force logging

2.1.3 数据库Nologging

恩墨大讲堂后面也有大佬专门详细讲过nologging的问题，可以直接使用其总结
周玉琪 https://www.modb.pro/video/1155
1.数据开启force loggin，那么所有nologging方式不在生效；
V$DATABASE视图,FORCE_LOGGING,默认NO
2.如果表空间开启force logging,那么对象的nologging将被忽略;
dba_tablespaces视图FORCE_LOGGING，默认NO
3.表默认logging属性，使用hint nologging执行dml都会记录日志,nologging方式无效;
DBA_TABLES视图，LOGGING默认为YES
4.表属性为nologging，使用insert append,create table,alter table xx move,SQL*Loader、data dump将可以调用nologging
5.如果是dg出现这种nologging带来的问题，那么重建DG即可！
6.如果是主库出现这个问题，如果是索引，那么重建也行，但是如果是表，那么只能忍痛丢失数据！使用exp尽可能的抽取正常的block数据记录~
7.如何判断会出现这种场景,检查rman备份的数据文件scn 是否< v$datafile视图中UNRECOVERABLE_CHANGE#，如果小于，说明有在备份文件之后，DB产生nologging的相关操作，这个操作被归档日志记录，但是RMAN备份的文件blocks不包含~
8.建议搭建DG肯定开启force logging规避问题！

2.2OGG

2.2.1 OGG报错

每天混日子挺开心的，突然有一天客户问了一个问题？
1、OGG-01973 The redo record indicates data loss on object
这个抽取进程报错原因是什么  （ogg12.2版本的）
2、如果加上这个参数tranlogoptions _allowdataloss会丢数据吗

What???这是啥？啥啥啥？？？套路-MOS-必应-百度

2.2.2 MOS资料库检索问题相关的文章

1）OGG升级版本并且使用压缩功能后,存在报错，解决办法排除过滤临时表！不符合！

OGG-01973 Oracle GoldenGate Capture For Oracle -
 The Redo Record Indicates Data Loss On Object <X > (Doc ID 1629057.1)	
After GoldenGate upgrade from 11.1.1.3 to 11.2.1.0.14,  errors occur in the log 
file and extract process 
2014-01-06 22:18:45  
ERROR   OGG-01973  Oracle GoldenGate Capture for Oracle, <extract_name>: 
 The redo record indicates data loss on object 276971
CAUSE
Compression Advisor (delivered with DBMS_COMPRESSION in 11gR2) is used.
DBMS_TABCOMP_TEMP_UNCMP and DBMS_TABCOMBP_TEMP_CMP
 are tables which are created and dropped dynamically by Oracle.
If you check the object ID reported in the error message, 
the object ID is already gone.
So we can see in the extract parameter file :
TABLEEXCLUDE *.DBMS_TABCOMP_TEMP_CMP
but not
TABLEEXCLUDE *.DBMS_TABCOMP_TEMP_UNCMP

2）表数据使用Nologgin的情况导致OGG无法从日志中解析数据！符合

OGG v11.2 Extract abends with 
"OGG-01965 The redo record indicates data loss on object <object id>"
 or "OGG-01973 The redo record indicates data loss on object <object id>"
 (Doc ID 1483431.1)
Oracle GoldenGate - Version 11.2.1.0.1 and later
Information in this document applies to any platform.
SYMPTOMS
2012-08-02 09:10:09  ERROR   OGG-01965  The redo record indicates data
 loss on object xxx,xxx.
or
2012-18-02 05:14:03  ERROR   
OGG-01973  The redo record indicates data loss on object xxx,xxx
CAUSE
Redo log records indicate data missing for the object in question
 and thus the error

SOLUTION
Prior to the code change in bug 14158453 Extract would abend with errors
 similar to following as described in KM note 1465720.1 for
tables/partitions with NOLOGGING regardless of whether the 
tablespace/ database is in logging/force logging mode.

ERROR OGG-01960 Failed to validate table <OWNER>.<TABLENAME>. 
The table is created with the NOLOGGING option, which is not supported. 
Extract may not be able to capture data from it.

And DBOPTIONS ALLOWNOLOGGING can be added in the extract to work-around the
 error with known limitations as described in 1465720.1

However post the code change in bug 14158453 the parameter
 DBOPTIONS ALLOWNOLOGGING is no longer valid
Extract no longer relies on the table metadata (logging/ nologging) and
 instead uses special records in the redo log to detect
missing data and when detected it would error as below indicating the object
 id of the table in question (same type of error is signaled when table data
 has been loaded via SQL*Loader with UNRECOVERABLE parameter):

2012-08-02 09:10:09  ERROR   
OGG-01965  The redo record indicates data loss on object xxx,xxx.

The table can be excluded from the extract, enable logging at 
the table/ partition/ tablespace level (or if table data is loaded via SQL*Loader
 do not use UNRECOVERABLE) and include it back in the extract
if it needs to be replicated across

If using DDL extraction following can be run to see the offending object name
SELECT metadata_text, fragmentNo, optime FROM <owner>.GGS_DDL_HIST 
WHERE seqno = (SELECT MIN(seqno) FROM <owner>.GGS_DDL_HIST 
WHERE objectid = (SELECT objectId FROM <owner>.GGS_DDL_HIST_ALT  
WHERE altObjectId = <object id reported> AND rownum=1));
 eg.,
SELECT metadata_text, fragmentNo, optime FROM ggs_owner.GGS_DDL_HIST 
WHERE seqno = (SELECT MIN(seqno) FROM ggs_owner.GGS_DDL_HIST 
WHERE objectid = (SELECT objectId FROM ggs_owner.GGS_DDL_HIST_ALT  
WHERE altObjectId = xxxxxx  AND rownum=1));
Also following note can  be referred to prevent these nologging operations
The Gains and Pains of Nologging Operations (Doc ID 290161.1) 
The error can also be seen for Compression Advisor related tables
 (DBMS_TABCOMP* in 11.2.0.3 & CMP<x>$<object_id> in 11.2.0.4) and
 can be safely excluded from replication

那么MOS相关的文章找到之后，那么能给我们什么帮助呢？
基本上可以认为需要重新对这个表重新初始化操作！跳过不去处理它，两种方法？
那么客户提出的参数有什么作用？

3）重做日志损坏的情况下ogg version>=12.2,抽取进程可以使用参数_allowdataloss

Classic Extract Fails With 
Error "The redo record indicates data loss on object nnnnn" (Doc ID 2316712.1)	
Oracle GoldenGate - Version 12.2.0.1.0 and later
SYMPTOMS
 Classic extract on 12.2 fails with ERROR OGG-01973 ...
The redo record indicates data loss on object nnnnn
Note that, Table with same object id exists on database
SQL> select object_name,owner, object_type from dba_objects where object_id=nnnn;
AUSE
 Redo corruption, Activity logs may show messages like
2017-10-11 13:56:54.136 INFO |er.redo.ora.thread |Redo_Thread_2| 5875 oracle/redoorardr.c | DETECT ZERO LENGTH RECORD1 at 93185656!
2017-10-11 13:56:54.136 INFO |er.redo.ora.thread |Redo_Thread_2| 6867 oracle/redoorardr.c | Completed pdata buffering 
in sequence 1288, with nb = 1, tb = 1, lb = 182003.
2017-10-11 13:56:54.136 INFO |er.redo.ora.thread |Redo_Thread_2| 262 oracle/redoorapl.c | PL_get: len=104, vld=4, subscn=1, SCN = 2501.2438258045 (10744151465341), seqno=1288, rba=93185552
2017-10-11 13:56:54.136 ERROR|er.redo.ora.thread |Redo_Thread_2| 6993 oracle/redoorardr.c | Detected an SCN base increase greater than 1, last_record_scn = 2501.2438257951 (10744151465247) current_rechdr.scn = 2501.2438258045 (10744151465341)
2017-10-11 13:56:54.136 INFO |gglog.std.application |Redo_Thread_4| 69 ggapp/ggPrintLn.c | [rpt]:
2017-10-11 13:56:54 ERROR OGG-01973 The redo record indicates data loss on object 437,723.
SOLUTION
 Tranlogoptions allowdataloss will work for 12.2 OGG versions. This is made internal parameter on 12.2 onwards
If customer is on OGG 12.2, use _allowdataloss and restart the extract.
Tranlogoptions _allowdataloss

参数也检索到了，但是不知道这个参数有什么影响？

2.2.3实验测试

1）DB no force logging

> select FORCE_LOGGING from v$database;
FOR
---
NO

2)测试表使用nologgin

--source
SQL> alter table test_b nologging;

3）正常写入数据

--Source
SQL>insert into test_b values(4,'yz');
commit;
--target
SQL> select count(*) from test_b where id=4;
  COUNT(*)
----------
         1
OGG链路正常
源端目标端都执行truncate table test_b;清理数据

4）append 写入数据

--source
insert /*+ append nologging */ into test_b select * from test nologging;
commit;

5）OGG抽取进程报错

EXTRACT     ABENDED     EXTLOCAL    00:00:00      00:00:05    
OGG-06509  Using the following key columns for source table TEST.TEST_B: ID, NAME.
OGG-06508  Wildcard MAP (TABLE) resolved (entry test.*): TABLE "TEST"."TEST_B".
WARNING OGG-06439  No unique key is defined for table TEST_B. All
 viable columns will be used to represent the key,
 but may not guarantee uniqueness. KEYCOLS may be 
used to define the key.
OGG-06509  Using the following key columns for source table TEST.TEST_B: ID, NAME.
ERROR   OGG-01973  The redo record indicates data loss on object 87,486.

6）使用参数Tranlogoptions _allowdataloss

解决方式：
GGSCI (ce1) 1> edit params EXTLOCAL
EXTRACT EXTLOCAL
USERID ogg, PASSWORD ogg
EXTTRAIL /u01/ogg/dirdat/lt
--ddl include all
TABLE test.*;
Tranlogoptions _allowdataloss  --加上这行参数

7）观察append数据同步

--source
SQL>alter database force logging;
SQL>alter table test_b logging;
SQL>commit;
SQL>select * from test_b;
        ID NAME
---------- ----------------------------------------
         1 dzl
         2 lzq
         3 mb

--target
SQL> select * from test_b;
no rows selected
可以发现使用这个参数只是让OGG抽取进程不在报错，但是实际上append写入的3条记录，
这里还有一个小点，在commit之前开启force logging database,table logging，
但是实际上这两个操作没啥子效果，因为insert append 到commit操作大于3s,
相关日志log buffer 写入current online redo file.
 OGG还是无法从redo获取append insert 3条数据的变换！
并没有同步到目标库中！ 造成数据丢失，再次truncate删除数据，保留参数

8）观察新增数据是否有效

--source
SQL>insert into test_b values(4,'lll');
SQL>commit;
--target 
SQL>select * from test_b;
        ID NAME
---------- --------------------
         4 lll
后续新增的数据可以正常同步！

测试一、Source 11g Target 19c pdb;OGG Version 19.1,抽取经典模式、复制集成模式；
测试二、Source 11g Target 11g ;OGG Version 19.1,抽取集成模式、复制集成模式，发生另外一个比较有意思的现象！就是抽取进程nologging属性source写入的数据，target 找不到数据，抽取进程OGG也不报错！
说明OGG集成模式下，Logminer analyze log buffer 分析得到的LCR并没有得到信息，但是Ogg内部也没有展示相关的报错！！！

小结：通过学习MOS文章以及实验测试，我们可以知道没有开启db force logging的情况下，nologging写入的数据导致OGG经典模式抽取进程报错，无法正常解析数据，如果使用参数那么是跳过！只是保证进程正常运行！如果是集成模式，那么可能造成数据不一致，抽取进程直接没有捕获nologging的数据，而且隐藏的挺深的，因为没有任何表象的报错信息！

三、nologging

为了避免说文章灌水，本篇MOS截取相应想要的信息进行解释说明。

The Gains and Pains of Nologging Operations (Doc ID 290161.1)	

Whereas a logged INSERT operation has to generate redo for every change data or undo block, nologging operations indicate that the database operation is not logged in the online redo log file. Even though a small invalidation redo record¹ is still written to the online redo log file, nologging operations skip the redo generation of the corresponding inserted data.  Nologging can be extremely beneficial for the following reasons:
data written to the redo is minimized dramatically
time to insert into a large table or index or LOB can be reduced dramatically
performance improves for parallel creation of large tables or indices
However, NOLOGGING is intended for configurations in which media recovery or the recovery of the corresponding object is not important. Thus, if the disk or tape or storage media fails, you will not be able to recover your changes from the redo because the changes were never logged.  
Oracle nologging会减少日志的写入，由于日志写入的减少，从而加快了表数据的写入数据以及索引等大的对象创建的时间

For databases in ARCHIVELOG mode, nologging operations can only occur
 for a particular object if and only if:
Database allows for nologging (ALTER DATABASE NO FORCE LOGGING) and
Tablespace allows for nologging (ALTER TABLESPACE <NAME> NO FORCE LOGGING) and
Object allows for nologging (ALTER TABLE <NAME> NOLOGGING)

Examples of nologging operations
Below is a list of examples that can be used for testing purposes.   
The database must be in ARCHIVELOG mode and the segment must explicitly be
 set to NOLOGGING and must allow nologging operations to see the effect
 of nologging changes:
    1. insert /*+ APPEND */ into scott.emp select * from sys.emp2;
    2. create table emp nologging as select * from sys.emp;
    3. create index emp_i on emp(empno) nologging;
    4. sqlload operation with unrecoverable option

Detection of Nologging Operations On the Primary and Standby Databases
On the primary database, you can monitor for the most recent nologging 
operation that occurred in the database by issuing the following query:
         SELECT NAME, UNRECOVERABLE_CHANGE#,                
         TO_CHAR (UNRECOVERABLE_TIME,'DD-MON-YYYY HH:MI:SS') 
         FROM V$DATAFILE;
The above primary database's query dictates when the most recent nologging
 operation occurred and when the invalidation redo was written to the redo.    

Once Redo Apply (or Media Recovery) processes the invalidation redo,
 it marks all the corresponding data blocks corrupt. 
 You will detect encounter corrupted blocks on the physical standby
 database when you query any data that references these data blocks. 
  You will receive the following errors:  
                ORA-01578: ORACLE data block corrupted (file # 3, block # 514)
                ORA-01110: data file 3: '/path/users.dbf'
                ORA-26040: Data block was loaded using the NOLOGGING option
You can proactively catch some of these corrupted blocks on 
Redo Apply (or media recovery) instance by running DBVERIFY on the data files. 
            
        $ dbv file=users.dbf
        DBVERIFY - Verification starting : FILE = users.dbf
        DBV-00200: Block, dba 12583426, already marked corrupted
        DBV-00200: Block, dba 12583427, already marked corrupted
        DBV-00200: Block, dba 12583428, already marked corrupted

For a physical standby database, follow these steps² to reinstantiate the relevant
 data files .
    1. stop Redo Apply (recover managed standby database cancel)
    2. offline corresponding datafile(s)
       (alter database datafile <NAME> offline drop;)
    3. start Redo Apply (recover managed standby database disconnect)
    4. copy the appropriate backup datafiles over from the primary database
       (e.g. use RMAN to backup datafiles and copy them)
    5. stop Redo Apply (recover managed standby database cancel)
    6. online corresponding data files (alter database datafile <NAME> online;)
    7. start Redo Apply (recover managed standby database disconnect)

感谢大家看了较为冗长文章，希望有所收获~,欢迎留言吐槽

墨力计划

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者