首先先介绍下ogg的工作同步流程如下:
1.源端抽取进程从源库redo log抽取变化的数据(已提交的事务),写入到本地trail文件中。
2.成功将已提交事务写入trail文件后,抽取进程会更新检查点。
2.源端传输进程将trail文件通过tcp/ip发送到配置参数指定的目标数据库。
3.目标端server collector进程对投递过来的数据块重新组装为trail文件。
3.目标端复制进程读取操作系统目录下的trail文件,形成sql语句,在目标数据库中执行。还有参数解读
lag at chkpt :进程当前正在读取的redo日志记录中的时间戳与当前时间间隔。
time since chkpt 表示进程从最近一次检查点commit提交数据后,到当前时间未提交数据的时长
checkpoint:存储进程的读取数据库日志的位置rba及trail文件中写入位置信息,每当进程看到事务commit时,检查点文件会产生一个检查点。1.问题现象
8月17早上开始出现抽取进程一直刷ora-01027警告,持续几个小时,内容都是如Long Running Transaction:长事务XID 5226.4.27923,同一个长事务。
另外还有ora-01028不支持的压缩表导致导致进程自动关闭然后重启,这个应该是另外一个问题导致了进程的重启操作。共计是两个问题。
截取进程的部分日志如下:
2025-08-17 11:22:26 INFO OGG-01487 Oracle GoldenGate Capture for Oracle, xx.prm: DDL found, operation [create table "xx".CMP4$626139 organization heap tablespace "xx_DATA" compress for all operations nologging as select /*+ DYNAMIC_SAMPLING(0) */ * from "xx".CMP3$626139 mytab (size 188)], start SCN [17501796651397], commit SCN [17501796651658] instance [xx1 (1)], DDL seqno [4101308], marker seqno [4118924].
2025-08-17 11:22:26 INFO OGG-00488 Oracle GoldenGate Capture for Oracle, xx.prm: DDL operation excluded [not included by any filter], optype [CREATE], objtype [TABLE], objowner [xx], objname [CMP4$626139].
2025-08-17 11:22:26 INFO OGG-01487 Oracle GoldenGate Capture for Oracle, xx.prm: DDL found, operation [drop table "xx".CMP3$626139 purge (size 38)], start SCN [17501796651720], commit SCN [17501796651812] instance [xx1 (1)], DDL seqno [4101311], marker seqno [4118927].
2025-08-17 11:22:26 INFO OGG-00487 Oracle GoldenGate Capture for Oracle, xx.prm: DDL operation included [include mapped], optype [DROP], objtype [TABLE], objowner [xx], objname [CMP3$626139].
2025-08-17 11:22:27 INFO OGG-00497 Oracle GoldenGate Capture for Oracle, xx.prm: Writing DDL operation to extract trail file.
2025-08-17 11:22:27 INFO OGG-01487 Oracle GoldenGate Capture for Oracle, xx.prm: DDL found, operation [drop table "xx".CMP4$626139 purge (size 38)], start SCN [17501796651824], commit SCN [17501796652051] instance [xx1 (1)], DDL seqno [4101312], marker seqno [4118928].
2025-08-17 11:22:27 INFO OGG-00488 Oracle GoldenGate Capture for Oracle, xx.prm: DDL operation excluded [not included by any filter], optype [DROP], objtype [TABLE], objowner [xx], objname [CMP4$626139].
2025-08-17 11:23:08 WARNING OGG-01027 Oracle GoldenGate Capture for Oracle, xx.prm: Long Running Transaction: XID 5226.4.27923, Items 1, Extract xx, Redo Thread 1, SCN 4074.3508022147 (17501204786051), Redo Seq #88435, Redo RBA 371576848.
2025-08-17 11:24:15 ERROR OGG-01028 Oracle GoldenGate Capture for Oracle, xx.prm: Object with object number 1069184 is compressed. Table compression is not supported.
2025-08-17 11:24:16 INFO OGG-00991 Oracle GoldenGate Capture for Oracle, xx.prm: EXTRACT xx stopped normally.
2025-08-17 11:27:20 INFO OGG-00975 Oracle GoldenGate Manager for Oracle, mgr.prm: EXTRACT xx starting.
2025-08-17 11:27:20 INFO OGG-00965 Oracle GoldenGate Manager for Oracle, mgr.prm: EXTRACT xx restarted automatically.
2025-08-17 11:27:21 INFO OGG-00992 Oracle GoldenGate Capture for Oracle, xx.prm: EXTRACT xx starting.
2025-08-17 11:27:21 INFO OGG-03035 Oracle GoldenGate Capture for Oracle, xx.prm: Operating system character set identified as UTF-8. Locale: en_US, LC_ALL:.
2025-08-17 11:27:21 INFO OGG-01815 Oracle GoldenGate Capture for Oracle, xx.prm: Virtual Memory Facilities for: BR
/backup/ogg/BR/xx.2.抽取进程信息
lag at chkpt 没有延迟,说明抽取进程读取功能是正常的,表示抽取进程还在一直读取redo日志。
time since chkpt 抽取进程巡检发现近两天的延迟,表示ogg无法推进检查点,很大程度说明有长事务执行时间较长。

读取检查点一直停留在11点

3.长事务阻塞分析
ogg必须等待事务提交才能处理后续数据,导致后续数据挤压,延迟持续增长;因为无法处理数据,所以无法写入Trail文件;因为无法写入Trail文件,所以检查点位置就永远停滞
1)找到日志里5226对应的addr
select t.addr,t.XIDUSN,t.XIDSLOT,t.XIDSQN,t.START_DATE from gv$transaction t;

2)通过taddr找到对应的prev_sql_id
select t.PREV_SQL_ID,t.SQL_ID from gv$session t where t.taddr='xx';

3)根据sqlid找到对应的具体sql操作
select sql_text from gv$sqltext t where t.SQL_ID = 'xx';

4)找到对应的sid和serial#查杀会话,再次查询事务已消失
select t.sid,t.serial#,t.PREV_SQL_ID,t.SQL_ID from gv$session t where t.taddr='xx';
ALTER SYSTEM KILL SESSION 'sid,serial#' IMMEDIATE;

4.重启抽取进程报错
长事务杀掉后,事务回滚可能也需要一段时间,延迟目前还在增加!
现在不管是stop,还是发起send等命令都是卡死超时状态

强制停止进程后启动正常,延迟时间减少
kill extract xx;
start extract xx;
5.检查抽取进程状态
send xx,status

过一会再次执行,发现sequence和scn已经变化

6.不支持的压缩表处理
通过对象号查询
SELECT owner, object_name, object_type ,a.OBJECT_ID FROM dba_objects a WHERE object_id = 1069184;
查询数据库所有的压缩表
select a.TABLE_NAME,a.OWNER,a.COMPRESSION,a.COMPRESS_FOR from dba_tables a;
抽取进程中排除压缩表
TABLEEXCLUDE owner.table_name;
附1.长交易事务查询与跳过
send extract xxx , showtrans thread 1 count 10 应该对上transaction查询的内容
SEND EXTRACT xxx, SKIPTRANS 5226.4.27923 THREAD 1 //跳过交易
SEND EXTRACT xxx, FORCETRANS 5226.4.27923 THREAD 1 //强制认为该交易已经提交



