暂无图片
暂无图片
7
暂无图片
暂无图片
暂无图片

GoldenGate高手秘籍:启用GoldenGate TRACE调试,疑难杂症无所遁形!

原创 szrsu 2025-01-20
1059

Goldengate 跟踪介绍

在 Oracle GoldenGate 中,跟踪功能对于诊断复杂问题非常有用。通过跟踪,您可以了解内部处理的详细信息,并快速定位问题根源。如果我们想跟踪指定的EXTRACT/REPLICAT,要怎么做呢?

Oracle GoldenGate 提供了TRACE/TRACE2命令,允许您记录进程的详细操作信息,从而帮助你跟踪定位问题。

TRACE 的级别决定了记录信息的详细程度:

TRACE:记录基本的事件和操作。

TRACE2:记录比TRACE更详细的信息,包括性能指标和内部数据。

启用Goldengate TRACE方法

Goldengate启用TRACE有两种方式,一种是通过修改配置文件,但需要重启进程才能生效;另外一种是可以在线启用,无需重启进程。下面分别介绍一下这两种启用方式:

启用TRACE方式1

在 Oracle GoldenGate 的参数文件中,添加以下内容来启用TRACE,并重启进程生效:

TRACE | TRACE2 [, DDL[INCLUDE] | DDLONLY] [, [FILE] file_name] [, THREADS (threadID[, threadID][, ...][, thread_range[, thread_range][, ...])]

如:

GGSCI (ora11g) 2> edit param extcs ........ TRACE ./dirrpt/extcs_trace1.trc TRACE2 ./dirrpt/extcs_trace2.trc GGSCI (ora11g) 3> stop extcs Sending STOP request to EXTRACT EXTCS ... Request processed. GGSCI (ora11g) 4> start extcs Sending START request to MANAGER ... EXTRACT EXTCS starting

启用TRACE方式2

使用send命令,发现命令给相应的进程,在线打开trace,方法如下:

send XXXXX trace ./dirrpt/XXXXX.trc
或
send XXXXX trace2 ./dirrpt/XXXXX2.trc

如:

GGSCI (ora11g) 11> send extcs trace ./dirrpt/extcs_trace1.trc

Sending TRACE request to EXTRACT EXTCS ...
Tracing command operation successfully processed. Trace file /ogg/dirrpt/extcs_trace1.trc opened.

或

GGSCI (ora11g) 12> send extcs trace2 ./dirrpt/extcs_trace2.trc

Sending TRACE2 request to EXTRACT EXTCS ...
Tracing command operation successfully processed. Trace file /ogg/dirrpt/extcs_trace2.trc opened.

查看trace文件

进入设置的生成trace文件目录,查看生成的trace

$ cd dirrpt
$ ls *trace*
extcs_trace1.trc  extcs_trace2.trc
$ tail -f extcs_trace1.trc 
                       0.00% 	Positioning into redo log file
                      79.25% 	Reading record from redo log file
                       0.00% 	Extracting subrecord from redo record
                       0.01% 	Retrieving and processing transaction items
                      
                      Time Delta statistics:
11:24:20.192 (306114) entering checkpointPosition
11:24:20.193 (306115) exiting checkpointPosition
11:24:21.198 (307120) exited DataSource::readLCR(stat=400,io_type=5)
11:24:22.199 (308121) * --- entering DataSource::readLCR() --- *

$ tail -f extcs_trace2.trc 
11:24:22.200 (308122) Checking messages (includes checkpointing) 0.010% (execute=0.033,total=308.114,count=860)
11:24:23.207 (309129) entering IPC_read_TCP with hndl->ip.sock -1 and listen_sock 7
11:24:24.217 (310139) entering IPC_read_TCP with hndl->ip.sock -1 and listen_sock 7
11:24:25.224 (311146) entering IPC_read_TCP with hndl->ip.sock -1 and listen_sock 7
11:24:26.231 (312153) Reading record from redo log file 79.472% (execute=248.071,total=312.146,count=940)
11:24:26.231 (312153) entering IPC_read_TCP with hndl->ip.sock -1 and listen_sock 7
11:24:27.239 (313160) exited DataSource::readLCR(stat=400,io_type=5)
11:24:28.239 (314161) entering IPC_read_TCP with hndl->ip.sock -1 and listen_sock 7
11:24:28.239 (314161) Checking messages (includes checkpointing) 0.010% (execute=0.033,total=314.154,count=870)
11:24:28.239 (314161) * --- entering DataSource::readLCR() --- *
11:24:29.248 (315170) Checking periodic tasks 0.000% (execute=0.002,total=315.163,count=250)
11:24:29.248 (315170) entering IPC_read_TCP with hndl->ip.sock -1 and listen_sock 7
11:24:30.256 (316178) entering checkpointPosition
11:24:30.256 (316178) exiting checkpointPosition
11:24:30.256 (316178) entering IPC_read_TCP with hndl->ip.sock -1 and listen_sock 7
11:24:30.256 (316178) Reading record from redo log file 79.417% (execute=251.096,total=316.171,count=950)
11:24:31.262 (317184) entering IPC_read_TCP with hndl->ip.sock -1 and listen_sock 7
11:24:31.262 (317184) Checking messages (includes checkpointing) 0.010% (execute=0.033,total=317.177,count=880)
11:24:32.273 (318195) entering IPC_read_TCP with hndl->ip.sock -1 and listen_sock 7

注意事项:

  • TRACE 和 TRACE2 可能会生成非常大的日志文件,因此仅在必要时启用,使用完成后,记得关闭它。(参考后面的关闭方法)

  • 为了安全起见,确保日志文件存储在受保护的路径中,避免泄露敏感数据。

停止跟踪

完成问题诊断后,请务必关闭 TRACE 以降低性能开销,停止跟踪方法如下:

(1)如果你是使用修改配置参数启用的跟踪,需要将相关配置进行注释或者删除,然后重启进程

GGSCI (ora11g) 7> edit param extcs
--TRACE ./dirrpt/extcs_trace1.trc
--TRACE2 ./dirrpt/extcs_trace2.trc


GGSCI (ora11g) 8> stop extcs

Sending STOP request to EXTRACT EXTCS ...
Request processed.


GGSCI (ora11g) 9> start extcs

Sending START request to MANAGER ...
EXTRACT EXTCS starting

(2)如果你使用send命令,则可以在线关闭。

GGSCI (ora11g) 1> send extcs trace off;

Sending TRACE request to EXTRACT EXTCS ...
Tracing command operation successfully processed. Closing all trace files.

或

GGSCI (ora11g) 2> send extcs trace2 off;

Sending TRACE2 request to EXTRACT EXTCS ...
Tracing command operation successfully processed. All trace files are closed.

分析TRACE文件

生成的跟踪文件通常以纯文本格式保存,您可以使用任何文本编辑器打开进行分析。

TRACE 文件的主要内容包括:

  • 时间戳:记录事件的时间。
  • 处理信息:包括数据操作、事务状态等。
  • SUMMARY STATISTICS:这部分比较重要,用于展示数据处理过程的时间消耗百分比。这些统计数据对于监控和优化 OGG 的运行非常重要,帮助你识别系统瓶颈和潜在问题。

如:

11:52:07.239 (1589236) * --- entering DataSource::readLCR() --- * 11:52:12.279 (1594276) exited DataSource::readLCR(stat=400,io_type=5) 11:52:13.280 (1595277) entering checkpointPosition 11:52:13.280 (1595277) exiting checkpointPosition 11:52:13.280 (1595277) * --- entering DataSource::readLCR() --- * 11:52:18.320 (1600317) exited DataSource::readLCR(stat=400,io_type=5) 11:52:19.322 (1601319) * --- entering DataSource::readLCR() --- * 11:52:23.353 (1605350) entering checkpointPosition 11:52:23.353 (1605350) exiting checkpointPosition 11:52:24.359 (1606356) exited DataSource::readLCR(stat=400,io_type=5) 11:52:25.361 (1607358) * --- entering DataSource::readLCR() --- * 11:52:30.398 (1612395) exited DataSource::readLCR(stat=400,io_type=5) 11:52:31.400 (1613397) * --- entering DataSource::readLCR() --- * 11:52:31.401 (1613398) exited DataSource::readLCR(stat=0,io_type=5) 11:52:31.401 (1613398) processing record for SZR.TEST_JOB 11:52:31.401 (1613398) Writing seqno [46], rba [5675], table [SZR.TEST_JOB] 11:52:31.401 (1613398) * --- entering DataSource::readLCR() --- * 11:52:33.417 (1615414) entering checkpointPosition 11:52:33.417 (1615414) exiting checkpointPosition 11:52:36.440 (1618437) exited DataSource::readLCR(stat=400,io_type=5) .............. SUMMARY STATISTICS General statistics: 0.01% Checking messages (includes checkpointing) 0.00% Checking periodic tasks 0.00% Waiting for more data 0.00% Converting ASCII data to internal 97.23% Reading input records 0.00% Writing output records (replicate_io) 0.00% Mapping columns 0.00% Outputting data records 0.00% Performing SQL statements 0.00% Executing BATCHSQL statements 0.00% Executing SQL statements 0.00% Preparing SQL statements 0.00% Commit operations 0.00% Rollback operations 0.00% Checkpointing Redo log statistics: 0.00% Opening redo log file 0.00% Positioning into redo log file 97.23% Reading record from redo log file 0.00% Extracting subrecord from redo record 0.00% Retrieving and processing transaction items Time Delta statistics:

上面的显示97.23%时间消耗在Reading input records 读取输入记录(从数据库的日志中读取变更数据消耗的时间,如果此时发现ogg进程存在延时,排除了其他问题(大事务、参数设置不合理、CPU或内存使用过高等),则需要检查IO,是否存储压力过高、系统的IO负载过大等。

以下是一些使用 TRACE分析的典型场景:

  • 数据同步延迟问题:使用 TRACE 检查每个过程的耗时统计。
  • 报错分析:使用TRACE,找出报错原因。
  • 其他疑难问题:如进程突然abend,但没有报错内容,查看ggserr.log也没有想过报错信息。

总结

TRACE 和 TRACE2 是 Oracle GoldenGate 中强大的调试工具,通过合理使用这些功能,您可以高效地诊断和解决复杂问题,包括数据同步延迟、性能瓶颈、数据冲突和报错等。查看生成的trace信息,您可以了解 OGG当前 的工作状态,快速排查问题。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论