暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

DDL 超时阻塞 OMS 同步链路案例与解决

29

作者:任仲禹,爱可生数据库工程师,擅长故障分析和性能优化。

爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源。

本文约 1900 字,预计阅读需要 8 分钟。



故障背景

某客户生产环境 OceanBase(Oracle 模式)数据库 -> Oracle 增量同步链路(DDL + DML)延迟过大,Incr-Sync 组件 位点无法推进,位点卡在 2025-04-29 12:08:25
 左右,增加了业务对于同步数据读取延迟风险。

  • OMS 迁移工具页面中的故障现象如下(延迟时间持续增大)。

分析过程

检查组件 connector.log 日志

登录 Incr-Sync 组件[1] 所在服务器容器内,发现日志中在反复执行一条创建索引 DDL 并且报错 java.sql.SQLRecoverableException: IO Error: Socket read timed out

[2025-04-29 14:48:01.395] [WARN] [sinkTask-9] [Need reset connection oms record [meta[prd_dbarenzy.GDBARENZYU-BFSDRENZYRENZYRENZYRENZY,1745899708,DDL,1745899708]post[Struct{ddl(VAR_CHAR_STRING)=CREATE INDEX "IDX_JY
MX_UNIT_STATUS"
 ON "BFSDRENZYRENZYRENZYRENZY"("ACCOUNRENZY","TRANSSRENZY"),env(VAR_CHAR_STRING)=1745899708724080,ddlType(VAR_STRING)=CREATE_INDEX,srcDatabase(VAR_STRING)=prd_dbarenzy.GDBARENZYU,srcTable(VAR_STRING)
=BFSDRENZYRENZYRENZYRENZY,destDatabase(VAR_STRING)=GDBARENZYU,destTable(VAR_STRING)=BFSDRENZYRENZYRENZYRENZY,originalDDL(VAR_STRING)=CREATE INDEX IDX_JYMX_UNIT_STATUS on BFSDRENZYRENZYRENZYRENZY(ACCOUNRENZY,TRANSSTA
TUS);,affectObjects(LIST)={"db":"GDBARENZYU","table":"BFSDRENZYRENZYRENZYRENZY"}}]], exception message [java.sql.SQLRecoverableException: IO Error: Socket read timed out], errorCode [17002], sqlState [08006]]
[2025-04-29 14:48:01.396] [WARN] [sinkTask-9] [retry for exception]
java.sql.SQLRecoverableException: IO Error: Socket read timed out
        at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:993)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1205)
        at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3666)
        at oracle.jdbc.driver.T4CPreparedStatement.executeInternal(T4CPreparedStatement.java:1426)
        at oracle.jdbc.driver.OraclePreparedStatement.executeLargeUpdate(OraclePreparedStatement.java:3756)
        at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:3736)
        at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:1063)
        at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_executeUpdate(FilterChainImpl.java:3253)
        at com.alibaba.druid.filter.FilterAdapter.preparedStatement_executeUpdate(FilterAdapter.java:1092)
        at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_executeUpdate(FilterEventAdapter.java:491)
        at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_executeUpdate(FilterChainImpl.java:3251)
        at com.alibaba.druid.proxy.jdbc.PreparedStatementProxyImpl.executeUpdate(PreparedStatementProxyImpl.java:194)
        at com.alibaba.druid.pool.DruidPooledPreparedStatement.executeUpdate(DruidPooledPreparedStatement.java:255)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.writeDDL(Writer.java:641)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.executeInRetry(Writer.java:480)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.flushDDL(Writer.java:435)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.flushRecords(Writer.java:182)
        at com.oceanbase.oms.connector.jdbc.sink.DefaultJDBCSink.offer(DefaultJDBCSink.java:60)
        at com.oceanbase.connector.framework.threadmanager.sinktask.SyncSinkConnectorTask.run(SyncSinkConnectorTask.java:66)
        at java.lang.Thread.run(Thread.java:750)
Caused by: oracle.net.nt.TimeoutInterruptHandler$IOReadTimeoutException: Socket read timed out
        at oracle.net.nt.TimeoutSocketChannel.handleInterrupt(TimeoutSocketChannel.java:254)
        at oracle.net.nt.TimeoutSocketChannel.read(TimeoutSocketChannel.java:180)
        at oracle.net.ns.NSProtocolNIO.doSocketRead(NSProtocolNIO.java:555)
        at oracle.net.ns.NIOPacket.readHeader(NIOPacket.java:258)
        at oracle.net.ns.NIOPacket.readPacketFromSocketChannel(NIOPacket.java:190)
        at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:132)
        at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:105)
        at oracle.net.ns.NIONSDataChannel.readDataFromSocketChannel(NIONSDataChannel.java:91)
        at oracle.net.ano.CryptoNIONSDataChannel.readDataFromSocketChannel(Unknown Source)
        at oracle.jdbc.driver.T4CMAREngineNIO.prepareForUnmarshall(T4CMAREngineNIO.java:784)
        at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:449)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:410)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:269)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:655)
        at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:270)
        at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:91)
        at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:970)
        ... 19 common frames omitted
[2025-04-29 14:48:01.396] [WARN] [sinkTask-9] [rollback txn meet exception [{}], ignore it]
java.sql.SQLRecoverableException: Closed Connection
        at oracle.jdbc.driver.PhysicalConnection.getAutoCommit(PhysicalConnection.java:2089)
        at oracle.jdbc.driver.PhysicalConnection.rollback(PhysicalConnection.java:2218)
        at com.alibaba.druid.filter.FilterChainImpl.connection_rollback(FilterChainImpl.java:714)
        at com.alibaba.druid.filter.FilterAdapter.connection_rollback(FilterAdapter.java:974)
        at com.alibaba.druid.filter.FilterChainImpl.connection_rollback(FilterChainImpl.java:709)
        at com.alibaba.druid.proxy.jdbc.ConnectionProxyImpl.rollback(ConnectionProxyImpl.java:399)
        at com.alibaba.druid.pool.DruidPooledConnection.rollback(DruidPooledConnection.java:814)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.rollbackConnection(Writer.java:698)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.handleRetry(Writer.java:708)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.executeInRetry(Writer.java:509)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.flushDDL(Writer.java:435)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.flushRecords(Writer.java:182)
        at com.oceanbase.oms.connector.jdbc.sink.DefaultJDBCSink.offer(DefaultJDBCSink.java:60)
        at com.oceanbase.connector.framework.threadmanager.sinktask.SyncSinkConnectorTask.run(SyncSinkConnectorTask.java:66)
        at java.lang.Thread.run(Thread.java:750)
[2025-04-29 14:48:01.396] [ERROR] [sinkTask-9] [recyle error]
java.sql.SQLRecoverableException: Closed Connection
        at oracle.jdbc.driver.PhysicalConnection.setAutoCommit(PhysicalConnection.java:2067)
        at com.alibaba.druid.filter.FilterChainImpl.connection_setAutoCommit(FilterChainImpl.java:738)
        at com.alibaba.druid.filter.FilterAdapter.connection_setAutoCommit(FilterAdapter.java:986)
        at com.alibaba.druid.filter.FilterChainImpl.connection_setAutoCommit(FilterChainImpl.java:733)
        at com.alibaba.druid.proxy.jdbc.ConnectionProxyImpl.setAutoCommit(ConnectionProxyImpl.java:429)
        at com.alibaba.druid.pool.DruidConnectionHolder.reset(DruidConnectionHolder.java:312)
        at com.alibaba.druid.pool.DruidDataSource.recycle(DruidDataSource.java:1949)
        at com.alibaba.druid.pool.DruidPooledConnection.recycle(DruidPooledConnection.java:351)
        at com.alibaba.druid.filter.FilterChainImpl.dataSource_recycle(FilterChainImpl.java:5049)
        at com.alibaba.druid.filter.FilterAdapter.dataSource_releaseConnection(FilterAdapter.java:2750)
        at com.alibaba.druid.filter.FilterChainImpl.dataSource_recycle(FilterChainImpl.java:5045)
        at com.alibaba.druid.pool.DruidPooledConnection.close(DruidPooledConnection.java:286)
        at com.oceanbase.oms.connector.jdbc.sink.AbstractDruidDataSourceBuilder.closeConnection(AbstractDruidDataSourceBuilder.java:47)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.closeConnection(Writer.java:752)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.executeInRetry(Writer.java:530)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.flushDDL(Writer.java:435)
        at com.oceanbase.oms.connector.jdbc.sink.Writer.flushRecords(Writer.java:182)
        at com.oceanbase.oms.connector.jdbc.sink.DefaultJDBCSink.offer(DefaultJDBCSink.java:60)
        at com.oceanbase.connector.framework.threadmanager.sinktask.SyncSinkConnectorTask.run(SyncSinkConnectorTask.java:66)
        at java.lang.Thread.run(Thread.java:750)

检查 Oracle 中数据量

表数据量 3400 万条左右,索引创建语句确实需要一定时间。

检查 connector_sink_msg.log

OMS 的 Incr-sync 组件 的消费端(sink)日志迟迟无新的 DML 记录被消费(一直卡在 12:08:30 秒左右),基本能确定其他 DML 被这条 DDL 阻塞了。

本例与 OceanBase 官网上的一篇文章[2] 描述相同,但结论(出现该问题,您需要耐心等待 DDL 执行结束)在本例中无效,因为该条 DDL 一直是反复执行报错。

参数 oracleSocketTimeout 的问题

因为 connector.log
 日志中报错是 java.sql.SQLRecoverableException: IO Error: Socket read timed out
,该报错按过往经验其实就是:OMS 的 Incr-sync 组件 与目标端 Oracle 数据库建立的 JDBC 连接因 SQL 执行超过 SocketTimeout 阈值而断链。

  • 大致推测出原因,我们就可以分析下 OMS 官网上 incr-sync 组件 与 SocketTimeout 参数相关内容[3],看是否有线索。
  • 确实存在参数 oracleSocketTimeout ,默认 50s 超时,本例中调大尝试能否解决问题。

修改方式

更新 Incr-sync 组件,在 sink 模块中,增加 oracleSocketTimeout 参数,并设置 1800s 超时。

检查参数生效

修改完成后,connector.log
 日志会打印 Incr-sync 和 Oracle 间建立的 JDBC 参数。

jdbc:oracle:thin:@//10.19.51.23:1521/zdbrenzy.actionskydba.com ?rewriteBatchedStatements=true&GetConnectionTimeout=20000&socketTimeout=1800000

105[2025-04-29 15:00:08.377] [INFO] [main] [[Init DruidDataSource]configInitSQLS: []]
106[2025-04-29 15:00:08.378] [INFO] [main] [connect with url [jdbc:oracle:thin:@//10.19.51.23:1521/zdbrenzy.actionskydba.com ?rewriteBatchedStatements=true&GetConnectionTimeout=20000&socketTimeout=1800000&connectTimeout=30000&allowMultiQueries=true&useLocalSessionState=true]]

检查 DDL 执行完成

等待一段时间,incr-sync 的 msg/ddl_msg.log
 日志显示该 DDL 成功被消费。

1[2025-04-29 15:00:17.560] [forward_slot0-(ETLProcessor)-queue_slot1] receive ddl:CREATE INDEX IDX_JYMX_UNIT_STATUS on BFSDRENZYRENZYRENZYRENZY(ACCOUNRENZY,TRANSSRENZY);, checkpoint[1745899707]
2[2025-04-29 15:00:19.041] [forward_slot0-(ETLProcessor)-queue_slot1] DDLTransfer: transfer ddl [CREATE INDEX IDX_JYMX_UNIT_STATUS on BFSDRENZYRENZYRENZYRENZY(ACCOUNRENZY,TRANSSRENZY);] to ddl [[CREATE INDEX "IDX_JYMX_UNIT_STATUS" ON "BFSDRENZYRENZYRENZYRENZY"("ACCOUNRENZY","TRANSSRENZY")]]
3[2025-04-29 15:01:48.953] [sinkTask-30] set sql:alter session set current_schema="GDBARENZYU", recordDb:GDBARENZYU

Oracle 中索引创建成功。

界面组件监控状态可正常推进,问题解决。

原因和结论

本例故障中 DDL 超时阻塞 OMS 的 Incr-Sync 增量同步组件位点推进的直接原因是 DDL 反复执行失败导致;

但根本原因是 oracleSocketTimeout 参数默认值太小(50s)、表的数据量过大,从而使 Incr-sync 组件与目标端 Oracle 数据库建立的 JDBC 连接因 DDL-SQL 执行超过 SocketTimeout 阈值而断链。

该案例在生产实践中无法避免,一是业务表的数据量无法控制,二是不好预估 DDL 如列表表更、增添索引的执行耗时;建议的优化是:

  1. Incy-Sync 增加 oracleSocketTimeout 参数,并将添加该参数列入运维规范(修改方式参考上文章节)。
  2. 修改参数默认值 50s -> 600s (具体值依实际环境而定)。
参考资料
[1] 

Incr-Sync 组件: https://www.oceanbase.com/docs/community-oms-cn-1000000003497874

[2] 

官网参考文章1: https://www.oceanbase.com/knowledge-base/oms-ee-1000000000253329

[3] 

官网参考文章2: https://www.oceanbase.com/docs/enterprise-oms-doc-cn-1000000000613559


本文关键字:#OceanBase #OMS





带宽被 OBServer 备份 “榨干”,集群陷入 “无主” 危机
一个案例掌握 OMS 校验方式及适用场景
OBLogProxy 在 Binlog 模式下的故障案例解析
计算 OceanBase 可用 CPU 的核心逻辑
3.X vs 4.X:OceanBase 手动收集统计信息的天壤之别!


✨ Github:https://github.com/actiontech/sqle

📚 文档:https://actiontech.github.io/sqle-docs/

💻 官网:https://opensource.actionsky.com/sqle/

👥 微信群:请添加小助手加入 ActionOpenSource

🔗 商业支持:https://www.actionsky.com/sqle


文章转载自爱可生开源社区,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论