暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Learn OceanBase record 15:slow trans的作用?

SQL新手 2023-02-13
216

slow trans 的作用

slow trans 记录了事务执行期间,scheduler 、coordinator 、participant 等三个角色的调用链。详细统计了重要步骤的耗时信息,对问题排查有很好的帮助。

OceanBase 数据库 3.x 及之前的版本,事务类型分为单分区事务和分布事务,事务采用两阶段提交。第一阶段:只提交 participant,实现上做了较为极致的优化;第二阶段:提交 scheduler、coordinator、participant 三个角色。其中 scheduler 每个事务1个,与 session 所在的机器一起,coordinator 和 participant 是两阶段提交的两个角色,作用此处不再赘述。

slow trans 解析

scheduler

下面是 scheduler 事务上下文在 slow trans 场景下打印出来的 trace log。

observer.log:[2021-10-08 11:45:44.313295] TRACE [TRACE]:0 [9382][2334][Y9C4064586A6B-0005CDCF253FBE70] [lt=23] [dc=39] [slow trans] (TRACE=begin_ts=1633664744167732 2021-10-08 03:45:44.167732
[init] u=0 arg1:140542173395392, ctx_type:"scheduler", trans_type:2, trans_id:{hash:7103234888693285108, inc:532846, addr:{ip:"100.88.106.107", port:40000}, t:1633664744167658}, pkey:{tid:1000, partition_id:1000, part_cnt:1000}, arg1:false, uref:1073741824
[set_stc] u=1 stc:1633664744167733
[start_trans] u=2 ret:0, uref:1073741825
[start_stmt] u=7 ret:0, tenant_id:1003, sql_no:4294967297, phy_plan_type:3, stmt_type:2, sql_id:"AA2155E24549E97", trace_id:Y9C4064586A6B-0005CDCF253FBE70, start_time:1633664744166256, uref:1073741825
[end_stmt] u=30832 ret:0, is_rollback:false, rollback_to:4294967297, rollback_from:4294967297, uref:1073741825
[start_stmt] u=50000 ret:0, tenant_id:1003, sql_no:4294967298, phy_plan_type:3, stmt_type:2, sql_id:"AA2155E24549E97", trace_id:Y9C4064586A6B-0005CDCF253FBE70, start_time:1633664744166256, uref:1073741825
[end_stmt] u=10 ret:0, is_rollback:false, rollback_to:4294967298, rollback_from:4294967298, uref:1073741825                                                                                                                                    
[submit_commit] u=52 ret:0, uref:1073741824
[end_trans] u=35 ret:0, is_rollback:false, uref:1073741825
[exiting] u=114618 uref:1073741824, arg1:3221531139, arg2:7230647234361492284
[handle_message] u=1 ret:0, msg_type:1, uref:1
[end_trans_cb] u=13 ret:0, param:0, param:true
[destroy] u=1 uref:0
total_timeu=195572)

字段说明:

trace event含义重要备注
init事务刚刚开始,创建一个 scheduler 的上下文-
start_trans开始事务,一般调用 begin/start transaction 触发-
start_stmt语句开始执行-
end_stmt语句结束执行-
submit_commitscheduler 向 coordinator 发出事务提交/回滚命令这两个动作由 end_trans 触发
end_trans客户端调用事务提交/回滚的命令这两个动作由 end_trans 触发
exiting事务提交结束,设置退出状态这四个动作是同一时间点触发
handle_message收到 coordinator 发来的提交成功与否的消息这四个动作是同一时间点触发
end_trans_cb应答客户端事务提交结果这四个动作是同一时间点触发
destroy事务上下文退出这四个动作是同一时间点触发

其他说明:

trace log 中的 u=xxx 字段表示两个 trace 之间的间隔时长。

在性能问题排查过程中,如果 [start_stmt] u= xxx 中的 u= xxx 的值相对于上一个 [end_stmt] u= xxx 中的 u= xxx 的值较大。则可能存在如下原因:

  1. 客户端上一条 SQL 执行结束之后,由于数据处理耗时较长,导致下一条 SQL 发到 OBServer 端的时间比较晚。

  2. 上一条 SQL 执行结束之后,由于中间链路耗时较长(跨城、网络等)也会产生这种情况。

  3. OBServer 端资源不足,SQL 请求在 OBServer 端排队。为了确认这一情况,start_stmt 这一 trace 后面带上了该请求到达 OBServer 端的时间点 start_time,根据这个信息,也能判断出是否为该情况。

participant

下面是 participant 事务上下文在 slow trans 场景下打印出来的 trace log。

[2021-10-08 11:45:38.282471] TRACE [TRACE]:0 [8867][1304][Y9C4064586A6B-0005CDCF23FFC4F0] [lt=18] [dc=1] [slow trans] (TRACE=begin_ts=1633664737962453 2021-10-08 03:45:37.962453
[init] u=0 arg1:140559776221904, ctx_type:"participant", trans_type:2, trans_id:{hash:4872971512783667647, inc:524414, addr:{ip:"100.88.106.107", port:40000}, t:1633664737942500}, pkey:{tid:1102810162709333, partition_id:36, part_cnt:0}, arg1:false, uref:1073741824
[start_trans] u=4 ret:0, left_time:99999978261, uref:1073741825
[snapshot] u=3 snapshot:1633664737962304, uref:1073741825
[update_gts] u=1 ret:0, sql_no:{main_sql_no:1, sub_sql_no:1}, uref:1073741825, snapshot:1633664737962304, tenant_id:1003
[start_task] u=1 ret:0, sql_no:{main_sql_no:1, sub_sql_no:1}, uref:1073741825, need_update_gts:true
[end_task] u=146 ret:0, is_rollback:false, uref:1073741825

--下面的几个动作是由同一个 handle_message 触发的                                                                                                                       [set_stc] u=10590 stc:1633664737942598
[alloc_logid_ts] u=4 ret:0, log_id:18446744073709551615, tenant_id:1003, uref:1073741825, used:2
[prepare] u=0 ret:0, trans_version:-1, uref:1073741825
[inc_submit_log_count] u=19 logging:1, pending:0, uref:1073741825
[submit_log] u=4 ret:0, uref:1073741825, timeguard:{click:[1, 16, 6]}
[handle_message] u=2 ret:0, msg_type:240, uref:1073741825

--事务的 redo prepare log 写成功之后触发的回调,从 dec_submit_log_count 看出,这条日志耗时 84 ms+                                                                                                                       
[dec_submit_log_count] u=84790 logging:0, pending:0, uref:1073741825
[on_sync_log_succ] u=17 ret:0, log_type:3, log_id:14517, uref:1073741825


[handle_timeout] u=21791 ret:0, used:{click:[0, 0, 6]}, uref:1073741825

--触发写 commit log                                                                                                                       [update_trans_version] u=3051 trans_version:1633664737964470, uref:1073741825
[inc_submit_log_count] u=13 logging:1, pending:0, uref:1073741825
[submit_log] u=0 ret:0, uref:1073741825, timeguard:{click:[0, 4, 3]}
[handle_message] u=1 ret:0, msg_type:280, uref:1073741825

--commit log 写成功,触发回调事务,可以看出这条日志整体耗时 80 ms+                                                                                                                      
[dec_submit_log_count] u=80964 logging:0, pending:0, uref:1073741825
[on_sync_log_succ] u=15 ret:0, log_type:4, log_id:14519, uref:1073741825

[handle_message] u=22908 ret:0, msg_type:280, uref:1073741825
[inc_submit_log_count] u=543 logging:1, pending:0, uref:1073741825
[submit_log] u=1 ret:0, uref:1073741825, timeguard:{click:[1, 0, 3]}
[handle_message] u=0 ret:0, msg_type:320, uref:1073741825
[dec_submit_log_count] u=95141 logging:0, pending:0, uref:1073741825
[exiting] u=1 uref:1073741825, arg1:3221531078, arg2:7230647234361492206
[on_sync_log_succ] u=5 ret:0, log_type:16, log_id:14521, uref:1

[destroy] u=2 arg1:140559776223360
total_timeu=320017)

当trace log 中的 u=xxx 字段显示的间隔时长过长,请联系技术支持工程师协助排查。

coordinator

下面是 coordinator 事务上下文在 slow trans 场景下打印出来的 trace log。

observer.log:[2021-10-08 11:45:47.429179] TRACE [TRACE]:0 [9428][2426][Y9C4064586A6B-0005CDCF204FCDEE] [lt=95] [dc=7] [slow trans] (TRACE=begin_ts=1633664744223393 2021-10-08 03:45:44.223393
[init] u=0 arg1:140543227771456, ctx_type:"coordinator", trans_type:2, trans_id:{hash:16764817863274798805, inc:532875, addr:{ip:"100.88.106.107", port:40000}, t:1633664744191993}, pkey:{tid:1102810162709332, partition_id:25, part_cnt:0}, arg1:false, uref:1073741824
[cons_context] u=5 ret:0, read_only:false, uref:1073741825
[set_stc] u=1 stc:1633664744192080
[handle_message] u=51 ret:0, id1:-1, id2:-1, msg_type:0, sender:{tid:1000, partition_id:1000, part_cnt:1000}, uref:1073741825
[update_trans_version] u=4436 trans_version:1633664744216959, uref:1073741825
[handle_message] u=0 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=50 trans_version:1633664744216959, uref:1073741825
[handle_message] u=0 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=56 trans_version:1633664744216959, uref:1073741825
[handle_message] u=0 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=1948 trans_version:1633664744216959, uref:1073741825
[handle_message] u=1 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=21 trans_version:1633664744216959, uref:1073741825
[handle_message] u=2 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=7 trans_version:1633664744216959, uref:1073741825
[handle_message] u=1 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=6 trans_version:1633664744216959, uref:1073741825
[handle_message] u=1 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=319 trans_version:1633664744216959, uref:1073741825
[handle_message] u=1 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=9 trans_version:1633664744216959, uref:1073741825
[handle_message] u=0 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=7 trans_version:1633664744216959, uref:1073741825
[handle_message] u=1 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=12 trans_version:1633664744216959, uref:1073741825
[handle_message] u=0 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=32 trans_version:1633664744216959, uref:1073741825
[handle_message] u=0 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=542 trans_version:1633664744216959, uref:1073741825
[handle_message] u=0 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=9 trans_version:1633664744216959, uref:1073741825
[handle_message] u=1 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=8 trans_version:1633664744216959, uref:1073741825
[handle_message] u=1 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=274 trans_version:1633664744216959, uref:1073741825
[handle_message] u=1 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=73993 trans_version:1633664744221650, uref:1073741825
[handle_message] u=2 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=17 trans_version:1633664744221649, uref:1073741825
[handle_message] u=2 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=82 trans_version:1633664744221649, uref:1073741825
[handle_message] u=1 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=130 trans_version:1633664744221650, uref:1073741825
[handle_message] u=1 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=2675 trans_version:1633664744221650, uref:1073741825
[handle_message] u=1 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=22 trans_version:1633664744221649, uref:1073741825
[handle_message] u=2 ret:0, msg_type:250, uref:1073741825
[update_trans_version] u=5517 trans_version:1633664744221649, uref:1073741825
[handle_message] u=2 ret:0, msg_type:250, uref:1073741825
[handle_message] u=15129 ret:0, msg_type:250, uref:1073741825
[handle_timeout] u=412 ret:0, total_count:9, uref:1073741825
[handle_message] u=83 ret:0, msg_type:250, uref:1073741825
[handle_message] u=16 ret:0, msg_type:250, uref:1073741825
[handle_message] u=13 ret:0, msg_type:250, uref:1073741825
[handle_message] u=12 ret:0, msg_type:250, uref:1073741825
[handle_message] u=12 ret:0, msg_type:250, uref:1073741825
[handle_message] u=19 ret:0, msg_type:250, uref:1073741825
[handle_message] u=20 ret:0, msg_type:250, uref:1073741825
[handle_message] u=4764 ret:0, msg_type:250, uref:1073741825
[handle_message] u=48 ret:0, msg_type:250, uref:1073741825
[handle_message] u=7 ret:0, msg_type:250, uref:1073741825
[handle_message] u=5 ret:0, msg_type:250, uref:1073741825
[handle_message] u=4 ret:0, msg_type:250, uref:1073741825                                                                                                                            [handle_message] u=7
[handle_message] u=5
[handle_message] u=6
[handle_message] u=4
[handle_message] u=17
[handle_message] u=8
[handle_message] u=5
[handle_message] u=8
[handle_message] u=5
[handle_message] u=5
[update_trans_version] u=2230
[handle_message] u=2
[update_trans_version] u=1279
[handle_message] u=1
[update_trans_version] u=9483
[handle_message] u=1
[update_trans_version] u=166
[handle_message] u=1
[update_trans_version] u=26
[handle_message] u=1
[update_trans_version] u=2748
[handle_message] u=1
[update_trans_version] u=136
[handle_message] u=1
[update_trans_version] u=9
[handle_message] u=1
[update_trans_version] u=6791
[response_scheduler] u=11
[handle_message] u=35
[handle_message] u=1588
[handle_message] u=64575
[handle_message] u=2339
[handle_message] u=15533
[handle_message] u=991
[handle_message] u=18567
[destroy] u=2968405
total_timeu=3205785 DROPPED_EVENTS=15)

这是一个 coordinator 事务上下文的 trace log,驱动 40 个参与者进行事务提交,解析方法与 scheduler、participant 类似,此处不再赘述。

DROPPED_EVENTS 表示丢弃的 event 数量。

当 trace log 中的 u=xxx 字段显示的间隔时长过长,请联系技术支持工程师协助排查。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论