暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

OB临时数据占用空间问题分析

原创 范计杰 2024-06-24
497

一、概述

OB存储使用率高时,有时是被临时数据占用,本文提供思路用于分析临时数据占用情况及原因,OB V3.2.4
image.png

二、分析思路

1、有没有正在创建索引的操作

​ __all_virtual_sys_task_status

2、有没有分布式执行计划,产生的数据交换导致的TMP占用(query timeout后释放)

​ V$OB_DTL_INTERM_RESULT_MONITOR

3、有没有复杂JOIN,导致的TMP占用(如HASH JOIN,SORT),SQL执行结束或timeout后释放

​ __all_virtual_processlist

​ 通过活动会话查长时间执行的SQL

三、具体分析示例

3.1 分析是否tenant tmp data占用

某个OBSERVER的存储使用率高时,可以通过__all_space_usage统计是否tenant tmp data占用
临时文件一般可能是大sql(JOIN,SORT,分布式数据交换等),或者建索引等产生,确认目前无建索引动作。

MySQL [oceanbase]> select svr_ip, svr_port, file_type, sum(data_size),sum(used_size) from __all_space_usage where file_type='tenant tmp data'  group by svr_ip, svr_port, file_type;
+----------------+----------+-----------------+----------------+----------------+
| svr_ip         | svr_port | file_type       | sum(data_size) | sum(used_size) |
+----------------+----------+-----------------+----------------+----------------+
| 192.168.56.103 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.104 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.111 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.150 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.185 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.194 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.71  |     2882 | tenant tmp data |     2317352960 |     2317352960 | <<<<
| 192.168.56.72  |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.73  |     2882 | tenant tmp data |              0 |              0 |
+----------------+----------+-----------------+----------------+----------------+
9 rows in set (0.001 sec)

3.2 建索引导致TMP占用

3.2.1 创建索引
obclient [SYS]> create index idx_testtmp on testtmp©;

3.2.2 检查正在进行的sys task,可以看到存在正在创建索引的操作

MySQL [oceanbase]>  select * from __all_virtual_sys_task_status;
+----------------------------+--------------+-----------------------------------+---------------+----------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| start_time                 | task_type    | task_id                           | svr_ip        | svr_port | tenant_id | comment                                                                                                                                             | is_cancel |
+----------------------------+--------------+-----------------------------------+---------------+----------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| 2023-12-06 16:44:17.809653 | create index | YB420AE6B747-00060A225BCC02BC-0-0 | 192.168.56.71 |     2882 |      1003 | build index task: pkey={tid:1102810162796219, partition_id:0, part_cnt:0} index_id=1102810162796220 snapshot_version=1701852257611851 parallelism=2 |         0 |
+----------------------------+--------------+-----------------------------------+---------------+----------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
1 row in set (0.018 sec)

3.2.3 观察到创建索引过程中的临时数据占用

MySQL [oceanbase]> select svr_ip, svr_port, file_type, sum(data_size),sum(used_size) from __all_space_usage where file_type='tenant tmp data'  group by svr_ip, svr_port, file_type;
+----------------+----------+-----------------+----------------+----------------+
| svr_ip         | svr_port | file_type       | sum(data_size) | sum(used_size) |
+----------------+----------+-----------------+----------------+----------------+
| 192.168.56.103 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.104 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.111 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.150 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.185 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.194 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.71  |     2882 | tenant tmp data |     2317352960 |     2317352960 | <<<<<观察到建索引过程中产生的TMP占用
| 192.168.56.72  |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.73  |     2882 | tenant tmp data |              0 |              0 |
+----------------+----------+-----------------+----------------+----------------+
9 rows in set (0.001 sec)

3.2.4 从observer log观察到external sort do merge操作

cat   observer.log.20231206164907052|grep YB420AE6B747-00060A225BCC02BC-0-0 |grep "external"

[2023-12-06 16:48:17.463926] INFO  [STORAGE] ob_parallel_external_sort.h:1230 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=4] [dc=0] external sort do merge end
[2023-12-06 16:48:17.463940] INFO  [STORAGE] ob_parallel_external_sort.h:1837 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=7] [dc=0] do sort end round(round_id=1, round_cost_time=49700571)
[2023-12-06 16:48:17.463944] INFO  [STORAGE] ob_parallel_external_sort.h:1826 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=3] [dc=0] do sort start round(round_id=2)
[2023-12-06 16:48:17.463948] INFO  [STORAGE] ob_parallel_external_sort.h:1216 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=2] [dc=0] external sort do merge start
[2023-12-06 16:48:34.906739] INFO  [STORAGE] ob_parallel_external_sort.h:1130 [1942234][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=4] [dc=0] build fragment(writer_.get_fd()=7105, writer_.get_sample_item()=flag=16 first_dml=0 dml=0 capacity_=0 is_get=0 from_base=0 trans_id=null scan_index=0 multi_version_row_flag=0 {row_compact:0, row_last:0, row_first:0, row_uncommitted:0, row_magic:0, reserved:0, flag:0}row_val={count=2,cells=[{"VARCHAR2":"abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb1", collation:"utf8mb4_bin", coercibility:"IMPLICIT"},{"BIGINT UNSIGNED":1},] is_sparse_row=0 snapshot_version=0 fast_filtered=0 range_array_idx=0 last_purge_ts=0 )
[2023-12-06 16:48:34.978518] INFO  [STORAGE] ob_parallel_external_sort.h:1230 [1942234][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=4] [dc=0] external sort do merge end
[2023-12-06 16:48:34.978530] INFO  [STORAGE] ob_parallel_external_sort.h:1837 [1942234][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=6] [dc=0] do sort end round(round_id=2, round_cost_time=19797156)
[2023-12-06 16:48:37.321154] INFO  [STORAGE] ob_parallel_external_sort.h:1130 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=10] [dc=0] build fragment(writer_.get_fd()=7106, writer_.get_sample_item()=flag=16 first_dml=0 dml=0 capacity_=0 is_get=0 from_base=0 trans_id=null scan_index=0 multi_version_row_flag=0 {row_compact:0, row_last:0, row_first:0, row_uncommitted:0, row_magic:0, reserved:0, flag:0}row_val={count=2,cells=[{"VARCHAR2":"abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb1", collation:"utf8mb4_bin", coercibility:"IMPLICIT"},{"BIGINT UNSIGNED":16811425},] is_sparse_row=0 snapshot_version=0 fast_filtered=0 range_array_idx=0 last_purge_ts=0 )
[2023-12-06 16:48:37.397255] INFO  [STORAGE] ob_parallel_external_sort.h:1230 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=5] [dc=0] external sort do merge end
[2023-12-06 16:48:37.397268] INFO  [STORAGE] ob_parallel_external_sort.h:1837 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=6] [dc=0] do sort end round(round_id=2, round_cost_time=19933323)

3.2.5
索引创建完成后,临时数据被清理

| svr_ip         | svr_port | file_type       | sum(data_size) | sum(used_size) |
+----------------+----------+-----------------+----------------+----------------+
| 192.168.56.103 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.104 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.111 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.150 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.185 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.194 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.71  |     2882 | tenant tmp data |              0 |              0 |<<<<索引创建完成后,会清理tmp data
| 192.168.56.72  |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.73  |     2882 | tenant tmp data |              0 |              0 |
+----------------+----------+-----------------+----------------+----------------+
9 rows in set (0.001 sec)

[admin@ob1 log]$ cat  observer.log.20231206165155396|grep YB420AE6B747-00060A225BCC02BC-0-0 |grep "external"
[2023-12-06 16:49:35.989393] INFO  [STORAGE] ob_parallel_external_sort.h:1888 [1942177][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=6] [dc=0] do external sort clean up
[2023-12-06 16:49:36.142799] INFO  [STORAGE] ob_parallel_external_sort.h:1888 [1942185][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=7] [dc=0] do external sort clean up
[admin@ob1 log]$ 

3.3 HASH JOIN交换到磁盘TMP占用

3.3.1 模拟长时间执行的HASH JOIN操作

select count(*) from testtmp a,testtmp b ,testtmp c  where a.c=b.c and b.c=b.c;
| =========================================================================
|ID|OPERATOR                   |NAME|EST. ROWS        |COST             |
-------------------------------------------------------------------------
|0 |SCALAR GROUP BY            |    |1                |1.363096e+16     |
|1 | NESTED-LOOP JOIN CARTESIAN|    |3.374094e+16     |1.234320e+16     |
|2 |  HASH JOIN                |    |1028688469       |299998301        |
|3 |   TABLE SCAN              |A   |32799960         |12687179         |
|4 |   TABLE SCAN              |B   |32799960         |12885558         |
|5 |  MATERIAL                 |    |32799960         |12857799         |
|6 |   TABLE SCAN              |C   |32799960         |12687179         |
=========================================================================

3.3.2 观察到HASH JOIN过程中临时数据占用

MySQL [oceanbase]>  select svr_ip, svr_port, file_type, sum(data_size),sum(used_size) from __all_space_usage where file_type='tenant tmp data'  group by svr_ip, svr_port, file_type;
+----------------+----------+-----------------+----------------+----------------+
| svr_ip         | svr_port | file_type       | sum(data_size) | sum(used_size) |
+----------------+----------+-----------------+----------------+----------------+
| 192.168.56.103 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.104 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.111 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.150 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.185 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.194 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.71  |     2882 | tenant tmp data |     3428843520 |     3428843520 |<<<<观察到大表HASH JOIN 占用TMP
| 192.168.56.72  |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.73  |     2882 | tenant tmp data |              0 |              0 |
+----------------+----------+-----------------+----------------+----------------+
9 rows in set (0.002 sec)

3.3.3 通过活动会话,定位正在进行的慢SQL


select * from oceanbase.__all_virtual_processlist where state<>'SLEEP' order by time\G

*************************** 5. row ***************************
            id: 3223649550
          user: SYS
        tenant: testf
          host: 192.168.56.103:27194
            db: SYS
       command: Query
        sql_id: 8642939B77FED2F233D070BE18A49A51
          time: 662.584797
         state: ACTIVE
          info: select count(*) from testtmp a,testtmp b ,testtmp c  where a.c=b.c and b.c=b.c
        svr_ip: 192.168.56.71
      svr_port: 2882
      sql_port: 2881
  proxy_sessid: 785516838221234658
 master_sessid: NULL
user_client_ip: 192.168.56.103
     user_host: %
      trans_id: 0
     thread_id: 1944269
    ssl_cipher: NULL
      trace_id: YB420AE6B747-00060A1E6E2E8848-0-0
   trans_state: 
    total_time: 662.584842
     retry_cnt: 0
    retry_info: 0
        action: 
        module: 
   client_info: 
total_cpu_time: 662.584797
5 rows in set (0.020 sec)

这时其实可以通过__all_virtual_sql_workarea_active 视图定位HASH JOIN,SORT workarea内存不够导致的临时文件占用

示例输出,并非问题点实际输出
MySQL [(none)]> select *from oceanbase.__all_virtual_sql_workarea_active\G
*************************** 1. row ***************************
         svr_ip: 192.168.56.18
       svr_port: 2882
        plan_id: 54573828
         sql_id: DD01478061EE64851ED9E8A9DAE47BF9
    sql_exec_id: -1
 operation_type: PHY_HASH_JOIN
   operation_id: 51
            sid: 785528627906451356
    active_time: 83497
 work_area_size: 536859283
    expect_size: 488053894
actual_mem_used: 40032
   max_mem_used: 40032
  number_passes: 0
   tempseg_size: 0   <<<<
      tenant_id: 1001
         policy: AUTO

3.3.4 查询取消后,临时数据释放


obclient [SYS]> select count(*) from testtmp a,testtmp b ,testtmp c  where a.c=b.c and b.c=b.c;
^CCtrl-C -- query killed. Continuing normally.
ORA-00600: internal error code, arguments: -5065, Query execution was interrupted
obclient [SYS]> 
+----------------+----------+-----------------+----------------+----------------+
| svr_ip         | svr_port | file_type       | sum(data_size) | sum(used_size) |
+----------------+----------+-----------------+----------------+----------------+
| 192.168.56.103 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.104 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.111 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.150 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.185 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.194 |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.71  |     2882 | tenant tmp data |              0 |              0 |<<<取消查询后tmp释放
| 192.168.56.72  |     2882 | tenant tmp data |              0 |              0 |
| 192.168.56.73  |     2882 | tenant tmp data |              0 |              0 |
+----------------+----------+-----------------+----------------+----------------+
9 rows in set (0.001 sec)

3.4 分布式执行计划,数据交换量过大导致的临时数据占用

__all_virtual_dtl_interm_result_monitor 展示 DTL 中间结果管理器的监控信息。
这个在分布式计划下 有exchange的时候 才有数据。HASH JOIN,SORT等占用TMP没有记录
MySQL [oceanbase]> select * from __all_virtual_dtl_interm_result_monitor order by dump_size desc limit 10\G
*************************** 1. row ***************************
         svr_ip: 192.168.56.8
       svr_port: 2882
      tenant_id: 1001
       trace_id: YB420AE6C215-00060AAF1FBFE740-0-0
          owner: DtlIntermRes
     start_time: 2023-12-06 15:02:28.954429
    expire_time: 2023-12-06 17:02:28.952085<<<
    hold_memory: 0
      dump_size: 65512
      dump_cost: 780
      dump_time: 2023-12-06 15:02:39.913146
        dump_fd: 7105710
    dump_dir_id: 97
     channel_id: 5254124723299675
          qc_id: 1
         dfo_id: 0
         sqc_id: 0
       batch_id: 0
max_hold_memory: 66048
*************************** 2. row ***************************
         svr_ip: 192.168.56.8
       svr_port: 2882
      tenant_id: 1001
       trace_id: YB420AE6C215-00060AAF770EB6ED-0-0
          owner: DtlIntermRes
     start_time: 2023-12-06 15:05:22.779180
    expire_time: 2023-12-06 17:05:22.769355
    hold_memory: 0
      dump_size: 0
      dump_cost: 0
      dump_time: NULL
        dump_fd: -1
    dump_dir_id: -1
     channel_id: 5254124729611771
          qc_id: 3
         dfo_id: 0
         sqc_id: 1
       batch_id: 0
max_hold_memory: 64
2 rows in set (0.041 sec)

四、解决方案

1、创建索引导致的临时数据占用,索引创建完成或取消后自动清理
2、慢SQL 中HASH JOIN,SORT等操作导致的临时空间占用,SQL执行结束后自动清理
3、分布式执行计划DTL数据交换导致的临时空间占用,SQL执行计划或超时后自动清理(部分observer版本需要等超时参数时间后才会清理临时文件(事务结束后会不会清理?)。在324bp7版本上会sql执行完立马清理)

五、参考文档

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论