一、概述
OB存储使用率高时,有时是被临时数据占用,本文提供思路用于分析临时数据占用情况及原因,OB V3.2.4

二、分析思路
1、有没有正在创建索引的操作
__all_virtual_sys_task_status
2、有没有分布式执行计划,产生的数据交换导致的TMP占用(query timeout后释放)
V$OB_DTL_INTERM_RESULT_MONITOR
3、有没有复杂JOIN,导致的TMP占用(如HASH JOIN,SORT),SQL执行结束或timeout后释放
__all_virtual_processlist
通过活动会话查长时间执行的SQL
三、具体分析示例
3.1 分析是否tenant tmp data占用
某个OBSERVER的存储使用率高时,可以通过__all_space_usage统计是否tenant tmp data占用
临时文件一般可能是大sql(JOIN,SORT,分布式数据交换等),或者建索引等产生,确认目前无建索引动作。
MySQL [oceanbase]> select svr_ip, svr_port, file_type, sum(data_size),sum(used_size) from __all_space_usage where file_type='tenant tmp data' group by svr_ip, svr_port, file_type;
+----------------+----------+-----------------+----------------+----------------+
| svr_ip | svr_port | file_type | sum(data_size) | sum(used_size) |
+----------------+----------+-----------------+----------------+----------------+
| 192.168.56.103 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.104 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.111 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.150 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.185 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.194 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.71 | 2882 | tenant tmp data | 2317352960 | 2317352960 | <<<<
| 192.168.56.72 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.73 | 2882 | tenant tmp data | 0 | 0 |
+----------------+----------+-----------------+----------------+----------------+
9 rows in set (0.001 sec)
3.2 建索引导致TMP占用
3.2.1 创建索引
obclient [SYS]> create index idx_testtmp on testtmp©;
3.2.2 检查正在进行的sys task,可以看到存在正在创建索引的操作
MySQL [oceanbase]> select * from __all_virtual_sys_task_status;
+----------------------------+--------------+-----------------------------------+---------------+----------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| start_time | task_type | task_id | svr_ip | svr_port | tenant_id | comment | is_cancel |
+----------------------------+--------------+-----------------------------------+---------------+----------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| 2023-12-06 16:44:17.809653 | create index | YB420AE6B747-00060A225BCC02BC-0-0 | 192.168.56.71 | 2882 | 1003 | build index task: pkey={tid:1102810162796219, partition_id:0, part_cnt:0} index_id=1102810162796220 snapshot_version=1701852257611851 parallelism=2 | 0 |
+----------------------------+--------------+-----------------------------------+---------------+----------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
1 row in set (0.018 sec)
3.2.3 观察到创建索引过程中的临时数据占用
MySQL [oceanbase]> select svr_ip, svr_port, file_type, sum(data_size),sum(used_size) from __all_space_usage where file_type='tenant tmp data' group by svr_ip, svr_port, file_type;
+----------------+----------+-----------------+----------------+----------------+
| svr_ip | svr_port | file_type | sum(data_size) | sum(used_size) |
+----------------+----------+-----------------+----------------+----------------+
| 192.168.56.103 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.104 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.111 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.150 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.185 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.194 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.71 | 2882 | tenant tmp data | 2317352960 | 2317352960 | <<<<<观察到建索引过程中产生的TMP占用
| 192.168.56.72 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.73 | 2882 | tenant tmp data | 0 | 0 |
+----------------+----------+-----------------+----------------+----------------+
9 rows in set (0.001 sec)
3.2.4 从observer log观察到external sort do merge操作
cat observer.log.20231206164907052|grep YB420AE6B747-00060A225BCC02BC-0-0 |grep "external"
[2023-12-06 16:48:17.463926] INFO [STORAGE] ob_parallel_external_sort.h:1230 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=4] [dc=0] external sort do merge end
[2023-12-06 16:48:17.463940] INFO [STORAGE] ob_parallel_external_sort.h:1837 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=7] [dc=0] do sort end round(round_id=1, round_cost_time=49700571)
[2023-12-06 16:48:17.463944] INFO [STORAGE] ob_parallel_external_sort.h:1826 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=3] [dc=0] do sort start round(round_id=2)
[2023-12-06 16:48:17.463948] INFO [STORAGE] ob_parallel_external_sort.h:1216 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=2] [dc=0] external sort do merge start
[2023-12-06 16:48:34.906739] INFO [STORAGE] ob_parallel_external_sort.h:1130 [1942234][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=4] [dc=0] build fragment(writer_.get_fd()=7105, writer_.get_sample_item()=flag=16 first_dml=0 dml=0 capacity_=0 is_get=0 from_base=0 trans_id=null scan_index=0 multi_version_row_flag=0 {row_compact:0, row_last:0, row_first:0, row_uncommitted:0, row_magic:0, reserved:0, flag:0}row_val={count=2,cells=[{"VARCHAR2":"abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb1", collation:"utf8mb4_bin", coercibility:"IMPLICIT"},{"BIGINT UNSIGNED":1},] is_sparse_row=0 snapshot_version=0 fast_filtered=0 range_array_idx=0 last_purge_ts=0 )
[2023-12-06 16:48:34.978518] INFO [STORAGE] ob_parallel_external_sort.h:1230 [1942234][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=4] [dc=0] external sort do merge end
[2023-12-06 16:48:34.978530] INFO [STORAGE] ob_parallel_external_sort.h:1837 [1942234][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=6] [dc=0] do sort end round(round_id=2, round_cost_time=19797156)
[2023-12-06 16:48:37.321154] INFO [STORAGE] ob_parallel_external_sort.h:1130 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=10] [dc=0] build fragment(writer_.get_fd()=7106, writer_.get_sample_item()=flag=16 first_dml=0 dml=0 capacity_=0 is_get=0 from_base=0 trans_id=null scan_index=0 multi_version_row_flag=0 {row_compact:0, row_last:0, row_first:0, row_uncommitted:0, row_magic:0, reserved:0, flag:0}row_val={count=2,cells=[{"VARCHAR2":"abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb1", collation:"utf8mb4_bin", coercibility:"IMPLICIT"},{"BIGINT UNSIGNED":16811425},] is_sparse_row=0 snapshot_version=0 fast_filtered=0 range_array_idx=0 last_purge_ts=0 )
[2023-12-06 16:48:37.397255] INFO [STORAGE] ob_parallel_external_sort.h:1230 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=5] [dc=0] external sort do merge end
[2023-12-06 16:48:37.397268] INFO [STORAGE] ob_parallel_external_sort.h:1837 [1942158][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=6] [dc=0] do sort end round(round_id=2, round_cost_time=19933323)
3.2.5
索引创建完成后,临时数据被清理
| svr_ip | svr_port | file_type | sum(data_size) | sum(used_size) |
+----------------+----------+-----------------+----------------+----------------+
| 192.168.56.103 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.104 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.111 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.150 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.185 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.194 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.71 | 2882 | tenant tmp data | 0 | 0 |<<<<索引创建完成后,会清理tmp data
| 192.168.56.72 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.73 | 2882 | tenant tmp data | 0 | 0 |
+----------------+----------+-----------------+----------------+----------------+
9 rows in set (0.001 sec)
[admin@ob1 log]$ cat observer.log.20231206165155396|grep YB420AE6B747-00060A225BCC02BC-0-0 |grep "external"
[2023-12-06 16:49:35.989393] INFO [STORAGE] ob_parallel_external_sort.h:1888 [1942177][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=6] [dc=0] do external sort clean up
[2023-12-06 16:49:36.142799] INFO [STORAGE] ob_parallel_external_sort.h:1888 [1942185][0][YB420AE6B747-00060A225BCC02BC-0-0] [lt=7] [dc=0] do external sort clean up
[admin@ob1 log]$
3.3 HASH JOIN交换到磁盘TMP占用
3.3.1 模拟长时间执行的HASH JOIN操作
select count(*) from testtmp a,testtmp b ,testtmp c where a.c=b.c and b.c=b.c;
| =========================================================================
|ID|OPERATOR |NAME|EST. ROWS |COST |
-------------------------------------------------------------------------
|0 |SCALAR GROUP BY | |1 |1.363096e+16 |
|1 | NESTED-LOOP JOIN CARTESIAN| |3.374094e+16 |1.234320e+16 |
|2 | HASH JOIN | |1028688469 |299998301 |
|3 | TABLE SCAN |A |32799960 |12687179 |
|4 | TABLE SCAN |B |32799960 |12885558 |
|5 | MATERIAL | |32799960 |12857799 |
|6 | TABLE SCAN |C |32799960 |12687179 |
=========================================================================
3.3.2 观察到HASH JOIN过程中临时数据占用
MySQL [oceanbase]> select svr_ip, svr_port, file_type, sum(data_size),sum(used_size) from __all_space_usage where file_type='tenant tmp data' group by svr_ip, svr_port, file_type;
+----------------+----------+-----------------+----------------+----------------+
| svr_ip | svr_port | file_type | sum(data_size) | sum(used_size) |
+----------------+----------+-----------------+----------------+----------------+
| 192.168.56.103 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.104 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.111 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.150 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.185 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.194 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.71 | 2882 | tenant tmp data | 3428843520 | 3428843520 |<<<<观察到大表HASH JOIN 占用TMP
| 192.168.56.72 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.73 | 2882 | tenant tmp data | 0 | 0 |
+----------------+----------+-----------------+----------------+----------------+
9 rows in set (0.002 sec)
3.3.3 通过活动会话,定位正在进行的慢SQL
select * from oceanbase.__all_virtual_processlist where state<>'SLEEP' order by time\G
*************************** 5. row ***************************
id: 3223649550
user: SYS
tenant: testf
host: 192.168.56.103:27194
db: SYS
command: Query
sql_id: 8642939B77FED2F233D070BE18A49A51
time: 662.584797
state: ACTIVE
info: select count(*) from testtmp a,testtmp b ,testtmp c where a.c=b.c and b.c=b.c
svr_ip: 192.168.56.71
svr_port: 2882
sql_port: 2881
proxy_sessid: 785516838221234658
master_sessid: NULL
user_client_ip: 192.168.56.103
user_host: %
trans_id: 0
thread_id: 1944269
ssl_cipher: NULL
trace_id: YB420AE6B747-00060A1E6E2E8848-0-0
trans_state:
total_time: 662.584842
retry_cnt: 0
retry_info: 0
action:
module:
client_info:
total_cpu_time: 662.584797
5 rows in set (0.020 sec)
这时其实可以通过__all_virtual_sql_workarea_active 视图定位HASH JOIN,SORT workarea内存不够导致的临时文件占用
示例输出,并非问题点实际输出
MySQL [(none)]> select *from oceanbase.__all_virtual_sql_workarea_active\G
*************************** 1. row ***************************
svr_ip: 192.168.56.18
svr_port: 2882
plan_id: 54573828
sql_id: DD01478061EE64851ED9E8A9DAE47BF9
sql_exec_id: -1
operation_type: PHY_HASH_JOIN
operation_id: 51
sid: 785528627906451356
active_time: 83497
work_area_size: 536859283
expect_size: 488053894
actual_mem_used: 40032
max_mem_used: 40032
number_passes: 0
tempseg_size: 0 <<<<
tenant_id: 1001
policy: AUTO
3.3.4 查询取消后,临时数据释放
obclient [SYS]> select count(*) from testtmp a,testtmp b ,testtmp c where a.c=b.c and b.c=b.c;
^CCtrl-C -- query killed. Continuing normally.
ORA-00600: internal error code, arguments: -5065, Query execution was interrupted
obclient [SYS]>
+----------------+----------+-----------------+----------------+----------------+
| svr_ip | svr_port | file_type | sum(data_size) | sum(used_size) |
+----------------+----------+-----------------+----------------+----------------+
| 192.168.56.103 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.104 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.111 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.150 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.185 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.194 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.71 | 2882 | tenant tmp data | 0 | 0 |<<<取消查询后tmp释放
| 192.168.56.72 | 2882 | tenant tmp data | 0 | 0 |
| 192.168.56.73 | 2882 | tenant tmp data | 0 | 0 |
+----------------+----------+-----------------+----------------+----------------+
9 rows in set (0.001 sec)
3.4 分布式执行计划,数据交换量过大导致的临时数据占用
__all_virtual_dtl_interm_result_monitor 展示 DTL 中间结果管理器的监控信息。
这个在分布式计划下 有exchange的时候 才有数据。HASH JOIN,SORT等占用TMP没有记录
MySQL [oceanbase]> select * from __all_virtual_dtl_interm_result_monitor order by dump_size desc limit 10\G
*************************** 1. row ***************************
svr_ip: 192.168.56.8
svr_port: 2882
tenant_id: 1001
trace_id: YB420AE6C215-00060AAF1FBFE740-0-0
owner: DtlIntermRes
start_time: 2023-12-06 15:02:28.954429
expire_time: 2023-12-06 17:02:28.952085<<<
hold_memory: 0
dump_size: 65512
dump_cost: 780
dump_time: 2023-12-06 15:02:39.913146
dump_fd: 7105710
dump_dir_id: 97
channel_id: 5254124723299675
qc_id: 1
dfo_id: 0
sqc_id: 0
batch_id: 0
max_hold_memory: 66048
*************************** 2. row ***************************
svr_ip: 192.168.56.8
svr_port: 2882
tenant_id: 1001
trace_id: YB420AE6C215-00060AAF770EB6ED-0-0
owner: DtlIntermRes
start_time: 2023-12-06 15:05:22.779180
expire_time: 2023-12-06 17:05:22.769355
hold_memory: 0
dump_size: 0
dump_cost: 0
dump_time: NULL
dump_fd: -1
dump_dir_id: -1
channel_id: 5254124729611771
qc_id: 3
dfo_id: 0
sqc_id: 1
batch_id: 0
max_hold_memory: 64
2 rows in set (0.041 sec)
四、解决方案
1、创建索引导致的临时数据占用,索引创建完成或取消后自动清理
2、慢SQL 中HASH JOIN,SORT等操作导致的临时空间占用,SQL执行结束后自动清理
3、分布式执行计划DTL数据交换导致的临时空间占用,SQL执行计划或超时后自动清理(部分observer版本需要等超时参数时间后才会清理临时文件(事务结束后会不会清理?)。在324bp7版本上会sql执行完立马清理)
五、参考文档
无




