Background:
I spent a few days looking at the characteristics of time-series data; since PostgreSQL has the TimescaleDB extension, I took some time to run a few tests with it.
| Table type | Table name | Data size |
| --- | --- | --- |
| Source table | ice_zone_main_partion_temp | 5645M |
| Normal table | ice_zone_main_partion_t2 | |
| Unlogged table | ice_zone_main_partion_temp_t | |
| Hypertable | ice_zone_main_partion_chunk | |
Hypertable creation syntax:
select create_hypertable('msc_anal.ice_zone_main_partion_chunk', 'pub_time', chunk_time_interval => interval '12 hour');
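A hypertable starts out as a plain PostgreSQL table whose time column drives the chunking. For context, a minimal sketch of the full sequence; only pub_time appears in the original test, so the other columns here are assumed for illustration:

-- Sketch: create the base table first (columns other than pub_time are hypothetical)
CREATE TABLE msc_anal.ice_zone_main_partion_chunk (
    pub_time  timestamptz NOT NULL,  -- time dimension used for chunking
    zone_id   integer,               -- hypothetical payload column
    ice_value numeric                -- hypothetical payload column
);
-- Then convert it into a hypertable with 12-hour chunks
SELECT create_hypertable(
    'msc_anal.ice_zone_main_partion_chunk',
    'pub_time',
    chunk_time_interval => interval '12 hour'
);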
Data distribution:
select count(1), pub_time from msc_anal.ice_zone_main_partion_temp group by pub_time;

 157983  2024-02-19 08:00:00
4508509  2024-02-19 20:00:00
 109031  2024-02-20 08:00:00
2552683  2024-02-20 20:00:00
 908130  2024-02-21 20:00:00
Insert performance comparison
Normal table:
insert into ice_zone_main_partion_t2 select * from ice_zone_main_partion_temp;
> Affected rows: 8236336
> Time: 83.570s
Unlogged table:
insert into ice_zone_main_partion_t select * from ice_zone_main_partion_temp;
> Affected rows: 8236336
> Time: 17.937s
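The unlogged table's speedup comes from skipping WAL writes, at the cost of losing its contents after a crash. Its creation statement is not shown in this test; a plausible sketch:

-- Sketch (assumed): an unlogged copy of the source table's structure
CREATE UNLOGGED TABLE msc_anal.ice_zone_main_partion_t
    (LIKE msc_anal.ice_zone_main_partion_temp INCLUDING ALL);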
Hypertable:
insert into ice_zone_main_partion_chunk select * from ice_zone_main_partion_temp;
> Affected rows: 8236336
> Time: 106.413s
In principle, a hypertable's insert performance should beat a normal table's. I suspect the cause here is that the dataset is not large enough and the chunk count is too low; further testing to follow... Time-series datasets tend to be very large, which drives storage costs up, so data compression becomes essential.
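To check the chunk-count hypothesis directly, the chunks backing the hypertable can be listed with show_chunks, which is part of TimescaleDB:

-- List the chunks behind the hypertable (only 5 here, matching the 5 time buckets)
SELECT show_chunks('msc_anal.ice_zone_main_partion_chunk');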
Data storage and compression
Checking the hypertable's size
select schema_name, table_name, associated_table_prefix from _timescaledb_catalog.hypertable;  -- look up the hypertable's chunk prefix

msc_anal  ice_zone_main_partion_chunk  _hyper_3

The hypertable totals roughly 5696M:
_timescaledb_internal  _hyper_3_11_chunk  3084 MB
_timescaledb_internal  _hyper_3_12_chunk  1774 MB
_timescaledb_internal  _hyper_3_13_chunk   648 MB
_timescaledb_internal  _hyper_3_14_chunk   112 MB
_timescaledb_internal  _hyper_3_15_chunk    78 MB
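Per-chunk sizes like these can also be pulled with plain PostgreSQL size functions; a sketch (TimescaleDB also ships its own size views, which differ between versions):

-- Total size (heap + indexes + toast) of every chunk of hypertable _hyper_3
SELECT n.nspname, c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = '_timescaledb_internal'
  AND c.relname LIKE '\_hyper\_3\_%'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC;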
Enabling compression on the hypertable
-- ice_zone_main_partion was created the same way as ice_zone_main_partion_chunk
ALTER TABLE msc_anal.ice_zone_main_partion SET (timescaledb.compress, timescaledb.compress_segmentby = 'pub_time');
select compress_chunk('_timescaledb_internal._hyper_1_1_chunk');  -- about 60s
select compress_chunk('_timescaledb_internal._hyper_1_2_chunk');  -- 30s
select compress_chunk('_timescaledb_internal._hyper_1_3_chunk');  -- 11s
select compress_chunk('_timescaledb_internal._hyper_1_4_chunk');  -- 2.4s
select compress_chunk('_timescaledb_internal._hyper_1_5_chunk');  -- 1.7s
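Compressing chunk by chunk is fine for a test; in production this is usually automated with a policy. The call depends on the TimescaleDB version: add_compression_policy in 2.x, add_compress_chunks_policy in 1.x. A sketch for 2.x (the 7-day threshold is an assumption):

-- Automatically compress chunks once they are older than 7 days
SELECT add_compression_policy('msc_anal.ice_zone_main_partion', INTERVAL '7 days');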
Checking data size after compression
select hypertable_schema, hypertable_name, chunk_name,
       pg_size_pretty(uncompressed_total_size) uncompressed_total_size,
       pg_size_pretty(compressed_total_size) compressed_total_size
from _timescaledb_internal.compressed_chunk_stats;

msc_anal  ice_zone_main_partion  _hyper_1_1_chunk  3084 MB  473 MB
msc_anal  ice_zone_main_partion  _hyper_1_2_chunk  1774 MB  276 MB
msc_anal  ice_zone_main_partion  _hyper_1_3_chunk   648 MB   99 MB
msc_anal  ice_zone_main_partion  _hyper_1_4_chunk   112 MB   18 MB
msc_anal  ice_zone_main_partion  _hyper_1_5_chunk    78 MB   12 MB
Size before vs. after compression: 5696M vs. 878M, roughly a 6.5x reduction.
SQL and execution plans
Testing the performance of select count
Normal and unlogged tables do a plain full-table scan; nothing notable there.
Uncompressed hypertable:
EXPLAIN (analyse, buffers) select count(1) from msc_anal.ice_zone_main_partion_chunk;

Finalize Aggregate  (cost=111631.84..111631.85 rows=1 width=8) (actual time=416.846..419.368 rows=1 loops=1)
  Buffers: shared hit=6507
  ->  Gather  (cost=35120.24..111631.82 rows=10 width=8) (actual time=22.697..419.355 rows=8 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=6507
        ->  Parallel Append  (cost=34120.24..110630.82 rows=5 width=8) (actual time=235.779..410.020 rows=3 loops=3)
              Buffers: shared hit=6507
              ->  Partial Aggregate  (cost=60215.72..60215.73 rows=1 width=8) (actual time=222.262..222.263 rows=1 loops=3)
                    Buffers: shared hit=3562
                    ->  Parallel Index Only Scan using _hyper_3_11_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_11_chunk  (cost=0.43..55521.10 rows=1877848 width=0) (actual time=0.043..137.530 rows=1502836 loops=3)
                          Heap Fetches: 0
                          Buffers: shared hit=3562
              ->  Partial Aggregate  (cost=34120.24..34120.25 rows=1 width=8) (actual time=190.985..190.985 rows=1 loops=2)
                    Buffers: shared hit=2015
                    ->  Parallel Index Only Scan using _hyper_3_12_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_12_chunk  (cost=0.43..31460.26 rows=1063992 width=0) (actual time=0.038..118.142 rows=1276342 loops=2)
                          Heap Fetches: 0
                          Buffers: shared hit=2015
              ->  Partial Aggregate  (cost=12135.80..12135.81 rows=1 width=8) (actual time=136.483..136.484 rows=1 loops=1)
                    Buffers: shared hit=715
                    ->  Parallel Index Only Scan using _hyper_3_13_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_13_chunk  (cost=0.42..11189.74 rows=378423 width=0) (actual time=0.014..80.750 rows=908130 loops=1)
                          Heap Fetches: 0
                          Buffers: shared hit=715
              ->  Partial Aggregate  (cost=2459.85..2459.86 rows=1 width=8) (actual time=22.860..22.861 rows=1 loops=1)
                    Buffers: shared hit=127
                    ->  Parallel Index Only Scan using _hyper_3_14_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_14_chunk  (cost=0.29..2227.52 rows=92931 width=0) (actual time=0.012..13.868 rows=157983 loops=1)
                          Heap Fetches: 0
                          Buffers: shared hit=127
              ->  Partial Aggregate  (cost=1699.15..1699.16 rows=1 width=8) (actual time=21.933..21.934 rows=1 loops=1)
                    Buffers: shared hit=88
                    ->  Parallel Index Only Scan using _hyper_3_15_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_15_chunk  (cost=0.29..1538.81 rows=64136 width=0) (actual time=0.035..13.242 rows=109031 loops=1)
                          Heap Fetches: 0
                          Buffers: shared hit=88
Planning:
  Buffers: shared hit=43
Planning Time: 0.696 ms
Execution Time: 419.499 ms
In brief:
->  Parallel Append  (cost=34120.24..110630.82 rows=5 width=8) (actual time=235.779..410.020 rows=3 loops=3)
      Buffers: shared hit=6507
      ->  Partial Aggregate  (cost=60215.72..60215.73 rows=1 width=8)
            ->  Parallel Index Only Scan using _hyper_3_11_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_11_chunk
      ->  Parallel Index Only Scan using _hyper_3_12_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_12_chunk
      ->  Parallel Index Only Scan using _hyper_3_13_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_13_chunk
      ->  Parallel Index Only Scan using _hyper_3_14_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_14_chunk
      ->  Parallel Index Only Scan using _hyper_3_15_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_15_chunk
_hyper_3_11_chunk_ice_zone_main_partion_chunk_pub_time_idx is an index that was created automatically.
The execution plan shows the hypertable scanning the chunks' physical files in parallel.
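Beyond parallelism, chunking also enables chunk exclusion: a query with a pub_time predicate only needs to touch the chunks whose time range matches. A sketch (which chunks survive pruning depends on the actual chunk boundaries):

-- With 12-hour chunks, this should scan only the chunks covering 2024-02-21 onwards
EXPLAIN (analyse, buffers)
SELECT count(1)
FROM msc_anal.ice_zone_main_partion_chunk
WHERE pub_time >= '2024-02-21 00:00:00';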
Execution plan on the compressed hypertable:
EXPLAIN (analyse, buffers) select count(1) from msc_anal.ice_zone_main_partion;

Finalize Aggregate  (cost=14034.06..14034.07 rows=1 width=8) (actual time=310.944..311.859 rows=1 loops=1)
  Buffers: shared hit=867
  ->  Gather  (cost=1179.15..14034.04 rows=10 width=8) (actual time=308.720..311.851 rows=8 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=867
        ->  Parallel Append  (cost=179.15..13033.04 rows=5 width=8) (actual time=140.290..302.473 rows=3 loops=3)
              Buffers: shared hit=867
              ->  Partial Aggregate  (cost=179.15..179.16 rows=1 width=8) (actual time=12.732..12.733 rows=1 loops=1)
                    Buffers: shared hit=16
                    ->  Custom Scan (DecompressChunk) on _hyper_1_5_chunk  (cost=0.26..16.65 rows=65000 width=0) (actual time=0.025..6.375 rows=109031 loops=1)
                          Buffers: shared hit=16
                          ->  Parallel Seq Scan on compress_hyper_2_10_chunk  (cost=0.00..16.65 rows=65 width=4) (actual time=0.011..0.120 rows=110 loops=1)
                                Buffers: shared hit=16
              ->  Partial Aggregate  (cost=1470.85..1470.86 rows=1 width=8) (actual time=101.240..101.241 rows=1 loops=1)
                    Buffers: shared hit=128
                    ->  Custom Scan (DecompressChunk) on _hyper_1_3_chunk  (cost=0.25..133.35 rows=535000 width=0) (actual time=0.022..47.858 rows=908130 loops=1)
                          Buffers: shared hit=128
                          ->  Parallel Seq Scan on compress_hyper_2_8_chunk  (cost=0.00..133.35 rows=535 width=4) (actual time=0.011..0.870 rows=909 loops=1)
                                Buffers: shared hit=128
              ->  Partial Aggregate  (cost=249.43..249.44 rows=1 width=8) (actual time=17.324..17.325 rows=1 loops=1)
                    Buffers: shared hit=16
                    ->  Custom Scan (DecompressChunk) on _hyper_1_4_chunk  (cost=0.18..16.93 rows=93000 width=0) (actual time=0.009..8.669 rows=157983 loops=1)
                          Buffers: shared hit=16
                          ->  Parallel Seq Scan on compress_hyper_2_9_chunk  (cost=0.00..16.93 rows=93 width=4) (actual time=0.007..0.137 rows=158 loops=1)
                                Buffers: shared hit=16
              ->  Partial Aggregate  (cost=4026.02..4026.03 rows=1 width=8) (actual time=91.779..91.779 rows=1 loops=3)
                    Buffers: shared hit=256
                    ->  Custom Scan (DecompressChunk) on _hyper_1_2_chunk  (cost=0.18..271.02 rows=1502000 width=0) (actual time=0.009..44.385 rows=850894 loops=3)
                          Buffers: shared hit=256
                          ->  Parallel Seq Scan on compress_hyper_2_7_chunk  (cost=0.00..271.02 rows=1502 width=4) (actual time=0.005..0.542 rows=851 loops=3)
                                Buffers: shared hit=256
              ->  Partial Aggregate  (cost=7107.52..7107.53 rows=1 width=8) (actual time=250.381..250.381 rows=1 loops=2)
                    Buffers: shared hit=451
                    ->  Custom Scan (DecompressChunk) on _hyper_1_1_chunk  (cost=0.18..477.52 rows=2652000 width=0) (actual time=0.012..118.964 rows=2254254 loops=2)
                          Buffers: shared hit=451
                          ->  Parallel Seq Scan on compress_hyper_2_6_chunk  (cost=0.00..477.52 rows=2652 width=4) (actual time=0.006..0.871 rows=2254 loops=2)
                                Buffers: shared hit=451
Planning:
  Buffers: shared hit=90
Planning Time: 1.682 ms
Execution Time: 312.036 ms
After compression, the volume of data scanned drops sharply.
Summary:
Data size comparison
| Table type | Data size |
| --- | --- |
| Source data | 8236336 rows |
| Normal table | 5645M |
| Hypertable | 5696M |
| Compressed hypertable | 878M |
Insert performance comparison
| Table type | Time to insert 8236336 rows |
| --- | --- |
| Normal table | 83.570s |
| Unlogged table | 17.9s |
| Hypertable | 106.4s |
As noted above, in principle the hypertable's insert performance should beat the normal table's; the suspected cause is the small dataset and low chunk count, to be retested with more data.
select count comparison
| Table type | Execution plan and performance |
| --- | --- |
| Normal table | Full table scan |
| Hypertable | Parallel index-only scans on the auto-created indexes; shared hit=6507, 419 ms |
| Compressed hypertable | DecompressChunk plus parallel seq scans on the compressed chunks; shared hit=867, 312 ms |
A hypertable splits data across chunks according to the chunking policy and creates indexes on them automatically; the planner can exploit this and run parallel index scans across chunks. After compression the volume of data scanned drops sharply, but the plan shows that decompressing the data takes considerable time.
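The decompression cost is also sensitive to the compression settings. This test segmented by pub_time; a common alternative is to segment by a low-cardinality identifier and order by time, which helps queries that filter on that identifier. A sketch with a hypothetical zone_id column:

-- Hypothetical alternative settings; zone_id is an assumed column
ALTER TABLE msc_anal.ice_zone_main_partion SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'zone_id',
    timescaledb.compress_orderby = 'pub_time DESC'
);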




