
Some explorations of pg timescaledb

godba 2024-07-01

Background:

I have been looking into the characteristics of time-series data recently. PostgreSQL has the timescaledb extension for exactly this, so I took some time to test it.

Tables under test:

    Table type        Table name                      Data size
    Source table      ice_zone_main_partion_temp      5645M
    Regular table     ice_zone_main_partion_t2
    Unlogged table    ice_zone_main_partion_temp_t
    Hypertable        ice_zone_main_partion_chunk

Hypertable creation syntax:

    select create_hypertable('msc_anal.ice_zone_main_partion_chunk', 'pub_time',
        chunk_time_interval => interval '12 hour');
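
For context, create_hypertable converts an existing regular table, so a minimal sketch of the prerequisite steps looks like the following. The payload columns are hypothetical placeholders; the article does not show the base table's DDL.

    -- Enable the extension (once per database)
    CREATE EXTENSION IF NOT EXISTS timescaledb;
    CREATE SCHEMA IF NOT EXISTS msc_anal;

    -- The target must already exist as a regular table containing the time column;
    -- zone_id/ice_value are placeholder columns, not from the article
    CREATE TABLE msc_anal.ice_zone_main_partion_chunk (
        pub_time  timestamp NOT NULL,
        zone_id   bigint,
        ice_value numeric
    );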

Data distribution:

    select count(1), pub_time from msc_anal.ice_zone_main_partion_temp group by pub_time;
     157983  2024-02-19 08:00:00
    4508509  2024-02-19 20:00:00
     109031  2024-02-20 08:00:00
    2552683  2024-02-20 20:00:00
     908130  2024-02-21 20:00:00

With a 12-hour chunk_time_interval, these five distinct timestamps land in five separate chunks.

Insert performance comparison

Regular table:

    insert into ice_zone_main_partion_t2 select * from ice_zone_main_partion_temp;
    > Affected rows: 8236336
    > Query time: 83.570s

Unlogged table:

    insert into ice_zone_main_partion_t select * from ice_zone_main_partion_temp;
    > Affected rows: 8236336
    > Query time: 17.937s

Hypertable:

    insert into ice_zone_main_partion_chunk select * from ice_zone_main_partion_temp;
    > Affected rows: 8236336
    > Query time: 106.413s
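
For reference, a minimal sketch of how the regular and unlogged comparison tables might have been created; the LIKE clauses are my assumption, since the article does not show this DDL. An UNLOGGED table skips WAL writes, which explains the much faster load, at the cost of losing its contents after a crash.

    -- Assumed DDL for the comparison targets (not shown in the article)
    CREATE TABLE ice_zone_main_partion_t2 (LIKE ice_zone_main_partion_temp INCLUDING ALL);

    -- UNLOGGED: no WAL, faster bulk load, truncated on crash recovery
    CREATE UNLOGGED TABLE ice_zone_main_partion_t (LIKE ice_zone_main_partion_temp INCLUDING ALL);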

In principle, hypertable insert performance should exceed that of a regular table. I suspect the gap here is because the data volume is too small and there are too few chunks; more testing to follow... Time-series data volumes are usually very large, which makes storage expensive, so data compression becomes very important.

Data storage and compression

Querying the hypertable's size

    select schema_name, table_name, associated_table_prefix
    from _timescaledb_catalog.hypertable;
    msc_anal  ice_zone_main_partion_chunk  _hyper_3   -- look up the hypertable's chunk prefix

Hypertable size: roughly 5696M in total

    _timescaledb_internal  _hyper_3_11_chunk  3084 MB
    _timescaledb_internal  _hyper_3_12_chunk  1774 MB
    _timescaledb_internal  _hyper_3_13_chunk  648 MB
    _timescaledb_internal  _hyper_3_14_chunk  112 MB
    _timescaledb_internal  _hyper_3_15_chunk  78 MB
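
_timescaledb_catalog and _timescaledb_internal are internal schemas; assuming TimescaleDB 2.x, the same figures should also be available through the public sizing functions, e.g.:

    -- Total hypertable size, via the public API (TimescaleDB 2.x)
    select pg_size_pretty(hypertable_size('msc_anal.ice_zone_main_partion_chunk'));

    -- Per-chunk breakdown (heap, index, toast, total)
    select chunk_name, pg_size_pretty(total_bytes) as total_size
    from chunks_detailed_size('msc_anal.ice_zone_main_partion_chunk');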

Enabling compression on the hypertable

    -- ice_zone_main_partion was created the same way as ice_zone_main_partion_chunk
    ALTER TABLE msc_anal.ice_zone_main_partion set (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'pub_time'
    );
    select compress_chunk('_timescaledb_internal._hyper_1_1_chunk'); -- about 60s
    select compress_chunk('_timescaledb_internal._hyper_1_2_chunk'); -- 30s
    select compress_chunk('_timescaledb_internal._hyper_1_3_chunk'); -- 11s
    select compress_chunk('_timescaledb_internal._hyper_1_4_chunk'); -- 2.4s
    select compress_chunk('_timescaledb_internal._hyper_1_5_chunk'); -- 1.7s
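
Instead of compressing chunk by chunk, a background policy can compress chunks automatically once their data passes a given age; a sketch, assuming TimescaleDB 2.x (the 3-day threshold is an arbitrary example):

    -- Automatically compress chunks whose data is older than 3 days
    select add_compression_policy('msc_anal.ice_zone_main_partion', interval '3 days');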

Querying data size after compression

    select hypertable_schema, hypertable_name, chunk_name,
           pg_size_pretty(uncompressed_total_size) uncompressed_total_size,
           pg_size_pretty(compressed_total_size) compressed_total_size
    from _timescaledb_internal.compressed_chunk_stats;
    msc_anal  ice_zone_main_partion  _hyper_1_1_chunk  3084 MB  473 MB
    msc_anal  ice_zone_main_partion  _hyper_1_2_chunk  1774 MB  276 MB
    msc_anal  ice_zone_main_partion  _hyper_1_3_chunk  648 MB   99 MB
    msc_anal  ice_zone_main_partion  _hyper_1_4_chunk  112 MB   18 MB
    msc_anal  ice_zone_main_partion  _hyper_1_5_chunk  78 MB    12 MB

Size before vs. after compression: 5696M vs 878M
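
The compressed_chunk_stats view is likewise internal; assuming TimescaleDB 2.x, the public equivalent aggregates the same before/after numbers:

    -- Aggregate before/after sizes for all compressed chunks (TimescaleDB 2.x)
    select pg_size_pretty(before_compression_total_bytes) as before_size,
           pg_size_pretty(after_compression_total_bytes)  as after_size
    from hypertable_compression_stats('msc_anal.ice_zone_main_partion');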

SQL and execution plans

Checking the execution efficiency of select count.

Regular table / unlogged table: a full table scan, nothing worth discussing.

Uncompressed hypertable:

    EXPLAIN (analyse, buffers)
    select count(1) from msc_anal.ice_zone_main_partion_chunk;

    Finalize Aggregate  (cost=111631.84..111631.85 rows=1 width=8) (actual time=416.846..419.368 rows=1 loops=1)
      Buffers: shared hit=6507
      ->  Gather  (cost=35120.24..111631.82 rows=10 width=8) (actual time=22.697..419.355 rows=8 loops=1)
            Workers Planned: 2
            Workers Launched: 2
            Buffers: shared hit=6507
            ->  Parallel Append  (cost=34120.24..110630.82 rows=5 width=8) (actual time=235.779..410.020 rows=3 loops=3)
                  Buffers: shared hit=6507
                  ->  Partial Aggregate  (cost=60215.72..60215.73 rows=1 width=8) (actual time=222.262..222.263 rows=1 loops=3)
                        Buffers: shared hit=3562
                        ->  Parallel Index Only Scan using _hyper_3_11_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_11_chunk  (cost=0.43..55521.10 rows=1877848 width=0) (actual time=0.043..137.530 rows=1502836 loops=3)
                              Heap Fetches: 0
                              Buffers: shared hit=3562
                  ->  Partial Aggregate  (cost=34120.24..34120.25 rows=1 width=8) (actual time=190.985..190.985 rows=1 loops=2)
                        Buffers: shared hit=2015
                        ->  Parallel Index Only Scan using _hyper_3_12_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_12_chunk  (cost=0.43..31460.26 rows=1063992 width=0) (actual time=0.038..118.142 rows=1276342 loops=2)
                              Heap Fetches: 0
                              Buffers: shared hit=2015
                  ->  Partial Aggregate  (cost=12135.80..12135.81 rows=1 width=8) (actual time=136.483..136.484 rows=1 loops=1)
                        Buffers: shared hit=715
                        ->  Parallel Index Only Scan using _hyper_3_13_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_13_chunk  (cost=0.42..11189.74 rows=378423 width=0) (actual time=0.014..80.750 rows=908130 loops=1)
                              Heap Fetches: 0
                              Buffers: shared hit=715
                  ->  Partial Aggregate  (cost=2459.85..2459.86 rows=1 width=8) (actual time=22.860..22.861 rows=1 loops=1)
                        Buffers: shared hit=127
                        ->  Parallel Index Only Scan using _hyper_3_14_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_14_chunk  (cost=0.29..2227.52 rows=92931 width=0) (actual time=0.012..13.868 rows=157983 loops=1)
                              Heap Fetches: 0
                              Buffers: shared hit=127
                  ->  Partial Aggregate  (cost=1699.15..1699.16 rows=1 width=8) (actual time=21.933..21.934 rows=1 loops=1)
                        Buffers: shared hit=88
                        ->  Parallel Index Only Scan using _hyper_3_15_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_15_chunk  (cost=0.29..1538.81 rows=64136 width=0) (actual time=0.035..13.242 rows=109031 loops=1)
                              Heap Fetches: 0
                              Buffers: shared hit=88
    Planning:
      Buffers: shared hit=43
    Planning Time: 0.696 ms
    Execution Time: 419.499 ms

In brief:

    ->  Parallel Append  (cost=34120.24..110630.82 rows=5 width=8) (actual time=235.779..410.020 rows=3 loops=3)
          Buffers: shared hit=6507
          ->  Partial Aggregate  (cost=60215.72..60215.73 rows=1 width=8)
                ->  Parallel Index Only Scan using _hyper_3_11_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_11_chunk
          ->  Parallel Index Only Scan using _hyper_3_12_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_12_chunk
          ->  Parallel Index Only Scan using _hyper_3_13_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_13_chunk
          ->  Parallel Index Only Scan using _hyper_3_14_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_14_chunk
          ->  Parallel Index Only Scan using _hyper_3_15_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_15_chunk

_hyper_3_11_chunk_ice_zone_main_partion_chunk_pub_time_idx and the rest are automatically created indexes.

The execution plan shows the hypertable scanning the chunks' physical files in parallel.
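
These per-chunk indexes come from create_hypertable, which by default builds an index on the time column and propagates it to every chunk; they can be listed from the standard pg_indexes view:

    -- List the automatically created pub_time indexes on the chunks
    select schemaname, tablename, indexname
    from pg_indexes
    where schemaname = '_timescaledb_internal'
      and indexname like '%pub_time_idx';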

Execution plan for the compressed hypertable:

    EXPLAIN (analyse, buffers)
    select count(1) from msc_anal.ice_zone_main_partion;

    Finalize Aggregate  (cost=14034.06..14034.07 rows=1 width=8) (actual time=310.944..311.859 rows=1 loops=1)
      Buffers: shared hit=867
      ->  Gather  (cost=1179.15..14034.04 rows=10 width=8) (actual time=308.720..311.851 rows=8 loops=1)
            Workers Planned: 2
            Workers Launched: 2
            Buffers: shared hit=867
            ->  Parallel Append  (cost=179.15..13033.04 rows=5 width=8) (actual time=140.290..302.473 rows=3 loops=3)
                  Buffers: shared hit=867
                  ->  Partial Aggregate  (cost=179.15..179.16 rows=1 width=8) (actual time=12.732..12.733 rows=1 loops=1)
                        Buffers: shared hit=16
                        ->  Custom Scan (DecompressChunk) on _hyper_1_5_chunk  (cost=0.26..16.65 rows=65000 width=0) (actual time=0.025..6.375 rows=109031 loops=1)
                              Buffers: shared hit=16
                              ->  Parallel Seq Scan on compress_hyper_2_10_chunk  (cost=0.00..16.65 rows=65 width=4) (actual time=0.011..0.120 rows=110 loops=1)
                                    Buffers: shared hit=16
                  ->  Partial Aggregate  (cost=1470.85..1470.86 rows=1 width=8) (actual time=101.240..101.241 rows=1 loops=1)
                        Buffers: shared hit=128
                        ->  Custom Scan (DecompressChunk) on _hyper_1_3_chunk  (cost=0.25..133.35 rows=535000 width=0) (actual time=0.022..47.858 rows=908130 loops=1)
                              Buffers: shared hit=128
                              ->  Parallel Seq Scan on compress_hyper_2_8_chunk  (cost=0.00..133.35 rows=535 width=4) (actual time=0.011..0.870 rows=909 loops=1)
                                    Buffers: shared hit=128
                  ->  Partial Aggregate  (cost=249.43..249.44 rows=1 width=8) (actual time=17.324..17.325 rows=1 loops=1)
                        Buffers: shared hit=16
                        ->  Custom Scan (DecompressChunk) on _hyper_1_4_chunk  (cost=0.18..16.93 rows=93000 width=0) (actual time=0.009..8.669 rows=157983 loops=1)
                              Buffers: shared hit=16
                              ->  Parallel Seq Scan on compress_hyper_2_9_chunk  (cost=0.00..16.93 rows=93 width=4) (actual time=0.007..0.137 rows=158 loops=1)
                                    Buffers: shared hit=16
                  ->  Partial Aggregate  (cost=4026.02..4026.03 rows=1 width=8) (actual time=91.779..91.779 rows=1 loops=3)
                        Buffers: shared hit=256
                        ->  Custom Scan (DecompressChunk) on _hyper_1_2_chunk  (cost=0.18..271.02 rows=1502000 width=0) (actual time=0.009..44.385 rows=850894 loops=3)
                              Buffers: shared hit=256
                              ->  Parallel Seq Scan on compress_hyper_2_7_chunk  (cost=0.00..271.02 rows=1502 width=4) (actual time=0.005..0.542 rows=851 loops=3)
                                    Buffers: shared hit=256
                  ->  Partial Aggregate  (cost=7107.52..7107.53 rows=1 width=8) (actual time=250.381..250.381 rows=1 loops=2)
                        Buffers: shared hit=451
                        ->  Custom Scan (DecompressChunk) on _hyper_1_1_chunk  (cost=0.18..477.52 rows=2652000 width=0) (actual time=0.012..118.964 rows=2254254 loops=2)
                              Buffers: shared hit=451
                              ->  Parallel Seq Scan on compress_hyper_2_6_chunk  (cost=0.00..477.52 rows=2652 width=4) (actual time=0.006..0.871 rows=2254 loops=2)
                                    Buffers: shared hit=451
    Planning:
      Buffers: shared hit=90
    Planning Time: 1.682 ms
    Execution Time: 312.036 ms

After compression, the amount of data scanned drops sharply.

Summary:

Data size comparison (source data: 8,236,336 rows)

    Table type              Size
    Regular table           5645M
    Hypertable              5696M
    Compressed hypertable   878M

Insert efficiency comparison (source data: 8,236,336 rows)

    Table type        Time
    Regular table     83.570s
    Unlogged table    17.9s
    Hypertable        106.4s

As noted earlier, hypertable insert performance should in principle exceed a regular table's; the gap here is likely due to the small data volume and the small number of chunks, and will be retested later.

select count comparison

    Table type              Plan and efficiency
    Regular table           Full table scan
    Hypertable              Parallel index-only scans on the auto-created indexes, shared hit=6507, 419 ms
    Compressed hypertable   DecompressChunk over seq scans of the compressed chunks, shared hit=867, 312 ms

A hypertable splits data into chunks according to the chunking policy and automatically builds indexes on them; the planner can exploit this and run parallel index scans across chunks. After compression, the amount of data scanned drops sharply, but the plan shows that decompressing the data (DecompressChunk) consumes a significant share of the time.
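
One practical implication of the chunk layout, sketched below against the article's table: a filter on the partitioning column lets the planner exclude whole chunks at plan time, so only the chunks overlapping the requested window are scanned (the time range here is my own example).

    -- Only the chunk(s) covering this 12-hour window should be scanned;
    -- the remaining chunks are pruned from the plan
    explain
    select count(1)
    from msc_anal.ice_zone_main_partion_chunk
    where pub_time >= timestamp '2024-02-19 12:00:00'
      and pub_time <  timestamp '2024-02-20 00:00:00';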
