
Some explorations of pg timescaledb

godba 2024-07-01

Background:

I have been looking into the characteristics of time-series data recently. PostgreSQL has the timescaledb extension for exactly this, so I took some time to test it.

Tables under test:

    Table type        Table name                      Data size
    Source table      ice_zone_main_partion_temp      5645M
    Regular table     ice_zone_main_partion_t2
    Unlogged table    ice_zone_main_partion_temp_t
    Hypertable        ice_zone_main_partion_chunk

Hypertable creation syntax:

    select create_hypertable('msc_anal.ice_zone_main_partion_chunk', 'pub_time',
        chunk_time_interval => interval '12 hour');
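
For context, create_hypertable converts an existing regular table, so a minimal sketch of the prerequisite steps looks like the following. The payload columns are hypothetical placeholders; the article does not show the base table's DDL.

    -- Enable the extension (once per database)
    CREATE EXTENSION IF NOT EXISTS timescaledb;
    CREATE SCHEMA IF NOT EXISTS msc_anal;

    -- The target must already exist as a regular table containing the time column;
    -- zone_id/ice_value are placeholder columns, not from the article
    CREATE TABLE msc_anal.ice_zone_main_partion_chunk (
        pub_time  timestamp NOT NULL,
        zone_id   bigint,
        ice_value numeric
    );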

Data distribution:

    select count(1), pub_time from msc_anal.ice_zone_main_partion_temp group by pub_time;
     157983  2024-02-19 08:00:00
    4508509  2024-02-19 20:00:00
     109031  2024-02-20 08:00:00
    2552683  2024-02-20 20:00:00
     908130  2024-02-21 20:00:00

With a 12-hour chunk_time_interval, these five distinct timestamps land in five separate chunks.

Insert performance comparison

Regular table:

    insert into ice_zone_main_partion_t2 select * from ice_zone_main_partion_temp;
    > Affected rows: 8236336
    > Query time: 83.570s

Unlogged table:

    insert into ice_zone_main_partion_t select * from ice_zone_main_partion_temp;
    > Affected rows: 8236336
    > Query time: 17.937s

Hypertable:

    insert into ice_zone_main_partion_chunk select * from ice_zone_main_partion_temp;
    > Affected rows: 8236336
    > Query time: 106.413s
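
For reference, a minimal sketch of how the regular and unlogged comparison tables might have been created; the LIKE clauses are my assumption, since the article does not show this DDL. An UNLOGGED table skips WAL writes, which explains the much faster load, at the cost of losing its contents after a crash.

    -- Assumed DDL for the comparison targets (not shown in the article)
    CREATE TABLE ice_zone_main_partion_t2 (LIKE ice_zone_main_partion_temp INCLUDING ALL);

    -- UNLOGGED: no WAL, faster bulk load, truncated on crash recovery
    CREATE UNLOGGED TABLE ice_zone_main_partion_t (LIKE ice_zone_main_partion_temp INCLUDING ALL);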

In principle, hypertable insert performance should exceed that of a regular table. I suspect the gap here is because the data volume is too small and there are too few chunks; more testing to follow... Time-series data volumes are usually very large, which makes storage expensive, so data compression becomes very important.

Data storage and compression

Querying the hypertable's size

    select schema_name, table_name, associated_table_prefix
    from _timescaledb_catalog.hypertable;
    msc_anal  ice_zone_main_partion_chunk  _hyper_3   -- look up the hypertable's chunk prefix

Hypertable size: roughly 5696M in total

    _timescaledb_internal  _hyper_3_11_chunk  3084 MB
    _timescaledb_internal  _hyper_3_12_chunk  1774 MB
    _timescaledb_internal  _hyper_3_13_chunk  648 MB
    _timescaledb_internal  _hyper_3_14_chunk  112 MB
    _timescaledb_internal  _hyper_3_15_chunk  78 MB
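
_timescaledb_catalog and _timescaledb_internal are internal schemas; assuming TimescaleDB 2.x, the same figures should also be available through the public sizing functions, e.g.:

    -- Total hypertable size, via the public API (TimescaleDB 2.x)
    select pg_size_pretty(hypertable_size('msc_anal.ice_zone_main_partion_chunk'));

    -- Per-chunk breakdown (heap, index, toast, total)
    select chunk_name, pg_size_pretty(total_bytes) as total_size
    from chunks_detailed_size('msc_anal.ice_zone_main_partion_chunk');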

Enabling compression on the hypertable

    -- ice_zone_main_partion was created the same way as ice_zone_main_partion_chunk
    ALTER TABLE msc_anal.ice_zone_main_partion set (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'pub_time'
    );
    select compress_chunk('_timescaledb_internal._hyper_1_1_chunk'); -- about 60s
    select compress_chunk('_timescaledb_internal._hyper_1_2_chunk'); -- 30s
    select compress_chunk('_timescaledb_internal._hyper_1_3_chunk'); -- 11s
    select compress_chunk('_timescaledb_internal._hyper_1_4_chunk'); -- 2.4s
    select compress_chunk('_timescaledb_internal._hyper_1_5_chunk'); -- 1.7s
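
Instead of compressing chunk by chunk, a background policy can compress chunks automatically once their data passes a given age; a sketch, assuming TimescaleDB 2.x (the 3-day threshold is an arbitrary example):

    -- Automatically compress chunks whose data is older than 3 days
    select add_compression_policy('msc_anal.ice_zone_main_partion', interval '3 days');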

Querying data size after compression

    select hypertable_schema, hypertable_name, chunk_name,
           pg_size_pretty(uncompressed_total_size) uncompressed_total_size,
           pg_size_pretty(compressed_total_size) compressed_total_size
    from _timescaledb_internal.compressed_chunk_stats;
    msc_anal  ice_zone_main_partion  _hyper_1_1_chunk  3084 MB  473 MB
    msc_anal  ice_zone_main_partion  _hyper_1_2_chunk  1774 MB  276 MB
    msc_anal  ice_zone_main_partion  _hyper_1_3_chunk  648 MB   99 MB
    msc_anal  ice_zone_main_partion  _hyper_1_4_chunk  112 MB   18 MB
    msc_anal  ice_zone_main_partion  _hyper_1_5_chunk  78 MB    12 MB

Size before vs. after compression: 5696M vs 878M
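
The compressed_chunk_stats view is likewise internal; assuming TimescaleDB 2.x, the public equivalent aggregates the same before/after numbers:

    -- Aggregate before/after sizes for all compressed chunks (TimescaleDB 2.x)
    select pg_size_pretty(before_compression_total_bytes) as before_size,
           pg_size_pretty(after_compression_total_bytes)  as after_size
    from hypertable_compression_stats('msc_anal.ice_zone_main_partion');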

SQL and execution plans

Checking the execution efficiency of select count.

Regular table / unlogged table: a full table scan, nothing worth discussing.

Uncompressed hypertable:

    EXPLAIN (analyse, buffers)
    select count(1) from msc_anal.ice_zone_main_partion_chunk;

    Finalize Aggregate  (cost=111631.84..111631.85 rows=1 width=8) (actual time=416.846..419.368 rows=1 loops=1)
      Buffers: shared hit=6507
      ->  Gather  (cost=35120.24..111631.82 rows=10 width=8) (actual time=22.697..419.355 rows=8 loops=1)
            Workers Planned: 2
            Workers Launched: 2
            Buffers: shared hit=6507
            ->  Parallel Append  (cost=34120.24..110630.82 rows=5 width=8) (actual time=235.779..410.020 rows=3 loops=3)
                  Buffers: shared hit=6507
                  ->  Partial Aggregate  (cost=60215.72..60215.73 rows=1 width=8) (actual time=222.262..222.263 rows=1 loops=3)
                        Buffers: shared hit=3562
                        ->  Parallel Index Only Scan using _hyper_3_11_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_11_chunk  (cost=0.43..55521.10 rows=1877848 width=0) (actual time=0.043..137.530 rows=1502836 loops=3)
                              Heap Fetches: 0
                              Buffers: shared hit=3562
                  ->  Partial Aggregate  (cost=34120.24..34120.25 rows=1 width=8) (actual time=190.985..190.985 rows=1 loops=2)
                        Buffers: shared hit=2015
                        ->  Parallel Index Only Scan using _hyper_3_12_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_12_chunk  (cost=0.43..31460.26 rows=1063992 width=0) (actual time=0.038..118.142 rows=1276342 loops=2)
                              Heap Fetches: 0
                              Buffers: shared hit=2015
                  ->  Partial Aggregate  (cost=12135.80..12135.81 rows=1 width=8) (actual time=136.483..136.484 rows=1 loops=1)
                        Buffers: shared hit=715
                        ->  Parallel Index Only Scan using _hyper_3_13_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_13_chunk  (cost=0.42..11189.74 rows=378423 width=0) (actual time=0.014..80.750 rows=908130 loops=1)
                              Heap Fetches: 0
                              Buffers: shared hit=715
                  ->  Partial Aggregate  (cost=2459.85..2459.86 rows=1 width=8) (actual time=22.860..22.861 rows=1 loops=1)
                        Buffers: shared hit=127
                        ->  Parallel Index Only Scan using _hyper_3_14_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_14_chunk  (cost=0.29..2227.52 rows=92931 width=0) (actual time=0.012..13.868 rows=157983 loops=1)
                              Heap Fetches: 0
                              Buffers: shared hit=127
                  ->  Partial Aggregate  (cost=1699.15..1699.16 rows=1 width=8) (actual time=21.933..21.934 rows=1 loops=1)
                        Buffers: shared hit=88
                        ->  Parallel Index Only Scan using _hyper_3_15_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_15_chunk  (cost=0.29..1538.81 rows=64136 width=0) (actual time=0.035..13.242 rows=109031 loops=1)
                              Heap Fetches: 0
                              Buffers: shared hit=88
    Planning:
      Buffers: shared hit=43
    Planning Time: 0.696 ms
    Execution Time: 419.499 ms

In brief:

    ->  Parallel Append  (cost=34120.24..110630.82 rows=5 width=8) (actual time=235.779..410.020 rows=3 loops=3)
          Buffers: shared hit=6507
          ->  Partial Aggregate  (cost=60215.72..60215.73 rows=1 width=8)
                ->  Parallel Index Only Scan using _hyper_3_11_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_11_chunk
          ->  Parallel Index Only Scan using _hyper_3_12_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_12_chunk
          ->  Parallel Index Only Scan using _hyper_3_13_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_13_chunk
          ->  Parallel Index Only Scan using _hyper_3_14_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_14_chunk
          ->  Parallel Index Only Scan using _hyper_3_15_chunk_ice_zone_main_partion_chunk_pub_time_idx on _hyper_3_15_chunk

_hyper_3_11_chunk_ice_zone_main_partion_chunk_pub_time_idx and the rest are automatically created indexes.

The execution plan shows the hypertable scanning the chunks' physical files in parallel.
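
These per-chunk indexes come from create_hypertable, which by default builds an index on the time column and propagates it to every chunk; they can be listed from the standard pg_indexes view:

    -- List the automatically created pub_time indexes on the chunks
    select schemaname, tablename, indexname
    from pg_indexes
    where schemaname = '_timescaledb_internal'
      and indexname like '%pub_time_idx';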

Execution plan for the compressed hypertable:

    EXPLAIN (analyse, buffers)
    select count(1) from msc_anal.ice_zone_main_partion;

    Finalize Aggregate  (cost=14034.06..14034.07 rows=1 width=8) (actual time=310.944..311.859 rows=1 loops=1)
      Buffers: shared hit=867
      ->  Gather  (cost=1179.15..14034.04 rows=10 width=8) (actual time=308.720..311.851 rows=8 loops=1)
            Workers Planned: 2
            Workers Launched: 2
            Buffers: shared hit=867
            ->  Parallel Append  (cost=179.15..13033.04 rows=5 width=8) (actual time=140.290..302.473 rows=3 loops=3)
                  Buffers: shared hit=867
                  ->  Partial Aggregate  (cost=179.15..179.16 rows=1 width=8) (actual time=12.732..12.733 rows=1 loops=1)
                        Buffers: shared hit=16
                        ->  Custom Scan (DecompressChunk) on _hyper_1_5_chunk  (cost=0.26..16.65 rows=65000 width=0) (actual time=0.025..6.375 rows=109031 loops=1)
                              Buffers: shared hit=16
                              ->  Parallel Seq Scan on compress_hyper_2_10_chunk  (cost=0.00..16.65 rows=65 width=4) (actual time=0.011..0.120 rows=110 loops=1)
                                    Buffers: shared hit=16
                  ->  Partial Aggregate  (cost=1470.85..1470.86 rows=1 width=8) (actual time=101.240..101.241 rows=1 loops=1)
                        Buffers: shared hit=128
                        ->  Custom Scan (DecompressChunk) on _hyper_1_3_chunk  (cost=0.25..133.35 rows=535000 width=0) (actual time=0.022..47.858 rows=908130 loops=1)
                              Buffers: shared hit=128
                              ->  Parallel Seq Scan on compress_hyper_2_8_chunk  (cost=0.00..133.35 rows=535 width=4) (actual time=0.011..0.870 rows=909 loops=1)
                                    Buffers: shared hit=128
                  ->  Partial Aggregate  (cost=249.43..249.44 rows=1 width=8) (actual time=17.324..17.325 rows=1 loops=1)
                        Buffers: shared hit=16
                        ->  Custom Scan (DecompressChunk) on _hyper_1_4_chunk  (cost=0.18..16.93 rows=93000 width=0) (actual time=0.009..8.669 rows=157983 loops=1)
                              Buffers: shared hit=16
                              ->  Parallel Seq Scan on compress_hyper_2_9_chunk  (cost=0.00..16.93 rows=93 width=4) (actual time=0.007..0.137 rows=158 loops=1)
                                    Buffers: shared hit=16
                  ->  Partial Aggregate  (cost=4026.02..4026.03 rows=1 width=8) (actual time=91.779..91.779 rows=1 loops=3)
                        Buffers: shared hit=256
                        ->  Custom Scan (DecompressChunk) on _hyper_1_2_chunk  (cost=0.18..271.02 rows=1502000 width=0) (actual time=0.009..44.385 rows=850894 loops=3)
                              Buffers: shared hit=256
                              ->  Parallel Seq Scan on compress_hyper_2_7_chunk  (cost=0.00..271.02 rows=1502 width=4) (actual time=0.005..0.542 rows=851 loops=3)
                                    Buffers: shared hit=256
                  ->  Partial Aggregate  (cost=7107.52..7107.53 rows=1 width=8) (actual time=250.381..250.381 rows=1 loops=2)
                        Buffers: shared hit=451
                        ->  Custom Scan (DecompressChunk) on _hyper_1_1_chunk  (cost=0.18..477.52 rows=2652000 width=0) (actual time=0.012..118.964 rows=2254254 loops=2)
                              Buffers: shared hit=451
                              ->  Parallel Seq Scan on compress_hyper_2_6_chunk  (cost=0.00..477.52 rows=2652 width=4) (actual time=0.006..0.871 rows=2254 loops=2)
                                    Buffers: shared hit=451
    Planning:
      Buffers: shared hit=90
    Planning Time: 1.682 ms
    Execution Time: 312.036 ms

After compression, the amount of data scanned drops sharply.

Summary:

Data size comparison (source data: 8,236,336 rows)

    Table type              Size
    Regular table           5645M
    Hypertable              5696M
    Compressed hypertable   878M

Insert efficiency comparison (source data: 8,236,336 rows)

    Table type        Time
    Regular table     83.570s
    Unlogged table    17.9s
    Hypertable        106.4s

As noted earlier, hypertable insert performance should in principle exceed a regular table's; the gap here is likely due to the small data volume and the small number of chunks, and will be retested later.

select count comparison

    Table type              Plan and efficiency
    Regular table           Full table scan
    Hypertable              Parallel index-only scans on the auto-created indexes, shared hit=6507, 419 ms
    Compressed hypertable   DecompressChunk over seq scans of the compressed chunks, shared hit=867, 312 ms

A hypertable splits data into chunks according to the chunking policy and automatically builds indexes on them; the planner can exploit this and run parallel index scans across chunks. After compression, the amount of data scanned drops sharply, but the plan shows that decompressing the data (DecompressChunk) consumes a significant share of the time.
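
One practical implication of the chunk layout, sketched below against the article's table: a filter on the partitioning column lets the planner exclude whole chunks at plan time, so only the chunks overlapping the requested window are scanned (the time range here is my own example).

    -- Only the chunk(s) covering this 12-hour window should be scanned;
    -- the remaining chunks are pruned from the plan
    explain
    select count(1)
    from msc_anal.ice_zone_main_partion_chunk
    where pub_time >= timestamp '2024-02-19 12:00:00'
      and pub_time <  timestamp '2024-02-20 00:00:00';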
