暂无图片
暂无图片
1
暂无图片
暂无图片
暂无图片

PostgreSQL大表count方法总结及优化探讨

之前的文章我们了解到,PostgreSQL的MVCC是直接在原表通过增加新tuple来实现的,决定了它在大结果集count的时候性能不会太理想,因为需要对大结果集做可见性判断将会是一项繁重的工作,比如下面这种SQL:

    select count(*) from big_tab;

    单纯依靠DB进行优化,确实不是一件容易的事情。本文整理了count(*)的几种方式,并就提升count性能做初步探讨,效果不太理想,如果后续有朋友找到更好的方法,期待分享。

    测试表及数据准备

    创建一个测试表,并导入5kw的数据量:

      akendb=# create table aken01(id int,name text,info text,primary key(id));
      CREATE TABLE
      akendb=# insert into aken01 select id,'aken_'||id,md5(random()::text) from generate_series(1,50000000) as id;
      INSERT 0 50000000
      akendb=# 
      akendb=# \d+ aken01;
      Table "public.aken01"
      Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
      --------+---------+-----------+----------+---------+----------+--------------+-------------
      id | integer | | not null | | plain | |
      name | text | | | | extended | |
      info | text | | | | extended | |
      Indexes:
          "aken01_pkey" PRIMARY KEY, btree (id)
      akendb=#

      PostgreSQL大表count的方式

      在执行count之前先analyze一下:

        akendb=# analyze aken01;ANALYZETime: 536.959 ms

        1.直接count原表统计。这是方式得到的结果是最真实的,但耗时较长:

          akendb=# select count(*) from aken01;
          count
          ---------- 
          50000000
          (1 row)
          Time: 22097.741 ms (00:22.098)

          2.通过统计信息统计

          这种方式因为可以直接从系统表里面拿到数据,结果较快,但只是一个估计值,该方式可以有下面几种方法:

          1)方法一:

            akendb=# select n_live_tup as estimate_rows from pg_stat_all_tables where relname = 'aken01'; 
            estimate_rows
            ---------------
            50002378
            (1 row)
            Time: 13.768 ms
            akendb=#

            2)方法二:

              akendb=#  select reltuples::bigint as estimate_rows from pg_class where relname = 'aken01'; 
              estimate_rows
              ---------------
              50002376
              (1 row)
              Time: 0.682 ms
              akendb=#

              3)方法三:

                akendb=# select (reltuples/relpages) * (pg_relation_size('aken01')/(current_setting('block_size')::integer)) as rows from pg_class where relname = 'aken01';
                rows
                ----------
                50002376
                (1 row)
                Time: 2.457 ms
                akendb=#

                4)方法四:通过函数或执行计划信息统计

                这里参考Michael Fuhr提供的方法,创建一个function从count语句的执行计划统计:

                  akendb=# CREATE FUNCTION count_estimate(query text) RETURNS integer AS
                  akendb-# $func$
                  akendb$# DECLARE
                  akendb$# rec record;
                  akendb$# rows integer;
                  akendb$# BEGIN
                  akendb$# FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
                  akendb$# rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
                  akendb$# EXIT WHEN rows IS NOT NULL;
                  akendb$#     END LOOP;
                  akendb$# RETURN rows;
                  akendb$# END
                  akendb$# $func$ LANGUAGE plpgsql;
                  CREATE FUNCTION
                  Time: 1.023 ms
                  akendb=#

                  执行统计:

                    akendb=#SELECT count_estimate('select 1 FROM aken01');
                    count_estimate
                    ----------------
                    50002376
                    (1 row)
                    Time: 3.226 ms
                    akendb=#

                    4.触发器统计

                    这种方式对表的dml事件创建一个触发器,然后通过计数表进行rows统计,这样能得到一个真实的统计值,但会拖慢dml的性能。

                    具体可参考A. Elein Mustain的文章:

                    http://www.varlena.com/GeneralBits/120.php

                    5.通过扩展插件统计

                    我们还可以通过pgstattuple这个扩展实现,这种能得到真实的结果,但效率比较慢:

                    1)创建扩展:

                      akendb=# create extension pgstattuple;
                      CREATE EXTENSION
                      akendb=# \dx
                      List of installed extensions
                      Name | Version | Schema | Description
                      --------------------+---------+------------+-------------------------------------------------------------------
                      pg_stat_error | 1.0 | public | track error code of all processes
                      pg_stat_log | 1.0 | public | track runtime execution statistics of all SQL statements executed
                      pg_stat_statements | 1.5 | public | track execution statistics of all SQL statements executed
                      pglogical | 2.2.1 | pglogical | PostgreSQL Logical Replication
                      pgstattuple | 1.5 | public | show tuple-level statistics
                      plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
                      (6 rows)
                      akendb=#

                      2)执行统计

                        akendb=# select relname,(pgstattuple(relname)).tuple_count as rows from pg_class where relname='aken01';
                        relname | rows
                        ---------+----------
                        aken01 | 50000000
                        (1 row)
                        Time: 10082.679 ms (00:10.083)
                        akendb=#
                        akendb=# insert into aken01(id,name,info) values(50000001,'aken01','aken');
                        INSERT 0 1
                        Time: 2.459 ms
                        akendb=# select relname,(pgstattuple(relname)).tuple_count as rows from pg_class where relname='aken01';
                        relname | rows
                        ---------+----------
                        aken01 | 50000001
                        (1 row)
                        Time: 10327.826 ms (00:10.328)
                        akendb=#

                        优化尝试

                        对于数据库而言,SQL优化策略是尽量减少CPU的运算以及page的扫描数量。我们可以猜想,在没有任何谓词的count(*)查询,如果能有一个比原表小得多的索引覆盖到我们需要的数据,那么使用index-only scans的访问路径来替代seq scan,理论上应该是可以帮助我们实现优化的。

                        先来看最初的执行计划:

                          akendb=# explain (analyze,buffers,verbose) select count(*) from aken01;
                          QUERY PLAN
                          ------------------------------------------------------------------------------------------------------------------------------------
                          Aggregate (cost=1140368.00..1140368.01 rows=1 width=8) (actual time=15423.918..15423.919 rows=1 loops=1)
                          Output: count(*)
                          Buffers: shared hit=156264 read=359104
                          I/O Timings: read=717.207
                          -> Seq Scan on public.aken01 (cost=0.00..1015368.00 rows=50000000 width=0) (actual time=0.034..7601.345 rows=50000001 loops=1)
                          Output: id, name, info
                          Buffers: shared hit=156264 read=359104
                          I/O Timings: read=717.207
                          Planning time: 0.074 ms
                          Execution time: 15423.970 ms
                          (10 rows)
                          akendb=#

                          上面这里走的全表扫描Seq Scan on table,主键索引约为表的1/4:

                            akendb=# SELECT pg_size_pretty(pg_relation_size('aken01'));
                            pg_size_pretty
                            ----------------
                            4026 MB
                            (1 row)
                            akendb=# SELECT pg_size_pretty(pg_relation_size('aken01_pkey'));
                            pg_size_pretty
                            ----------------
                            1071 MB
                            (1 row)
                            akendb=#

                            我们尝试一下index only scans的效果:

                              akendb=# set enable_seqscan=off;set enable_bitmapscan = off;set enable_tidscan = off;explain (analyze,buffers,verbose) select count(*) from aken01;
                              SET
                              SET
                              SET
                              QUERY PLAN
                              --------------------------------------------------------------------------------------------------------------------------------------------------------------
                              Aggregate (cost=1938751.56..1938751.57 rows=1 width=8) (actual time=27290.551..27290.551 rows=1 loops=1)
                              Output: count(*)
                              Buffers: shared read=651983
                              I/O Timings: read=1998.864
                              -> Index Only Scan using aken01_pkey on public.aken01 (cost=0.56..1813751.56 rows=50000000 width=0) (actual time=0.051..18871.965 rows=50000001 loops=1)
                              Output: id
                              Heap Fetches: 50000001
                              Buffers: shared read=651983
                              I/O Timings: read=1998.864
                              Planning time: 0.204 ms
                              Execution time: 27290.601 ms
                              (11 rows)

                              IO基本在shared buffer,但index only scan比Seq Scan效率更差了,因该idx无vm文件,依旧要做可见性判断。如果是Oracle的同学,通常针对大表的优化,在资源充足的情况,使用分区或并行技术通常会是一个比较有效的方法,这也是AP系统主要的优化手段,这里我们在PostgreSQL开启32个并行:

                                akendb=# set parallel_tuple_cost =0;
                                SET
                                akendb=# set parallel_tuple_cost =0;
                                SET
                                akendb=# set max_parallel_workers_per_gather =32;
                                SET
                                akendb=# alter table aken01 set (parallel_workers =32);
                                ALTER TABLE
                                akendb=# explain (analyze,buffers,verbose) select count(*) from aken01;
                                QUERY PLAN
                                ------------------------------------------------------------------------------------------------------------------------------------------------------
                                Finalize Aggregate (cost=535899.34..535899.35 rows=1 width=8) (actual time=6150.317..6150.318 rows=1 loops=1)
                                Output: count(*)
                                Buffers: shared hit=155144 read=360224
                                I/O Timings: read=1641.442
                                -> Gather (cost=535899.25..535899.26 rows=32 width=8) (actual time=6150.267..6160.640 rows=9 loops=1)
                                Output: (PARTIAL count(*))
                                Workers Planned: 32
                                Workers Launched: 8
                                Buffers: shared hit=155144 read=360224
                                I/O Timings: read=1641.442
                                -> Partial Aggregate (cost=534899.25..534899.26 rows=1 width=8) (actual time=6123.835..6123.835 rows=1 loops=9)
                                Output: PARTIAL count(*)
                                Buffers: shared hit=155144 read=360224
                                I/O Timings: read=1641.442
                                Worker 0: actual time=6078.874..6078.874 rows=1 loops=1
                                Buffers: shared hit=17161 read=40125
                                I/O Timings: read=158.571
                                Worker 1: actual time=6145.014..6145.014 rows=1 loops=1
                                Buffers: shared hit=16900 read=39533
                                I/O Timings: read=177.511
                                Worker 2: actual time=6145.094..6145.094 rows=1 loops=1
                                Buffers: shared hit=17266 read=40933
                                I/O Timings: read=93.361
                                Worker 3: actual time=6145.392..6145.392 rows=1 loops=1
                                Buffers: shared hit=17111 read=39107
                                I/O Timings: read=247.815
                                Worker 4: actual time=6079.581..6079.582 rows=1 loops=1
                                Buffers: shared hit=17405 read=39287
                                I/O Timings: read=90.894
                                Worker 5: actual time=6145.646..6145.646 rows=1 loops=1
                                Buffers: shared hit=17199 read=39618
                                I/O Timings: read=245.592
                                Worker 6: actual time=6145.679..6145.679 rows=1 loops=1
                                Buffers: shared hit=17273 read=41417
                                I/O Timings: read=219.668
                                Worker 7: actual time=6079.707..6079.707 rows=1 loops=1
                                Buffers: shared hit=16844 read=40581
                                I/O Timings: read=159.558
                                -> Parallel Seq Scan on public.aken01 (cost=0.00..530993.00 rows=1562500 width=0) (actual time=0.032..3845.940 rows=5555556 loops=9)
                                Buffers: shared hit=155144 read=360224
                                I/O Timings: read=1641.442
                                Worker 0: actual time=0.032..3666.180 rows=5557782 loops=1
                                Buffers: shared hit=17161 read=40125
                                I/O Timings: read=158.571
                                Worker 1: actual time=0.032..3601.889 rows=5475027 loops=1
                                Buffers: shared hit=16900 read=39533
                                I/O Timings: read=177.511
                                Worker 2: actual time=0.031..4047.629 rows=5646302 loops=1
                                Buffers: shared hit=17266 read=40933
                                I/O Timings: read=93.361
                                Worker 3: actual time=0.041..3998.200 rows=5454196 loops=1
                                Buffers: shared hit=17111 read=39107
                                I/O Timings: read=247.815
                                Worker 4: actual time=0.040..3864.664 rows=5500164 loops=1
                                Buffers: shared hit=17405 read=39287
                                I/O Timings: read=90.894
                                Worker 5: actual time=0.034..3586.671 rows=5512289 loops=1
                                Buffers: shared hit=17199 read=39618
                                I/O Timings: read=245.592
                                Worker 6: actual time=0.027..3921.528 rows=5693970 loops=1
                                Buffers: shared hit=17273 read=41417
                                I/O Timings: read=219.668
                                Worker 7: actual time=0.026..4269.655 rows=5571265 loops=1
                                Buffers: shared hit=16844 read=40581
                                I/O Timings: read=159.558
                                Planning time: 0.191 ms
                                Execution time: 6160.723 ms
                                (67 rows)
                                akendb=#
                                akendb=#select count(*) from aken01;
                                count
                                ----------
                                50000001
                                (1 row)
                                Time: 4447.033 ms (00:04.447)
                                akendb=#

                                使用并行本质上是通过资源消耗替代时间消耗,在资源充足的情况下对SQL性能能起到提升作用,但资源不充足情况下,过大的并行度会加剧CPU的消耗,SQL效率则有可能变得更差。


                                I Love PG

                                关于我们

                                PostgreSQLPG2017PostgreSQLPG非盈利行业协会组织。我们致力于在中国PostgreSQLPostgreSQL


                                欢迎投稿

                                做你的舞台,show出自己的才华 。

                                投稿邮箱:partner@postgresqlchina.com

                                                    

                                                    ——愿能安放你不羁的灵魂


                                技术文章精彩回顾




                                PostgreSQL学习的九层宝塔
                                PostgreSQL职业发展与学习攻略
                                2019,年度数据库舍 PostgreSQL 其谁?
                                Postgres是最好的开源软件
                                PostgreSQL是世界上最好的数据库
                                从Oracle迁移到PostgreSQL的十大理由
                                从“非主流”到“潮流”,开源早已值得拥有

                                PG活动精彩回顾




                                创建PG全球生态!PostgresConf.CN2019大会盛大召开
                                首站起航!2019“让PG‘象’前行”上海站成功举行
                                走进蓉城丨2019“让PG‘象’前行”成都站成功举行
                                中国PG象牙塔计划发布,首批合作高校授牌仪式在天津举行
                                群英论道聚北京,共话PostgreSQL
                                相聚巴厘岛| PG Conf.Asia 2019  DAY0、DAY1简报
                                相知巴厘岛| PG Conf.Asia 2019 DAY2简报
                                独家|硅谷Postgres大会简报
                                直播回顾 | Bruce Momjian:原生分布式将在PG 14版本发布

                                PG培训认证精彩回顾




                                中国首批PGCA认证考试圆满结束,203位考生成功获得认证!
                                中国第二批PGCA认证考试圆满结束,115位考生喜获认证!
                                重要通知:三方共建,中国PostgreSQL认证权威升级!
                                近500人参与!首次PGCE中级、第三批次PGCA初级认证考试落幕!
                                2020年首批 | 中国PostgreSQL初级认证考试圆满结束
                                一分耕耘一分收获,第五批次PostgreSQL认证考试成绩公布

                                文章转载自开源软件联盟PostgreSQL分会,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                                评论