暂无图片
分享
风云飘
2022-12-09
postgresql为什么没用到分区裁剪

pg是12.7的版本
表结构(每个分区大概60万数据):

\d test_tab
                         Partitioned table "etl.test_tab"
        Column         |            Type             | Collation | Nullable | Default 
-----------------------+-----------------------------+-----------+----------+---------
 custnum               | character varying(100)      |           |          | 
 custname              | character varying(300)      |           |          | 
 custmanager           | character varying(100)      |           |          | 
 branch                | character varying(40)       |           |          | 
 datamonth             | timestamp without time zone |           |          | 
 gridbuildingnamemajor | character varying(100)      |           |          | 
 gridbuildingnamesmall | character varying(100)      |           |          | 
 gridbuildingname      | character varying(100)      |           |          | 
 industry              | character varying(100)      |           |          | 
 hj_bj                 | numeric                     |           |          | 
 tongzhizhuangji       | numeric                     |           |          | 
 zhongduan_count       | numeric                     |           |          | 
 ll_month              | numeric                     |           |          | 
Partition key: RANGE (datamonth)
Indexes:
    "idx_test_tab01" btree (branch)
    "idx_test_tab02" btree (custmanager)
    "idx_test_tab03" btree (gridbuildingnamemajor)
    "idx_test_tab04" btree (gridbuildingnamesmall)
    "idx_test_tab05" btree (gridbuildingname)
    "idx_test_tab06" btree (custnum)
    "idx_test_tab07" btree (custname, gridbuildingname)
    "idx_test_tab08" btree (datamonth)
Number of partitions: 87 (Use \d+ to list them.)

执行查询:

select count(*) from test_tab
where datamonth =(select max(datamonth) from test_tab);
执行计划为(中间类似的分区信息):

explain select count(*) from test_tab
where datamonth =(select max(datamonth) from test_tab);
                                                                                         QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1005183.85..1005183.86 rows=1 width=8)
   InitPlan 2 (returns $1)
     ->  Result  (cost=24.96..24.97 rows=1 width=8)
           InitPlan 1 (returns $0)
             ->  Limit  (cost=24.90..24.96 rows=1 width=8)
                   ->  Append  (cost=24.90..1699080.58 rows=31932208 width=8)
                         ->  Index Only Scan Backward using test_tab_p_202303_datamonth_idx on test_tab_p_202303 test_tab_p_202303_1  (cost=0.14..64.66 rows=30 width=8)
                               Index Cond: (datamonth IS NOT NULL)
                         ->  Index Only Scan Backward using test_tab_p_202302_datamonth_idx on test_tab_p_202302 test_tab_p_202302_1  (cost=0.14..64.66 rows=30 width=8)
                               Index Cond: (datamonth IS NOT NULL)
                         (省略类似分区信息............)
                         ->  Index Only Scan Backward using test_tab_p_201601_datamonth_idx on test_tab_p_201601 test_tab_p_201601_1  (cost=0.14..64.66 rows=30 width=8)
                               Index Cond: (datamonth IS NOT NULL)
   ->  Gather  (cost=1005158.67..1005158.88 rows=2 width=8)
         Workers Planned: 2
         Params Evaluated: $1
         ->  Partial Aggregate  (cost=1004158.67..1004158.68 rows=1 width=8)
               ->  Parallel Append  (cost=0.42..970897.16 rows=13304601 width=0)
                     ->  Parallel Index Only Scan using test_tab_p_202205_datamonth_idx on test_tab_p_202205  (cost=0.43..28487.90 rows=523838 width=0)
                           Index Cond: (datamonth = $1)
                     ->  Parallel Index Only Scan using test_tab_p_202210_datamonth_idx on test_tab_p_202210  (cost=0.42..15042.56 rows=276505 width=0)
                           Index Cond: (datamonth = $1)
                     (省略类似分区信息............)      
                     ->  Parallel Index Only Scan using test_tab_p_202302_datamonth_idx on test_tab_p_202302  (cost=0.14..8.15 rows=1 width=0)
                           Index Cond: (datamonth = $1)
                     ->  Parallel Index Only Scan using test_tab_p_202303_datamonth_idx on test_tab_p_202303  (cost=0.14..8.15 rows=1 width=0)
                           Index Cond: (datamonth = $1)
                     ->  Parallel Seq Scan on test_tab_p_201904  (cost=0.00..23450.30 rows=331464 width=0)
                           Filter: (datamonth = $1)
                     (省略类似分区信息............)
                           Filter: (datamonth = $1)
                     ->  Parallel Seq Scan on test_tab_p_202108  (cost=0.00..17988.96 rows=247277 width=0)
                           Filter: (datamonth = $1)
 JIT:
   Functions: 304
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(362 rows)

好像没用到分区裁剪,所有分区都扫描了一遍?
如果用类似查询:select count(*) from test_tab where datamonth =date'2022-10-01';
将后面max部分的查询直接用具体日期代入,可以瞬间出结果。
直接查
select max(datamonth) from test_tab也可以瞬间出结果。

两个组合一起就要6秒多。

pg好像无法建全局索引,max部分要全部扫描所有分区索引可以理解,但主表好像是不是也没用到分区的时间裁剪,全部扫描了一遍?


收藏
分享
1条回答
默认
最新
风云飘
问题已关闭:
暂无图片 评论
暂无图片 有用 0
回答交流
提交
问题信息
请登录之后查看
邀请回答
暂无人订阅该标签,敬请期待~~
暂无图片墨值悬赏