SQL优化 -- 使用Filter子句优化SQL的一个案例

原创 KINGBASE研究院 2023-10-24

5446

客户的重要系统有一条报表统计SQL执行非常慢，分析发现SQL写法上存在严重问题。在改写过程中，发现PostgreSQL filter 的写法不但可以完美地替换case .. when 语句，而且可读性非常好。来看具体的SQL。

注：考虑到客户隐私，且为了让SQL更易读，以下使用了简化的SQL。

以下是测试数据：

create table t1(t1id1 integer, t1name text);
create table t2(t2id1 integer, typ1 text, typ2 text);
insert into t1 select generate_series(1,100),md5(random());
insert into t2 select trunc(random()*100),round(random()),round(random()) from generate_series(1,10000000) id;
create index idx_t2 on t2(t2id1);

一、修改前的SQL及执行计划

select t1id1,
       (select count(*) as cntyp1 from t2 where t2id1=t1id1 and typ1='1' and typ2='1' ) as typ11,
       (select count(*) as cntyp1 from t2 where t2id1=t1id1 and typ1='1' and typ2='0' ) as typ10,
       (select count(*) as cntyp1 from t2 where t2id1=t1id1 and typ1='0' and typ2='1' ) as typ01,
       (select count(*) as cntyp2 from t2 where t2id1=t1id1 and typ1='0' and typ2='0' ) as typ00
from t1;

从语句写法就能看出，该SQL对于t2表重复多次访问，且由于采用了select + 子查询的模式，使得子查询只能使用SubPlan 方式，执行效率非常低。来看执行计划：

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..19170080.00 rows=100 width=36) (actual time=721.117..70452.671 rows=100 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=47924.98..47924.99 rows=1 width=8) (actual time=175.795..175.796 rows=1 loops=100)
           ->  Bitmap Heap Scan on t2  (cost=1864.67..47862.67 rows=24927 width=0) (actual time=16.483..173.279 rows=24758 loops=100)
                 Recheck Cond: (t2id1 = t1.t1id1)
                 Filter: ((typ1 = '1'::text) AND (typ2 = '1'::text))
                 Rows Removed by Filter: 74239
                 Heap Blocks: exact=3928953
                 ->  Bitmap Index Scan on idx_t2  (cost=0.00..1858.43 rows=100000 width=0) (actual time=10.018..10.018 rows=98996 loops=100)
                       Index Cond: (t2id1 = t1.t1id1)
   SubPlan 2
     ->  Aggregate  (cost=47924.88..47924.89 rows=1 width=8) (actual time=177.447..177.447 rows=1 loops=100)
           ->  Bitmap Heap Scan on t2 t2_1  (cost=1864.66..47862.66 rows=24890 width=0) (actual time=16.542..174.869 rows=24759 loops=100)
                 Recheck Cond: (t2id1 = t1.t1id1)
                 Filter: ((typ1 = '1'::text) AND (typ2 = '0'::text))
                 Rows Removed by Filter: 74238
                 Heap Blocks: exact=3928953
                 ->  Bitmap Index Scan on idx_t2  (cost=0.00..1858.43 rows=100000 width=0) (actual time=10.067..10.067 rows=98996 loops=100)
                       Index Cond: (t2id1 = t1.t1id1)
   SubPlan 3
     ->  Aggregate  (cost=47925.49..47925.50 rows=1 width=8) (actual time=174.072..174.072 rows=1 loops=100)
           ->  Bitmap Heap Scan on t2 t2_2  (cost=1864.71..47862.71 rows=25110 width=0) (actual time=16.139..171.627 rows=24729 loops=100)
                 Recheck Cond: (t2id1 = t1.t1id1)
                 Filter: ((typ1 = '0'::text) AND (typ2 = '1'::text))
                 Rows Removed by Filter: 74267
                 Heap Blocks: exact=3928953
                 ->  Bitmap Index Scan on idx_t2  (cost=0.00..1858.43 rows=100000 width=0) (actual time=9.847..9.847 rows=98996 loops=100)
                       Index Cond: (t2id1 = t1.t1id1)
   SubPlan 4
     ->  Aggregate  (cost=47925.39..47925.40 rows=1 width=8) (actual time=177.180..177.180 rows=1 loops=100)
           ->  Bitmap Heap Scan on t2 t2_3  (cost=1864.70..47862.70 rows=25073 width=0) (actual time=16.470..174.678 rows=24751 loops=100)
                 Recheck Cond: (t2id1 = t1.t1id1)
                 Filter: ((typ1 = '0'::text) AND (typ2 = '0'::text))
                 Rows Removed by Filter: 74245
                 Heap Blocks: exact=3928953
                 ->  Bitmap Index Scan on idx_t2  (cost=0.00..1858.43 rows=100000 width=0) (actual time=9.925..9.925 rows=98996 loops=100)
                       Index Cond: (t2id1 = t1.t1id1)
 Planning Time: 0.163 ms
 Execution Time: 70452.769 ms

从执行计划能看到SQL的每个subplan大约消耗了17s，4个subplan共消耗了70s 。很明显，优化的方向是减少t2表的重复访问，同时去除subplan。

除了case .. when，本例也可以用group by，但可读性及灵活性均不如filter。不同的Filter 可以设置不同的条件，统计的方法不限于count。

二、case ... when 改写

select t1id1, typ11,typ10,typ01,typ00 from t1 left join
(select t2id1, 
       sum(case when typ1='1' and typ2='1' then 1 end) as typ11,
       sum(case when typ1='1' and typ2='0' then 1 end) as typ10,
       sum(case when typ1='0' and typ2='1' then 1 end) as typ01,
       sum(case when typ1='0' and typ2='0' then 1 end) as typ00
from t2 group by t2id1) t2 on t2id1=t1id1

执行计划如下：

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=469250.27..469252.55 rows=100 width=36) (actual time=4714.313..4714.337 rows=100 loops=1)
   Hash Cond: (t1.t1id1 = t2.t2id1)
   ->  Seq Scan on t1  (cost=0.00..2.00 rows=100 width=4) (actual time=0.008..0.014 rows=100 loops=1)
   ->  Hash  (cost=469249.02..469249.02 rows=100 width=36) (actual time=4714.299..4714.299 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 15kB
         ->  Subquery Scan on t2  (cost=469247.02..469249.02 rows=100 width=36) (actual time=4714.257..4714.278 rows=100 loops=1)
               ->  HashAggregate  (cost=469247.02..469248.02 rows=100 width=36) (actual time=4714.256..4714.269 rows=100 loops=1)
                     Group Key: t2_1.t2id1
                     ->  Seq Scan on t2 t2_1  (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.364..915.072 rows=10000000 loops=1)
 Planning Time: 0.101 ms
 Execution Time: 4714.383 ms

三、filter 改写

针对以上场景，很容易就想到case .. when 语句，但考虑到相对逻辑复杂，且可读性差，这里使用filter 写法。具体修改如下：

select t1id1,cntyp11,cntyp10,cntyp01,cntyp00 from t1 left join (
   select t2id1,
       count(*) filter (where typ1='1' and typ2='1') as cntyp11,
       count(*) filter (where typ1='1' and typ2='0') as cntyp10,
       count(*) filter (where typ1='0' and typ2='1') as cntyp01,
       count(*) filter (where typ1='0' and typ2='0') as cntyp00
   from t2 group by t2id1 ) t2 on t1id1=t2id1;

执行计划如下：

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=469250.27..469252.55 rows=100 width=36) (actual time=3835.643..3835.665 rows=100 loops=1)
   Hash Cond: (t1.t1id1 = t2.t2id1)
   ->  Seq Scan on t1  (cost=0.00..2.00 rows=100 width=4) (actual time=0.011..0.016 rows=100 loops=1)
   ->  Hash  (cost=469249.02..469249.02 rows=100 width=36) (actual time=3835.627..3835.627 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 15kB
         ->  Subquery Scan on t2  (cost=469247.02..469249.02 rows=100 width=36) (actual time=3835.588..3835.608 rows=100 loops=1)
               ->  HashAggregate  (cost=469247.02..469248.02 rows=100 width=36) (actual time=3835.587..3835.600 rows=100 loops=1)
                     Group Key: t2_1.t2id1
                     ->  Seq Scan on t2 t2_1  (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.321..742.129 rows=10000000 loops=1)