暂无图片
SQL优化案例-外连接优化
最近更新:2022-08-04 00:54:08

适用范围

DBA,数据库开发工程师,对SQL优化感兴趣的同学

问题概述

巡检中发现SQL:

  select count(*) as records from (
select distinct res.* from power_bpm.pt_wf_ru_task res 
left join power_bpm.pt_wf_ru_identitylink i on i.task_id = res.id
 where ( res.assignee is null and (i.user_id = $1 ))
 union all 
select distinct res.* from power_bpm.pt_wf_ru_task res where res.assignee = $2
 order by create_time desc
) as t

执行情况 |query_id|执行时间|次数|耗时| |-|-|-|-| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-08:58:25.6|5|7524.379| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-09:01:25.6|3|7463.6| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-09:02:11.7|1|2563.457| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-09:04:11.2|4|2571.467| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-09:11:25.9|9|2612.143| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-09:12:34.5|6|2655.798|

综合情况为302次/天,平均2579ms/次,占总成本近10%。

问题原因

  • 1.冗余排序 从SQL语义上面看子查询中主体是计算总量,而order by 对计算总量影响不大

  • 2.外连接转换问题 第一个子查询 外连接的表加过滤条件,相当于“内连接”,

  • 3.是否有必要distinct

查看关联数据

  select count(1), id  from power_bpm.pt_wf_ru_task i  group by   id  order by 1 desc ;
Count  id
1	00005a03b8bf11ecb372bea9af21a35e
1	0000e958b8d411ecb372bea9af21a35e
1	000168dab88b11ecb372bea9af21a35e
1	000228439acc11eca2ed9a2b1b810500
1	00025a23d64e11ecb5289ef3c5ed4956
---id也是 pt_wf_ru_task表的主键
  select count(1),task_id  from  power_bpm.pt_wf_ru_identitylink res group by   task_id order by 1 desc ;
976839	
88	360d32e008b911edad22d2d4b3bad39a
88	5c662c48073e11edaf73d2d4b3bad39a
87	ce3051c4072911edaf73d2d4b3bad39a
78	a9674f47073e11edaf73d2d4b3bad39a

解决方案

id也是 pt_wf_ru_task表的主键,也不会重复所以整体SQL可以等价改写为

  select count(1) from (
	select  res.id  from  power_bpm.pt_wf_ru_task res 
	where res.assignee IS NULL 
	and exists ( select  1 from   power_bpm.pt_wf_ru_identitylink i 
where  i.task_id = res.ID  and   i.user_id = 'XX'   )
	union 
	select   id  FROM  power_bpm.pt_wf_ru_task res 
	WHERE   res.assignee =  'XX'
)  t ;

执行计划分析

查询log得知,$1=$2 = 2200000100005800 约70%。 对应该表数据量严重倾斜

  select  count(1), user_id  from  power_bpm.pt_wf_ru_identitylink i   group by   user_id order by  1 desc;
299913	2200000100005800
296744	2200000100005428
296261	2200000100014286
97169	2200000100014192
59361	2200000100016976
59361	2200000100010363
29635	2200000100009721
21487	2200000100009728
16890	2200000100009735
13687	2200000100002044
.....
214 	2200000100209728
168	    2200000100209734
137	   2200000101102045

当$1=$2 = 2200000100005800时 耗时2000ms 执行计划:

  Aggregate  (cost=166815.49..166815.50 rows=1 width=8) (actual time=1978.936..1981.453 rows=1 loops=1)
  Buffers: shared hit=49715, temp read=13420 written=13430
  ->  Sort  (cost=164554.62..164931.43 rows=150725 width=395) (actual time=1912.704..1971.422 rows=149865 loops=1)
        Sort Key: res.create_time DESC
        Sort Method: external merge  Disk: 53680kB
        Buffers: shared hit=49715, temp read=13420 written=13430
        ->  Append  (cost=114478.52..124284.09 rows=150725 width=395) (actual time=1480.204..1759.242 rows=149865 loops=1)
              Buffers: shared hit=49715, temp read=6710 written=6715
......