DBA,数据库开发工程师,对SQL优化感兴趣的同学
巡检中发现SQL:
select count(*) as records from (
select distinct res.* from power_bpm.pt_wf_ru_task res
left join power_bpm.pt_wf_ru_identitylink i on i.task_id = res.id
where ( res.assignee is null and (i.user_id = $1 ))
union all
select distinct res.* from power_bpm.pt_wf_ru_task res where res.assignee = $2
order by create_time desc
) as t
执行情况 |query_id|执行时间|次数|耗时| |-|-|-|-| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-08:58:25.6|5|7524.379| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-09:01:25.6|3|7463.6| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-09:02:11.7|1|2563.457| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-09:04:11.2|4|2571.467| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-09:11:25.9|9|2612.143| |e4cfd70281fff9552f70cbdba248935e|2022-07-14-09:12:34.5|6|2655.798|
综合情况为302次/天,平均2579ms/次,占总成本近10%。
1.冗余排序 从SQL语义上面看子查询中主体是计算总量,而order by 对计算总量影响不大
2.外连接转换问题 第一个子查询 外连接的表加过滤条件,相当于“内连接”,
3.是否有必要distinct
查看关联数据
select count(1), id from power_bpm.pt_wf_ru_task i group by id order by 1 desc ;
Count id
1 00005a03b8bf11ecb372bea9af21a35e
1 0000e958b8d411ecb372bea9af21a35e
1 000168dab88b11ecb372bea9af21a35e
1 000228439acc11eca2ed9a2b1b810500
1 00025a23d64e11ecb5289ef3c5ed4956
---id也是 pt_wf_ru_task表的主键
select count(1),task_id from power_bpm.pt_wf_ru_identitylink res group by task_id order by 1 desc ;
976839
88 360d32e008b911edad22d2d4b3bad39a
88 5c662c48073e11edaf73d2d4b3bad39a
87 ce3051c4072911edaf73d2d4b3bad39a
78 a9674f47073e11edaf73d2d4b3bad39a
id也是 pt_wf_ru_task表的主键,也不会重复所以整体SQL可以等价改写为
select count(1) from (
select res.id from power_bpm.pt_wf_ru_task res
where res.assignee IS NULL
and exists ( select 1 from power_bpm.pt_wf_ru_identitylink i
where i.task_id = res.ID and i.user_id = 'XX' )
union
select id FROM power_bpm.pt_wf_ru_task res
WHERE res.assignee = 'XX'
) t ;
查询log得知,$1=$2 = 2200000100005800 约70%。 对应该表数据量严重倾斜
select count(1), user_id from power_bpm.pt_wf_ru_identitylink i group by user_id order by 1 desc;
299913 2200000100005800
296744 2200000100005428
296261 2200000100014286
97169 2200000100014192
59361 2200000100016976
59361 2200000100010363
29635 2200000100009721
21487 2200000100009728
16890 2200000100009735
13687 2200000100002044
.....
214 2200000100209728
168 2200000100209734
137 2200000101102045
当$1=$2 = 2200000100005800时 耗时2000ms 执行计划:
Aggregate (cost=166815.49..166815.50 rows=1 width=8) (actual time=1978.936..1981.453 rows=1 loops=1)
Buffers: shared hit=49715, temp read=13420 written=13430
-> Sort (cost=164554.62..164931.43 rows=150725 width=395) (actual time=1912.704..1971.422 rows=149865 loops=1)
Sort Key: res.create_time DESC
Sort Method: external merge Disk: 53680kB
Buffers: shared hit=49715, temp read=13420 written=13430
-> Append (cost=114478.52..124284.09 rows=150725 width=395) (actual time=1480.204..1759.242 rows=149865 loops=1)
Buffers: shared hit=49715, temp read=6710 written=6715