作者
digoal
日期
2018-11-24
标签
PostgreSQL , join 顺序 , 穷举 , 遗传算法 , 图式算法 , JOIN层级 , 语义 , 广度优先 , 深度优先 , QUERY REWRITE
背景
1、直接修改SQL,使用强制JOIN顺序(join_collapse_limit=1)
https://www.postgresql.org/docs/11/runtime-config-query.html
2、改参数
``` from_collapse_limit=8
join_collapse_limit=8 ```
前8个表,可以穷举(找最佳JOIN顺序),8个表后面如果还有表,按书写顺序JOIN。
如果JOIN的表超过geqo_threshold,则不使用穷举法(包括前面8个也不使用),直接使用GEQO图式算法,找最佳顺序。
https://www.postgresql.org/docs/11/geqo.html
3、加HINT,强制JOIN顺序。
《关键时刻HINT出彩 - PG优化器的参数优化、执行计划固化CASE》
4、使用SR_PLAN,改变SQL的执行计划缓存。
《PostgreSQL Oracle 兼容性之 - SQL OUTLINE插件sr_plan (保存、篡改、固定 执行计划)》
https://github.com/postgrespro/sr_plan
5、改内核,QUERY REWRITE,调整SQL JOIN顺序。找合理ROOT节点,按层级逐层书写。例如使用规则法:带where过滤的表JOIN提前(不改变语义的前提)。
6、改内核,geqo算法优化,选ROOT,按层级调整JOIN顺序(类似广度优先,而非深度优先)。
``` postgres=# show from_collapse_limit ; from_collapse_limit
8 (1 row)
postgres=# show join_collapse_limit ; join_collapse_limit
8 (1 row)
postgres=# show geqo
geqo geqo_effort
geqo_generations geqo_pool_size
geqo_seed geqo_selection_bias
geqo_threshold
```
case
explain (analyze)
select
-- 所有字段
from xx_order_item this_ -- 过滤到 100多万行
inner join xx_order order1_ on this_.orders= order1_.id
left outer join test123 digoal6_ on order1_.party_contact_mech_id= digoal6_.id
left outer join cmn_contact_mech contactmec7_ on digoal6_.contract_mech_id= contactmec7_.id
left outer join test_321 geo8_ on contactmec7_.city_id= geo8_.id
left outer join test_321 geo9_ on geo8_.parent_id= geo9_.id
left outer join test_321 geo10_ on contactmec7_.province_id= geo10_.id
left outer join test_abc shipment11_ on digoal6_.shipment_id= shipment11_.id
left outer join test_abc123 party12_ on shipment11_.owner_id= party12_.id
left outer join test_abc321 dealerstat13_ on party12_.id= dealerstat13_.party_id
left outer join test_abcdef123 dealer14_ on dealerstat13_.dealer_id= dealer14_.id
left outer join test_111 dictionary15_ on dealerstat13_.tax_rate_id= dictionary15_.id
left outer join test_abc123_invoice_mech digoal12i16_ on order1_.party_invoice_mech_id= digoal12i16_.id
left outer join test_111222 invoiceinf17_ on digoal12i16_.invoice_info_id= invoiceinf17_.id
left outer join test_abc123 party18_ on invoiceinf17_.owner_id= party18_.id
left outer join test123 digoal19_ on digoal12i16_.invoice_send_contact_mech_id= digoal19_.id
left outer join test_abc123 party20_ on digoal12i16_.party_id= party20_.id
inner join test111_111222 order_stor2_ on order1_.store_id= order_stor2_.id -- 过滤到 100多万行
left outer join test_321 geo22_ on order_stor2_.city_id= geo22_.id
left outer join test_321 geo23_ on order_stor2_.district_id= geo23_.id
left outer join test111_111222_group productsto24_ on order_stor2_.mgt_product_store_group_id= productsto24_.id
left outer join abcde productpri25_ on productsto24_.product_price_mech_group_id= productpri25_.id
left outer join abcde123 membercard26_ on productsto24_.purchase_member_card_id= membercard26_.id
left outer join abcde123321 member27_ on membercard26_.member_id= member27_.id
left outer join test_a123 memberrank28_ on member27_.member_rank_id= memberrank28_.id
left outer join test_abc123 party29_ on member27_.party_id= party29_.id
left outer join test_abc123 person30_ on membercard26_.owner_id= person30_.id
left outer join test_b123 settlement31_ on membercard26_.id= settlement31_.owner_id
left outer join test_abc123 partygroup32_ on order_stor2_.owner_id= partygroup32_.id
inner join test111_111222_group order_stor3_ on order_stor2_.parent_id= order_stor3_.id -- JOIN后 过滤到40条
left outer join abcde productpri34_ on order_stor2_.product_price_mech_group_id= productpri34_.id
left outer join test_321 geo35_ on order_stor2_.province_id= geo35_.id
left outer join test_b12311 quot36_ on order1_.id= quot36_.order_id
left outer join test_c12311 product37_ on this_.product_id= product37_.id
left outer join test_cde123 goods38_ on product37_.goods_id= goods38_.id
left outer join test_cde123_group goodsgroup39_ on goods38_.goods_group_id= goodsgroup39_.id
left outer join test_caef1 brand40_ on goodsgroup39_.brand_id= brand40_.id
left outer join test_c12311_category productcat41_ on goodsgroup39_.product_category_id= productcat41_.id
where( this_.leafletable='Y' and order_stor3_.id=90 and this_.shipmentable='Y' and this_.state='NORMAL' ) order by this_.id desc limit 100 ;
这三个INNERT JOIN最先执行,可以过滤到40行。
但是如果优化器没有把inner join提前去JOIN,则会导致循环100多万次,导致性能问题。(即使merge, hash join,性能依旧差。)
最重要的是将inner join提前。
复现方法
``` drop table if exists t1; drop table if exists t2; create table t1 (c1 int primary key, c2 int, c3 int, info text); create table t2 (c1 int primary key, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int, c9 int, info text); create index idx_t1_1 on t1 (c2); create index idx_t1_2 on t1 (c3); create index idx_t2_1 on t2 (c2); create index idx_t2_2 on t2 (c3); create index idx_t2_3 on t2 (c4); create index idx_t2_4 on t2 (c5); create index idx_t2_5 on t2 (c6); create index idx_t2_6 on t2 (c7); create index idx_t2_7 on t2 (c8); create index idx_t2_8 on t2 (c9);
do language plpgsql $$ declare begin for i in 3..48 loop execute 'drop table if exists t'||i; execute 'create table t'||i||' (like t2 including all)'; end loop; end; $$;
insert into t1 select id, id, id, 'test' from generate_series(1,2000000) t(id); insert into t2 select id, id, id, id, id, id, id, id, id, 'test' from generate_series(1,2000000) t(id);
-- 如果t3的值范围在INDEX的前面,使用MERGE JOIN会有很明显的加速效果 -- insert into t3 select id, id, id, id, id, id, id, id, id, 'test' from generate_series(1,200) t(id);
-- 如果t3的值范围在INDEX的末尾,使用MERGE JOIN时,前面的依旧需要循环多次
-- 原因如下:
-- 《PostgreSQL merge join 扫描方法实例细说,以及SQL写法注意 或 内核优化建议》
insert into t3 select id, id, id, id, id, id, id, id, id, 'test' from generate_series(1999801,2000000) t(id);
do language plpgsql $$ declare begin for i in 4..48 loop execute 'insert into t'||i||' select id, id, id, id, id, id, id, id, id, ''test'' from generate_series(1,10000) t(id)'; end loop; for i in 1..48 loop execute 'analyze t'||i; end loop; end; $$;
do language plpgsql $$ declare sql text; begin sql := 'select * from t1 join t2 on (t1.c2=t2.c1) left join t4 on (t1.c3=t4.c1)';
for i in 4..47 loop sql := sql||' left join t'||i+1||' on (t'||i||'.c2=t'||i+1||'.c1) '; end loop; sql := sql||' left join t5 ttt5 on (t42.c1=ttt5.c3)'; sql := sql||' left join t5 ttt6 on (ttt5.c4=ttt6.c5)'; sql := sql||' left join t5 ttt7 on (ttt6.c6=ttt7.c7)'; sql := sql||' join t3 on (t2.c2=t3.c1)'; sql := sql||' left join t4 tt4 on (t47.c1=tt4.c3)'; sql := sql||' left join t4 tt5 on (tt4.c4=tt5.c5)'; raise notice '% ;', sql; end; $$;
explain (analyze)
select * from
t1
inner join t2 on (t1.c2=t2.c1)
left join t4 on (t1.c3=t4.c1) left join t5 on (t4.c2=t5.c1) left join t6 on (t5.c2=t6.c1)
left join t7 on (t6.c2=t7.c1) left join t8 on (t7.c2=t8.c1) left join t9 on (t8.c2=t9.c1)
left join t10 on (t9.c2=t10.c1) left join t11 on (t10.c2=t11.c1) left join t12 on (t11.c2=t12.c1)
left join t13 on (t12.c2=t13.c1) left join t14 on (t13.c2=t14.c1) left join t15 on (t14.c2=t15.c1)
left join t16 on (t15.c2=t16.c1) left join t17 on (t16.c2=t17.c1) left join t18 on (t17.c2=t18.c1)
left join t19 on (t18.c2=t19.c1) left join t20 on (t19.c2=t20.c1) left join t21 on (t20.c2=t21.c1)
left join t22 on (t21.c2=t22.c1) left join t23 on (t22.c2=t23.c1) left join t24 on (t23.c2=t24.c1)
left join t25 on (t24.c2=t25.c1) left join t26 on (t25.c2=t26.c1) left join t27 on (t26.c2=t27.c1)
left join t28 on (t27.c2=t28.c1) left join t29 on (t28.c2=t29.c1) left join t30 on (t29.c2=t30.c1)
left join t31 on (t30.c2=t31.c1) left join t32 on (t31.c2=t32.c1) left join t33 on (t32.c2=t33.c1)
left join t34 on (t33.c2=t34.c1) left join t35 on (t34.c2=t35.c1) left join t36 on (t35.c2=t36.c1)
left join t37 on (t36.c2=t37.c1) left join t38 on (t37.c2=t38.c1) left join t39 on (t38.c2=t39.c1)
left join t3 ttttt3 on (ttttt3.c6=t33.c5) -- 加一行
left join t40 on (t39.c2=t40.c1) left join t41 on (t40.c2=t41.c1) left join t42 on (t41.c2=t42.c1)
left join t43 on (t42.c2=t43.c1) left join t44 on (t43.c2=t44.c1) left join t45 on (t44.c2=t45.c1)
left join t46 on (t45.c2=t46.c1) left join t47 on (t46.c2=t47.c1) left join t48 on (t47.c2=t48.c1)
left join t5 ttt5 on (t42.c1=ttt5.c3) left join t5 ttt6 on (ttt5.c4=ttt6.c5) left join t5 ttt7 on (ttt6.c6=ttt7.c7)
inner join t3 on (t2.c2=t3.c1)
left join t4 tt4 on (t47.c1=tt4.c3) left join t4 tt5 on (tt4.c4=tt5.c5)
;
-- 约12秒
-- 调整t3顺序,提前,性能飙升。 -- 约 20毫秒
explain (analyze)
select * from
t1
inner join t2 on (t1.c2=t2.c1)
inner join t3 on (t2.c2=t3.c1) -- 把t3 提前,过滤性提高,性能提升。
left join t4 on (t1.c3=t4.c1) left join t5 on (t4.c2=t5.c1) left join t6 on (t5.c2=t6.c1)
left join t7 on (t6.c2=t7.c1) left join t8 on (t7.c2=t8.c1) left join t9 on (t8.c2=t9.c1)
left join t10 on (t9.c2=t10.c1) left join t11 on (t10.c2=t11.c1) left join t12 on (t11.c2=t12.c1)
left join t13 on (t12.c2=t13.c1) left join t14 on (t13.c2=t14.c1) left join t15 on (t14.c2=t15.c1)
left join t16 on (t15.c2=t16.c1) left join t17 on (t16.c2=t17.c1) left join t18 on (t17.c2=t18.c1)
left join t19 on (t18.c2=t19.c1) left join t20 on (t19.c2=t20.c1) left join t21 on (t20.c2=t21.c1)
left join t22 on (t21.c2=t22.c1) left join t23 on (t22.c2=t23.c1) left join t24 on (t23.c2=t24.c1)
left join t25 on (t24.c2=t25.c1) left join t26 on (t25.c2=t26.c1) left join t27 on (t26.c2=t27.c1)
left join t28 on (t27.c2=t28.c1) left join t29 on (t28.c2=t29.c1) left join t30 on (t29.c2=t30.c1)
left join t31 on (t30.c2=t31.c1) left join t32 on (t31.c2=t32.c1) left join t33 on (t32.c2=t33.c1)
left join t34 on (t33.c2=t34.c1) left join t35 on (t34.c2=t35.c1) left join t36 on (t35.c2=t36.c1)
left join t37 on (t36.c2=t37.c1) left join t38 on (t37.c2=t38.c1) left join t39 on (t38.c2=t39.c1)
left join t3 ttttt3 on (ttttt3.c6=t33.c5) -- 加一行
left join t40 on (t39.c2=t40.c1) left join t41 on (t40.c2=t41.c1) left join t42 on (t41.c2=t42.c1)
left join t43 on (t42.c2=t43.c1) left join t44 on (t43.c2=t44.c1) left join t45 on (t44.c2=t45.c1)
left join t46 on (t45.c2=t46.c1) left join t47 on (t46.c2=t47.c1) left join t48 on (t47.c2=t48.c1)
left join t5 ttt5 on (t42.c1=ttt5.c3) left join t5 ttt6 on (ttt5.c4=ttt6.c5) left join t5 ttt7 on (ttt6.c6=ttt7.c7)
left join t4 tt4 on (t47.c1=tt4.c3) left join t4 tt5 on (tt4.c4=tt5.c5)
;
```
最后,可以尝试aqo优化器。
https://github.com/postgrespro/aqo
小结
如果t3的值范围在INDEX的前面,使用MERGE JOIN会有很明显的加速效果
insert into t3 select id, id, id, id, id, id, id, id, id, 'test' from generate_series(1,200) t(id);
如果t3的值范围在INDEX的末尾,使用MERGE JOIN时,前面的依旧需要循环多次
原因如下:
《PostgreSQL merge join 扫描方法实例细说,以及SQL写法注意 或 内核优化建议》
insert into t3 select id, id, id, id, id, id, id, id, id, 'test' from generate_series(1999801,2000000) t(id);
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.
9.9元购买3个月阿里云RDS PostgreSQL实例
PostgreSQL 解决方案集合
德哥 / digoal's github - 公益是一辈子的事.





