暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

[未完待续] PostgreSQL 超级多表JOIN优化 - 优化器优化

digoal 2018-11-24
1769

作者

digoal

日期

2018-11-24

标签

PostgreSQL , join 顺序 , 穷举 , 遗传算法 , 图式算法 , JOIN层级 , 语义 , 广度优先 , 深度优先 , QUERY REWRITE


背景

1、直接修改SQL,使用强制JOIN顺序(join_collapse_limit=1)

https://www.postgresql.org/docs/11/runtime-config-query.html

2、改参数

``` from_collapse_limit=8

join_collapse_limit=8 ```

前8个表,可以穷举(找最佳JOIN顺序),8个表后面如果还有表,按书写顺序JOIN。

如果JOIN的表超过geqo_threshold,则不使用穷举法(包括前面8个也不使用),直接使用GEQO图式算法,找最佳顺序。

https://www.postgresql.org/docs/11/geqo.html

3、加HINT,强制JOIN顺序。

《关键时刻HINT出彩 - PG优化器的参数优化、执行计划固化CASE》

4、使用SR_PLAN,改变SQL的执行计划缓存。

《PostgreSQL Oracle 兼容性之 - SQL OUTLINE插件sr_plan (保存、篡改、固定 执行计划)》

https://github.com/postgrespro/sr_plan

5、改内核,QUERY REWRITE,调整SQL JOIN顺序。找合理ROOT节点,按层级逐层书写。例如使用规则法:带where过滤的表JOIN提前(不改变语义的前提)。

6、改内核,geqo算法优化,选ROOT,按层级调整JOIN顺序(类似广度优先,而非深度优先)。

``` postgres=# show from_collapse_limit ; from_collapse_limit


8 (1 row)

postgres=# show join_collapse_limit ; join_collapse_limit


8 (1 row)

postgres=# show geqo geqo geqo_effort
geqo_generations geqo_pool_size
geqo_seed geqo_selection_bias
geqo_threshold ```

case

explain (analyze) select -- 所有字段 from xx_order_item this_ -- 过滤到 100多万行 inner join xx_order order1_ on this_.orders= order1_.id left outer join test123 digoal6_ on order1_.party_contact_mech_id= digoal6_.id left outer join cmn_contact_mech contactmec7_ on digoal6_.contract_mech_id= contactmec7_.id left outer join test_321 geo8_ on contactmec7_.city_id= geo8_.id left outer join test_321 geo9_ on geo8_.parent_id= geo9_.id left outer join test_321 geo10_ on contactmec7_.province_id= geo10_.id left outer join test_abc shipment11_ on digoal6_.shipment_id= shipment11_.id left outer join test_abc123 party12_ on shipment11_.owner_id= party12_.id left outer join test_abc321 dealerstat13_ on party12_.id= dealerstat13_.party_id left outer join test_abcdef123 dealer14_ on dealerstat13_.dealer_id= dealer14_.id left outer join test_111 dictionary15_ on dealerstat13_.tax_rate_id= dictionary15_.id left outer join test_abc123_invoice_mech digoal12i16_ on order1_.party_invoice_mech_id= digoal12i16_.id left outer join test_111222 invoiceinf17_ on digoal12i16_.invoice_info_id= invoiceinf17_.id left outer join test_abc123 party18_ on invoiceinf17_.owner_id= party18_.id left outer join test123 digoal19_ on digoal12i16_.invoice_send_contact_mech_id= digoal19_.id left outer join test_abc123 party20_ on digoal12i16_.party_id= party20_.id inner join test111_111222 order_stor2_ on order1_.store_id= order_stor2_.id -- 过滤到 100多万行 left outer join test_321 geo22_ on order_stor2_.city_id= geo22_.id left outer join test_321 geo23_ on order_stor2_.district_id= geo23_.id left outer join test111_111222_group productsto24_ on order_stor2_.mgt_product_store_group_id= productsto24_.id left outer join abcde productpri25_ on productsto24_.product_price_mech_group_id= productpri25_.id left outer join abcde123 membercard26_ on productsto24_.purchase_member_card_id= membercard26_.id left outer join abcde123321 member27_ on membercard26_.member_id= member27_.id left outer join test_a123 memberrank28_ on member27_.member_rank_id= memberrank28_.id left outer join test_abc123 party29_ on member27_.party_id= party29_.id left outer join test_abc123 person30_ on membercard26_.owner_id= person30_.id left outer join test_b123 settlement31_ on membercard26_.id= settlement31_.owner_id left outer join test_abc123 partygroup32_ on order_stor2_.owner_id= partygroup32_.id inner join test111_111222_group order_stor3_ on order_stor2_.parent_id= order_stor3_.id -- JOIN后 过滤到40条 left outer join abcde productpri34_ on order_stor2_.product_price_mech_group_id= productpri34_.id left outer join test_321 geo35_ on order_stor2_.province_id= geo35_.id left outer join test_b12311 quot36_ on order1_.id= quot36_.order_id left outer join test_c12311 product37_ on this_.product_id= product37_.id left outer join test_cde123 goods38_ on product37_.goods_id= goods38_.id left outer join test_cde123_group goodsgroup39_ on goods38_.goods_group_id= goodsgroup39_.id left outer join test_caef1 brand40_ on goodsgroup39_.brand_id= brand40_.id left outer join test_c12311_category productcat41_ on goodsgroup39_.product_category_id= productcat41_.id where( this_.leafletable='Y' and order_stor3_.id=90 and this_.shipmentable='Y' and this_.state='NORMAL' ) order by this_.id desc limit 100 ;

这三个INNERT JOIN最先执行,可以过滤到40行。

但是如果优化器没有把inner join提前去JOIN,则会导致循环100多万次,导致性能问题。(即使merge, hash join,性能依旧差。)

最重要的是将inner join提前。

复现方法

``` drop table if exists t1; drop table if exists t2; create table t1 (c1 int primary key, c2 int, c3 int, info text); create table t2 (c1 int primary key, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int, c9 int, info text); create index idx_t1_1 on t1 (c2); create index idx_t1_2 on t1 (c3); create index idx_t2_1 on t2 (c2); create index idx_t2_2 on t2 (c3); create index idx_t2_3 on t2 (c4); create index idx_t2_4 on t2 (c5); create index idx_t2_5 on t2 (c6); create index idx_t2_6 on t2 (c7); create index idx_t2_7 on t2 (c8); create index idx_t2_8 on t2 (c9);

do language plpgsql $$ declare begin for i in 3..48 loop execute 'drop table if exists t'||i; execute 'create table t'||i||' (like t2 including all)'; end loop; end; $$;

insert into t1 select id, id, id, 'test' from generate_series(1,2000000) t(id); insert into t2 select id, id, id, id, id, id, id, id, id, 'test' from generate_series(1,2000000) t(id);

-- 如果t3的值范围在INDEX的前面,使用MERGE JOIN会有很明显的加速效果 -- insert into t3 select id, id, id, id, id, id, id, id, id, 'test' from generate_series(1,200) t(id);

-- 如果t3的值范围在INDEX的末尾,使用MERGE JOIN时,前面的依旧需要循环多次 -- 原因如下: -- 《PostgreSQL merge join 扫描方法实例细说,以及SQL写法注意 或 内核优化建议》
insert into t3 select id, id, id, id, id, id, id, id, id, 'test' from generate_series(1999801,2000000) t(id);

do language plpgsql $$ declare begin for i in 4..48 loop execute 'insert into t'||i||' select id, id, id, id, id, id, id, id, id, ''test'' from generate_series(1,10000) t(id)'; end loop; for i in 1..48 loop execute 'analyze t'||i; end loop; end; $$;

do language plpgsql $$ declare sql text; begin sql := 'select * from t1 join t2 on (t1.c2=t2.c1) left join t4 on (t1.c3=t4.c1)';

for i in 4..47 loop sql := sql||' left join t'||i+1||' on (t'||i||'.c2=t'||i+1||'.c1) '; end loop; sql := sql||' left join t5 ttt5 on (t42.c1=ttt5.c3)'; sql := sql||' left join t5 ttt6 on (ttt5.c4=ttt6.c5)'; sql := sql||' left join t5 ttt7 on (ttt6.c6=ttt7.c7)'; sql := sql||' join t3 on (t2.c2=t3.c1)'; sql := sql||' left join t4 tt4 on (t47.c1=tt4.c3)'; sql := sql||' left join t4 tt5 on (tt4.c4=tt5.c5)'; raise notice '% ;', sql; end; $$;

explain (analyze) select * from t1 inner join t2 on (t1.c2=t2.c1) left join t4 on (t1.c3=t4.c1) left join t5 on (t4.c2=t5.c1) left join t6 on (t5.c2=t6.c1)
left join t7 on (t6.c2=t7.c1) left join t8 on (t7.c2=t8.c1) left join t9 on (t8.c2=t9.c1)
left join t10 on (t9.c2=t10.c1) left join t11 on (t10.c2=t11.c1) left join t12 on (t11.c2=t12.c1)
left join t13 on (t12.c2=t13.c1) left join t14 on (t13.c2=t14.c1) left join t15 on (t14.c2=t15.c1)
left join t16 on (t15.c2=t16.c1) left join t17 on (t16.c2=t17.c1) left join t18 on (t17.c2=t18.c1)
left join t19 on (t18.c2=t19.c1) left join t20 on (t19.c2=t20.c1) left join t21 on (t20.c2=t21.c1)
left join t22 on (t21.c2=t22.c1) left join t23 on (t22.c2=t23.c1) left join t24 on (t23.c2=t24.c1)
left join t25 on (t24.c2=t25.c1) left join t26 on (t25.c2=t26.c1) left join t27 on (t26.c2=t27.c1)
left join t28 on (t27.c2=t28.c1) left join t29 on (t28.c2=t29.c1) left join t30 on (t29.c2=t30.c1)
left join t31 on (t30.c2=t31.c1) left join t32 on (t31.c2=t32.c1) left join t33 on (t32.c2=t33.c1)
left join t34 on (t33.c2=t34.c1) left join t35 on (t34.c2=t35.c1) left join t36 on (t35.c2=t36.c1)
left join t37 on (t36.c2=t37.c1) left join t38 on (t37.c2=t38.c1) left join t39 on (t38.c2=t39.c1)
left join t3 ttttt3 on (ttttt3.c6=t33.c5) -- 加一行 left join t40 on (t39.c2=t40.c1) left join t41 on (t40.c2=t41.c1) left join t42 on (t41.c2=t42.c1)
left join t43 on (t42.c2=t43.c1) left join t44 on (t43.c2=t44.c1) left join t45 on (t44.c2=t45.c1)
left join t46 on (t45.c2=t46.c1) left join t47 on (t46.c2=t47.c1) left join t48 on (t47.c2=t48.c1)
left join t5 ttt5 on (t42.c1=ttt5.c3) left join t5 ttt6 on (ttt5.c4=ttt6.c5) left join t5 ttt7 on (ttt6.c6=ttt7.c7) inner join t3 on (t2.c2=t3.c1) left join t4 tt4 on (t47.c1=tt4.c3) left join t4 tt5 on (tt4.c4=tt5.c5) ;

-- 约12秒

-- 调整t3顺序,提前,性能飙升。 -- 约 20毫秒

explain (analyze) select * from t1 inner join t2 on (t1.c2=t2.c1) inner join t3 on (t2.c2=t3.c1) -- 把t3 提前,过滤性提高,性能提升。 left join t4 on (t1.c3=t4.c1) left join t5 on (t4.c2=t5.c1) left join t6 on (t5.c2=t6.c1)
left join t7 on (t6.c2=t7.c1) left join t8 on (t7.c2=t8.c1) left join t9 on (t8.c2=t9.c1)
left join t10 on (t9.c2=t10.c1) left join t11 on (t10.c2=t11.c1) left join t12 on (t11.c2=t12.c1)
left join t13 on (t12.c2=t13.c1) left join t14 on (t13.c2=t14.c1) left join t15 on (t14.c2=t15.c1)
left join t16 on (t15.c2=t16.c1) left join t17 on (t16.c2=t17.c1) left join t18 on (t17.c2=t18.c1)
left join t19 on (t18.c2=t19.c1) left join t20 on (t19.c2=t20.c1) left join t21 on (t20.c2=t21.c1)
left join t22 on (t21.c2=t22.c1) left join t23 on (t22.c2=t23.c1) left join t24 on (t23.c2=t24.c1)
left join t25 on (t24.c2=t25.c1) left join t26 on (t25.c2=t26.c1) left join t27 on (t26.c2=t27.c1)
left join t28 on (t27.c2=t28.c1) left join t29 on (t28.c2=t29.c1) left join t30 on (t29.c2=t30.c1)
left join t31 on (t30.c2=t31.c1) left join t32 on (t31.c2=t32.c1) left join t33 on (t32.c2=t33.c1)
left join t34 on (t33.c2=t34.c1) left join t35 on (t34.c2=t35.c1) left join t36 on (t35.c2=t36.c1)
left join t37 on (t36.c2=t37.c1) left join t38 on (t37.c2=t38.c1) left join t39 on (t38.c2=t39.c1)
left join t3 ttttt3 on (ttttt3.c6=t33.c5) -- 加一行 left join t40 on (t39.c2=t40.c1) left join t41 on (t40.c2=t41.c1) left join t42 on (t41.c2=t42.c1)
left join t43 on (t42.c2=t43.c1) left join t44 on (t43.c2=t44.c1) left join t45 on (t44.c2=t45.c1)
left join t46 on (t45.c2=t46.c1) left join t47 on (t46.c2=t47.c1) left join t48 on (t47.c2=t48.c1)
left join t5 ttt5 on (t42.c1=ttt5.c3) left join t5 ttt6 on (ttt5.c4=ttt6.c5) left join t5 ttt7 on (ttt6.c6=ttt7.c7) left join t4 tt4 on (t47.c1=tt4.c3) left join t4 tt5 on (tt4.c4=tt5.c5) ; ```

最后,可以尝试aqo优化器。

https://github.com/postgrespro/aqo

小结

如果t3的值范围在INDEX的前面,使用MERGE JOIN会有很明显的加速效果

insert into t3 select id, id, id, id, id, id, id, id, id, 'test' from generate_series(1,200) t(id);

如果t3的值范围在INDEX的末尾,使用MERGE JOIN时,前面的依旧需要循环多次

原因如下:

《PostgreSQL merge join 扫描方法实例细说,以及SQL写法注意 或 内核优化建议》

insert into t3 select id, id, id, id, id, id, id, id, id, 'test' from generate_series(1999801,2000000) t(id);

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论