暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 14 preview - 大表search IN ( consts )

digoal 2021-01-05
399

作者

digoal

日期

2021-05-19

标签

PostgreSQL , MIN_ARRAY_SIZE_FOR_HASHED_SAOP , in , linear search , hash table probe , 优化


背景

select x from tbl where id in (1,2,3,4,5,6,7,8,9);

in里面是一堆常数, 匹配方式为linear search的方式, 如果tbl表很大, 有很多条记录, linear search会成为巨大瓶颈.

所以有这种SQL优化方法:

```
hash表内存需要够大(work_mem)

select x from tbl where id = any (array[1,2,3,4,5,6,7,8,9]);
```

```
select x from tbl where id in (select xxx from t);

TO

select x from tbl where id = any(array(select xxx from t));
```

PG 14为了降低linear search的开销, 当IN里面的常数大于等于MIN_ARRAY_SIZE_FOR_HASHED_SAOP (9)时, 会对IN里面的常数构造一个hash结构, 采用hash匹配, 而不是linear search.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=50e17ad281b8d1c1b410c9833955bc80fbad4078

注意hash table构建也需要时间, 所以如果是check约束 (例如 check col in (1,2,3,4,5,6,7,8,9,0)) 没有必要使用这种方法, 大量记录运算时linear search才会凸显瓶颈.

因此, PG 14 目前 优化器不会 对 CHECK in约束进行 hash 优化.

```
Speedup ScalarArrayOpExpr evaluation
author David Rowley drowley@postgresql.org
Thu, 8 Apr 2021 11:51:22 +0000 (23:51 +1200)
committer David Rowley drowley@postgresql.org
Thu, 8 Apr 2021 11:51:22 +0000 (23:51 +1200)
commit 50e17ad281b8d1c1b410c9833955bc80fbad4078
tree faf07e47e95ceade572aaf2afdca08bc35ed69e7 tree
parent 1d257577e08d3e598011d6850fd1025858de8c8c commit | diff
Speedup ScalarArrayOpExpr evaluation

ScalarArrayOpExprs with "useOr=true" and a set of Consts on the righthand
side have traditionally been evaluated by using a linear search over the
array. When these arrays contain large numbers of elements then this
linear search could become a significant part of execution time.

Here we add a new method of evaluating ScalarArrayOpExpr expressions to
allow them to be evaluated by first building a hash table containing each
element, then on subsequent evaluations, we just probe that hash table to
determine if there is a match.

The planner is in charge of determining when this optimization is possible
and it enables it by setting hashfuncid in the ScalarArrayOpExpr. The
executor will only perform the hash table evaluation when the hashfuncid
is set.

This means that not all cases are optimized. For example CHECK constraints
containing an IN clause won't go through the planner, so won't get the
hashfuncid set. We could maybe do something about that at some later
date. The reason we're not doing it now is from fear that we may slow
down cases where the expression is evaluated only once. Those cases can
be common, for example, a single row INSERT to a table with a CHECK
constraint containing an IN clause.

In the planner, we enable this when there are suitable hash functions for
the ScalarArrayOpExpr's operator and only when there is at least
MIN_ARRAY_SIZE_FOR_HASHED_SAOP elements in the array. The threshold is
currently set to 9.

Author: James Coleman, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAaqYe8x62+=wn0zvNKCj55tPpg-JBHzhZFFc6ANovdqFw7-dA@mail.gmail.com
```

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论