Author
digoal
Date
2020-08-03
Tags
PostgreSQL , hash , work_mem , hash_mem_multiplier
Background
The work_mem parameter limits the memory used by hash aggregation, hash joins, sorts, grouped aggregation, and similar operations.
In practice, hash operations consume much more memory than sorts or grouped aggregation, and the SQL that relies on them is usually analytical and runs less frequently than sort-heavy SQL.
Raising work_mem globally just for those infrequent queries risks out-of-memory (OOM) errors, because the many concurrent sort and grouped-aggregation queries would each be entitled to the same large allocation.
PostgreSQL 13 allows the memory limit for hash operations to be configured separately through the hash_mem_multiplier parameter, a multiplier applied to work_mem; increasing hash_mem_multiplier gives hash-based nodes more memory and can improve their performance.
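For example, the two parameters can be tuned together per session (a sketch against a PostgreSQL 13 instance; the table name t and column col are illustrative):

```sql
-- Give hash-based nodes 4x work_mem while sorts stay limited to work_mem.
SET work_mem = '64MB';
SET hash_mem_multiplier = 4.0;   -- hash nodes may now use up to 256MB
SHOW hash_mem_multiplier;

-- A hash aggregate that previously spilled to disk may now stay in memory:
EXPLAIN (ANALYZE) SELECT col, count(*) FROM t GROUP BY col;
```

The default of 1.0 (also the minimum) preserves the old behavior, so only sessions or roles that run hash-heavy analytics need the override.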
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d6c08e29e7bc8bc3bf49764192c4a9c71fc0b097
```
Add hash_mem_multiplier GUC.
author Peter Geoghegan pg@bowt.ie
Thu, 30 Jul 2020 05:14:58 +0800 (14:14 -0700)
committer Peter Geoghegan pg@bowt.ie
Thu, 30 Jul 2020 05:14:58 +0800 (14:14 -0700)
commit d6c08e29e7bc8bc3bf49764192c4a9c71fc0b097
tree 8d0d2cdb7d18504b50a49433f9181130f74186c4 tree | snapshot
parent 6023b7ea717ca04cf1bd53709d9c862db07eaefb commit | diff
Add hash_mem_multiplier GUC.
Add a GUC that acts as a multiplier on work_mem. It gets applied when
sizing executor node hash tables that were previously size constrained
using work_mem alone.
The new GUC can be used to preferentially give hash-based nodes more
memory than the generic work_mem limit. It is intended to enable admin
tuning of the executor's memory usage. Overall system throughput and
system responsiveness can be improved by giving hash-based executor
nodes more memory (especially over sort-based alternatives, which are
often much less sensitive to being memory constrained).
The default value for hash_mem_multiplier is 1.0, which is also the
minimum valid value. This means that hash-based nodes continue to apply
work_mem in the traditional way by default.
hash_mem_multiplier is generally useful. However, it is being added now
due to concerns about hash aggregate performance stability for users
that upgrade to Postgres 13 (which added disk-based hash aggregation in
commit 1f39bce0). While the old hash aggregate behavior risked
out-of-memory errors, it is nevertheless likely that many users actually
benefited. Hash agg's previous indifference to work_mem during query
execution was not just faster; it also accidentally made aggregation
resilient to grouping estimate problems (at least in cases where this
didn't create destabilizing memory pressure).
hash_mem_multiplier can provide a certain kind of continuity with the
behavior of Postgres 12 hash aggregates in cases where the planner
incorrectly estimates that all groups (plus related allocations) will
fit in work_mem/hash_mem. This seems necessary because hash-based
aggregation is usually much slower when only a small fraction of all
groups can fit. Even when it isn't possible to totally avoid hash
aggregates that spill, giving hash aggregation more memory will reliably
improve performance (the same cannot be said for external sort
operations, which appear to be almost unaffected by memory availability
provided it's at least possible to get a single merge pass).
The PostgreSQL 13 release notes should advise users that increasing
hash_mem_multiplier can help with performance regressions associated
with hash aggregation. That can be taken care of by a later commit.
Author: Peter Geoghegan
Reviewed-By: Álvaro Herrera, Jeff Davis
Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com
Backpatch: 13-, where disk-based hash aggregation was introduced.
```
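Per the commit message, the new GUC is simply a multiplier on work_mem when sizing hash tables. A minimal sketch of that arithmetic (an illustration, not the PostgreSQL source function):

```python
# Sketch: deriving the effective memory limit for hash-based executor
# nodes, per the commit message (hash_mem_multiplier * work_mem).
def effective_hash_mem_kb(work_mem_kb: int, hash_mem_multiplier: float) -> int:
    """Return the memory budget (in KB) applied to hash-based nodes."""
    return int(work_mem_kb * hash_mem_multiplier)

# Default multiplier 1.0: hash nodes keep applying plain work_mem (4MB here).
print(effective_hash_mem_kb(4096, 1.0))  # -> 4096
# Multiplier 2.0: hash nodes get 8MB while sorts remain capped at 4MB.
print(effective_hash_mem_kb(4096, 2.0))  # -> 8192
```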
In addition, PostgreSQL 13 introduces disk-based hash aggregation.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1f39bce0
```
Disk-based Hash Aggregation.
author Jeff Davis jdavis@postgresql.org
Thu, 19 Mar 2020 06:42:02 +0800 (15:42 -0700)
committer Jeff Davis jdavis@postgresql.org
Thu, 19 Mar 2020 06:42:02 +0800 (15:42 -0700)
commit 1f39bce021540fde00990af55b4432c55ef4b3c7
tree c2403fb61234d93408b23350a82ad429b3625af3 tree | snapshot
parent e00912e11a9ec2d29274ed8a6465e81385906dc2 commit | diff
Disk-based Hash Aggregation.
While performing hash aggregation, track memory usage when adding new
groups to a hash table. If the memory usage exceeds work_mem, enter
"spill mode".
In spill mode, new groups are not created in the hash table(s), but
existing groups continue to be advanced if input tuples match. Tuples
that would cause a new group to be created are instead spilled to a
logical tape to be processed later.
The tuples are spilled in a partitioned fashion. When all tuples from
the outer plan are processed (either by advancing the group or
spilling the tuple), finalize and emit the groups from the hash
table. Then, create new batches of work from the spilled partitions,
and select one of the saved batches and process it (possibly spilling
recursively).
Author: Jeff Davis
Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman
Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com
```
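The spill-mode behavior described in the commit can be sketched as follows (a toy model, not the PostgreSQL implementation): once the hash table hits its budget, existing groups are still advanced, but tuples for unseen groups are routed into spill partitions and aggregated in later batches, recursing if a batch still does not fit.

```python
# Toy sketch of disk-based hash aggregation (count per key).
# max_groups stands in for the work_mem budget; the two "partitions"
# stand in for the logical tapes tuples are spilled to.
def hash_agg(tuples, max_groups, depth=0):
    table = {}
    partitions = [[], []]            # two spill partitions for the sketch
    for key in tuples:
        if key in table:
            table[key] += 1          # existing group: always advanced
        elif len(table) < max_groups:
            table[key] = 1           # budget left: create the group
        else:
            # spill mode: route the tuple to a partition by hash;
            # depth is mixed in so recursive passes repartition the keys
            partitions[hash((key, depth)) % 2].append(key)
    result = dict(table)             # finalize and emit in-memory groups
    for part in partitions:          # then process each spilled batch,
        if part:                     # possibly spilling recursively
            result.update(hash_agg(part, max_groups, depth + 1))
    return result

print(hash_agg(["a", "b", "a", "c", "b", "a"], max_groups=2))
# -> {'a': 3, 'b': 2, 'c': 1}
```

A spilled key is never present in the in-memory table (only tuples for unseen groups are spilled), so merging batch results with `update` is safe.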