基于pg_stat_bgwriter的优化

开源喵 2021-07-06

1323

关于pg_stat_bgwriter视图的。根据我的经验，这个视图并不经常被使用，然而，我认为我们应该深刻了解这个视图，通过这个视图，我们可以调优我们的checkpointer和bgwriter.

pg_stat_bgwriter的作用

using pg_stat_bgwriter it is possible to understand whether bgwriter >and checkpointer configured correctly. Here are a few points to consider:
quality of checkpoints
bgwriter settings

使用pg_stat_bgwriter可以帮助我们确定bgwriter和checkpointer的配置是否正确。

checkpoint相关介绍

In case of xlog, postgres collects a certain amount of WAL and forces to do a checkpoint. In case of timeout, postgres doesn’t have a lot of writing activity for triggering xlog checkpoint so it runs checkpoint after timeout. Checkpoints by timeout are preferable to xlog ones. The main reason is, if postgres collected a large amount of WAL and has to start xlog checkpoint, it needs to write a huge amount of dirty data. This means that the system is under stress during checkpoint and user queries may suffer as a result. The situation is becoming more complex when checkpoint is done by xlog and the next checkpoint starts immediately afterwards, thus they are running one after another and postgres writing changes as they come up, since if storage is overutilized it becomes a bigger issue.

checkpointer在在两种情况之一检查点将被触发:

wal日志的容量到超设置max_wal_size
wal日志的checkpoint_timeout到达时间限制。
上述两个触发检查点的条件，checkpoint_timeout是更可取的方式，因为达max_wal_size，不得不触发checkpoint.从而需要写大量的脏数据。这意味着系统在检查点期间处于压力之下，用户查询可能会因此而受到影响。当检查点由xlog完成（超过max_wal_size）后，紧接着下一个检查点立即开始时，情况变得更加复杂，因此它们一个接一个地运行，并写入更改，因为如果存储被过度利用，它就会成为一个更大的问题。

In the case of checkpoint by timeout the amount of changes to write is evenly spread through timeout and storage doesn’t need to endure high amount of data from postgres.

在检查点超时的情况下，要写入的更改量通过超时平均分布，存储不需要忍受来自邮局的大量数据。

To summarize, you should avoid checkpoints by xlog.

总而言之，应该避免xlog的checkpoint(超过max_wal_size)

结合pg_stat_bgwriter优化checkpoint配置

In this case the pg_stat_bgwriter helps to tune checkpoints’ settings in a better way. There are two columns: checkpoints_timed and checkpoints_req that show number of checkpoints occurred since last reset of stats。General rule is very simple – checkpoints_timed value should be much higher than checkpoints_req. It’s desirable when the last one (checkpoints_req) is near zero

pg_stat_bgwriter可以使用列：checkpints_timed、checkpoint_req帮助我们去优化checkpoint，这两个列记录的是上次重置以来的累计值。优化规则很简单：checkpoints_times需要远远大于checkpoint_req.理想的情况下，checkpoint_req接近于0.

结合pg_stat_bgwriter优化Bgwriter settings

pg_stat_bgwriter helps to estimate bgwriter effectiveness. Bgwriter is a background process that satisfies backends when it needs clean buffers in shared buffers area. Since the moment postgres was started, bgwriter has been working continuously and making tiny delays between processing buffers.

pg_stat_bgwriter有助于评估bgwriter的有效性。Bgwriter是一个后台进程，用于满足backends在shared buffers area中需要干净缓冲区。postgres启动以来，bgbriter一直在持续工作，在处理缓冲区之间有微小的延迟。

Delay means that sometimes bgwriter stops and if backends still require clean buffers, it has to clean buffers by itself. In general, it’s an unwanted behaviour since backends normally do bgwriter’s work. It also can be configured through postgresql.conf. Here is how – there are three settings that need to be adjusted:

bgwriter_delay – size of sleep delay when number of processed buffers exceeded.
bgwriter_lru_maxpages – number of processed buffers after bgwriter delays.
bgwriter_lru_multiplier – multiplier used by bgwriter to calculate how many buffers need to be cleaned out in the next round.

延迟意味着有时bgwriter停止，如果backends仍然需要干净的缓冲区，backends则不得不自己清理缓冲区。一般来说，backends做了bgwriter的工作，这是一种不必要的行为。它可以通过postgresql.conf进行配置.有三个设置需要调整：

bgwriter_delay：处理的Buffers超过一定数量休眠的时间
bgwriter_lru_maxpages：bgwriter休眠后，需要处理的page数量
bgwriter_lru_multiplier：用来计算bgwriter下一轮需要清除多少缓冲区的算法因子(后续需要研究)

you should aim to twist maxpages and multiplier to the maximum and reduce delay to the minimum

应该将maxpages和multiplier调整到允许的最大值，并减小delay最小

pg_stat_bgwriter的相关字段说明

pg_stat_bgwriter, using its stats we can understand some moments related to bgwriter.

maxwritten_clean: shows how many times bgwriter stopped because maxpages was exceeded. When you see high values there, you should increase bgwriter_lru_maxpages.
buffers_clean and buffers_backend: show number of buffers cleaned by bgwriter and postgres’ backends respectively – buffers_clean should be greater than buffers_backend. Otherwise, you should increase bgwriter_lru_multiplier and decrease bgwriter_delay. Note, it also may be a sign that you have insufficient shared buffers and hot part of your data don’t fit into shared buffers and forced to travel between RAM and disks.
buffers_backend_fsync: shows if backends are forced to make its own fsync requests to synchronize buffers with storage. Any values above zero point to problems with storage when fsync queue is completely filled. The newer versions of postgres addressed these issues and I haven’t seen non-zero values now for a long time.

对于pg_stat_bgwriter，使用它的统计数据，我们可以理解一些与bgwiter相关的时刻:

maxwritten_clean 显示由于超过bgwriter_lru_maxpages而导致bgwriter停止的次数。当你看到这个值很高时，你应该增加bgwriter_lru_maxpages。
buffers_clean and buffers_backend:buffers_clean和buffers_backend分别显示由bgwriter和backends清理的缓冲区数——buffers_clean应大于buffers_backend。否则，您应增加bgwriter_lru_multiplier并减少bgwriter_delay。注意，这也可能表明shared buffers不足，并且数据的热部分不适合于共享缓冲区，并且被迫在RAM和磁盘之间移动。
buffers_backend_fsync:显示后端是否被迫自己发出fsync请求，以使缓冲区与存储器同步。任何大于零的值都指向fsync队列完全填充时的存储问题。较新版本的帖子解决了这些问题，我已经很久没有看到非零值了。

After changing settings, I recommend resetting pg_stat_bgwriter stats with pg_stat_reset_shared(‘bgwriter’) function and re-check stats the next day.

更改设置后，我建议使用pg_stat_reset_shared(‘bgwriter’)功能重置pg_stat_bgwriter统计数据，并在第二天重新检查统计数据。

参考文献:https://dataegret.com/2017/03/deep-dive-into-postgres-stats-pg_stat_bgwriter/

postgresql postgresql

文章转载自开源喵，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。