[译] PostgreSQL 中的分区表统计信息

原创 tinge 2025-09-04

我最近帮助一位客户解决了一个慢查询问题。最终，ANALYZE在分区表上执行了一次查询就解决了问题。这让客户很惊讶，因为自动清理功能已经启用。所以我决定写一篇关于 PostgreSQL 如何收集分区表统计信息以及它们如何影响 PostgreSQL 估算的文章。

我所说的“分区表统计信息”是什么意思？

我所说的优化器统计信息，正如我在关于升级期间保留统计信息的文章中所定义的。这些数据是查询优化器用来估算执行计划成本的数据。您可以在系统视图中查看表列的优化器统计信息pg_stats。

在 PostgreSQL 中，分区表是一种逻辑结构，有点类似于视图。分区表本身不包含任何数据。数据驻留在分区表的各个分区中。向分区表中插入一行实际上是将该行插入到相应的分区中。在 PostgreSQL 中，分区是普通表，只是作为分区表的一个分区承担了“附带任务”。

当我谈论分区表统计信息时，我指的是分区表本身的优化器统计信息，而不是其分区的优化器统计信息。

分区表统计信息示例

为了演示分区表的统计信息，我将使用一个简单的示例：

PgSQL
CREATE TABLE tab (
   pk integer NOT NULL
) PARTITION BY LIST ((pk % 2));
 
CREATE UNLOGGED TABLE tab_even
PARTITION OF tab FOR VALUES IN (0);
 
CREATE UNLOGGED TABLE tab_odd
PARTITION OF tab FOR VALUES IN (1);

为了获得更好的性能，我使用了未记录的表，因为我们不需要崩溃恢复。每个分区将接收我一半的行INSERT到表中：

INSERT INTO tab
SELECT * FROM generate_series(1, 1000000);

分区表统计信息和自动清理

使用默认配置，自动清理启动器将休眠最多一分钟，然后再激活。然后，它将处理我们的新表并运行VACUUM和ANALYZE。因此，让我们等待两分钟，然后再查看统计信息：

SELECT *
FROM pg_stats
WHERE tablename = 'tab_even'
  AND attname = 'pk';
 
-[ RECORD 1 ]----------+----------------------
schemaname             | laurenz
tablename              | tab_even
attname                | pk
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -1
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | {82,10896,21078,...}
correlation            | 1
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
range_length_histogram | 
range_empty_frac       | 
range_bounds_histogram |
 
SELECT *
FROM pg_stats
WHERE tablename = 'tab'
  AND attname = 'pk';
 
(0 rows)

PostgreSQL 有分区的统计信息，但没有分区表的统计信息！我们可以在文档中找到解释：

分区表不直接存储元组，因此不会被自动清理 (autovacuum) 处理。（自动清理会像处理其他表一样处理表分区。）遗憾的是，这意味着自动清理无法ANALYZE在分区表上运行，这可能会导致引用分区表统计信息的查询计划不理想。您可以通过ANALYZE在分区表首次填充数据时手动运行自动清理，以及在其分区中的数据分布发生显著变化时再次运行自动清理来解决此问题。

分区表统计信息的重要性

由此可见，分区表不会自动获取统计信息。问题在于这些统计信息有多重要。毕竟，分区表不包含任何数据，而我们拥有分区的统计信息。对于许多 SQL 语句来说，这已经足够了。以下是示例：

PgSQL
EXPLAIN (ANALYZE) SELECT * FROM tab WHERE pk < 1000;
 
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..12872.14 rows=913 width=4) (actual time=0.558..27.241 rows=999.00 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=4426
   ->  Parallel Append  (cost=0.00..11780.84 rows=380 width=4) (actual time=6.016..20.488 rows=333.00 loops=3)
         Buffers: shared hit=4426
         ->  Parallel Seq Scan on tab_even tab_1  (cost=0.00..5889.47 rows=249 width=4) (actual time=3.035..9.128 rows=166.33 loops=3)
               Filter: (pk < 1000)
               Rows Removed by Filter: 166500
               Buffers: shared hit=2213
         ->  Parallel Seq Scan on tab_odd tab_2  (cost=0.00..5889.47 rows=288 width=4) (actual time=6.741..16.962 rows=250.00 loops=2)
               Filter: (pk < 1000)
               Rows Removed by Filter: 249750
               Buffers: shared hit=2213
 Planning Time: 0.199 ms
 Execution Time: 27.359 ms

正如您所见，估计是相当正确的。

但是，PostgreSQL 使用分区表统计信息来估计连接的结果行数。请看以下示例：

PgSQL
EXPLAIN (ANALYZE)
SELECT *
FROM generate_series(1, 100) AS gs(id)
  JOIN tab ON gs.id = tab.pk;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2.25..28178.25 rows=500000 width=8) (actual time=0.164..145.455 rows=100.00 loops=1)
   Hash Cond: (tab.pk = gs.id)
   Buffers: shared hit=4426
   ->  Append  (cost=0.00..19426.00 rows=1000000 width=4) (actual time=0.018..84.210 rows=1000000.00 loops=1)
         Buffers: shared hit=4426
         ->  Seq Scan on tab_even tab_1  (cost=0.00..7213.00 rows=500000 width=4) (actual time=0.017..21.568 rows=500000.00 loops=1)
               Buffers: shared hit=2213
         ->  Seq Scan on tab_odd tab_2  (cost=0.00..7213.00 rows=500000 width=4) (actual time=0.019..16.421 rows=500000.00 loops=1)
               Buffers: shared hit=2213
   ->  Hash  (cost=1.00..1.00 rows=100 width=4) (actual time=0.119..0.120 rows=100.00 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 12kB
         ->  Function Scan on generate_series gs  (cost=0.00..1.00 rows=100 width=4) (actual time=0.058..0.071 rows=100.00 loops=1)
 Planning:
   Buffers: shared hit=73
 Planning Time: 0.743 ms
 Execution Time: 145.540 ms

所有行数估计都是正确的——除了哈希连接的估计：它偏离了5000倍！让我们ANALYZE看看分区表是否能改善估计：

ANALYZE tab;
 
EXPLAIN (ANALYZE)
SELECT *
FROM generate_series(1, 100) AS gs(id)
  JOIN tab ON gs.id = tab.pk;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1002.25..14966.87 rows=100 width=8) (actual time=0.382..57.992 rows=100.00 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=4426
   ->  Hash Join  (cost=2.25..13956.87 rows=42 width=8) (actual time=16.681..52.161 rows=33.33 loops=3)
         Hash Cond: (tab.pk = gs.id)
         Buffers: shared hit=4426
         ->  Parallel Append  (cost=0.00..12391.69 rows=416668 width=4) (actual time=0.008..29.572 rows=333333.33 loops=3)
               Buffers: shared hit=4426
               ->  Parallel Seq Scan on tab_even tab_1  (cost=0.00..5154.18 rows=294118 width=4) (actual time=0.004..6.317 rows=166666.67 loops=3)
                     Buffers: shared hit=2213
               ->  Parallel Seq Scan on tab_odd tab_2  (cost=0.00..5154.18 rows=294118 width=4) (actual time=0.007..10.296 rows=250000.00 loops=2)
                     Buffers: shared hit=2213
         ->  Hash  (cost=1.00..1.00 rows=100 width=4) (actual time=0.058..0.058 rows=100.00 loops=3)
               Buckets: 1024  Batches: 1  Memory Usage: 12kB
               ->  Function Scan on generate_series gs  (cost=0.00..1.00 rows=100 width=4) (actual time=0.027..0.036 rows=100.00 loops=3)
 Planning:
   Buffers: shared hit=12
 Planning Time: 0.237 ms
 Execution Time: 58.041 ms

成功了！现在我们还可以在以下位置看到数据pg_stats：

PgSQL
SELECT *
FROM pg_stats
WHERE tablename = 'tab'
  AND attname = 'pk';
 
-[ RECORD 1 ]----------+----------------------
schemaname             | laurenz
tablename              | tab
attname                | pk
inherited              | t
null_frac              | 0
avg_width              | 4
n_distinct             | -1
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | {29,10972,20818,...}
correlation            | 0.5027541
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
range_length_histogram | 
range_empty_frac       | 
range_bounds_histogram |

如何收集分区表统计信息？

显然，我们需要收集分区表的统计信息——至少针对某些查询。另一方面，自动清理功能无法为我们完成这项工作。因此，我们必须ANALYZE为分区表安排明确的运行计划。我们可以使用操作系统调度程序（例如cron），PostgreSQL 扩展（例如pg_timetable），或者您构建到应用程序中的工具。

分区表应该多久检查一次ANALYZE？这取决于你的用例，但我认为大多数情况下，每天一次或每周一次就足够了。毕竟，你通常会对大型表进行分区，这些数据不太可能在一天或一周内发生很大变化。

为什么分区表被排除在自动清理之外？

答案可能是还没有人去实现这一点。

对于普通表，PostgreSQL 使用监控统计信息来pg_stat_all_tables.n_mod_since_analyze确定表是否需要自动分析。由于分区表本身不包含任何数据，因此此数字始终为零。有两种方法可以对分区表进行自动分析：

n_mod_since_analyze每当更新分区时，更新分区表的计数
当自动清理确定是否对分区表进行清理时，从所有分区中ANALYZE添加n_mod_since_analyze

第一种方法会加重所有数据修改语句的负担，所以我认为第二种方法更有前景。或许会有一些读者受到启发，去改进 PostgreSQL！

结论

我们已经看到，autovacuum 不会收集分区表的统计信息。ANALYZE如果我们想要获得良好的执行计划，就必须使用显式命令来收集它们。

我非常感谢 Andrey Lepikhov 对这些统计数据真正重要的指导！

原文地址：
原文作者：

postgresql 分区表

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者