PostgreSQL17优化器改进(7)pg_stats添加范围类型的直方图信息列
我们知道PG_STATS视图提供对存储在pg_statistic表里面的单列统计信息的访问。pg_stats也被设计为能以更适合阅读的格式显示底层目录的信息 但代价是只要为pg_statistic定义了新的槽类型,就必须要扩展此视图的模式。其实对于范围类型列的统计信息在很早的版本已经收集并存储在pg_statistic表中,但是这些信息没有显示在pg_stats视图中。 在PostgreSQL17版本的pg_stats添加了范围类型的直方图统计信息列,新增的三个字段为range_length_histogram、range_empty_frac和range_bounds_histogram,下面我们简单的了解一下。
查看pg_stats视图的结构
PostgreSQL16.3视图结构
testdb=# \d pg_stats
View "pg_catalog.pg_stats"
Column | Type | Collation | Nullable | Default
------------------------+----------+-----------+----------+---------
schemaname | name | | |
tablename | name | | |
attname | name | | |
inherited | boolean | | |
null_frac | real | | |
avg_width | integer | | |
n_distinct | real | | |
most_common_vals | anyarray | | |
most_common_freqs | real[] | | |
histogram_bounds | anyarray | | |
correlation | real | | |
most_common_elems | anyarray | | |
most_common_elem_freqs | real[] | | |
elem_count_histogram | real[] | | |
PostgreSQL17.0的视图结构
testdb=# \d pg_stats
View "pg_catalog.pg_stats"
Column | Type | Collation | Nullable | Default
------------------------+----------+-----------+----------+---------
schemaname | name | | |
tablename | name | | |
attname | name | | |
inherited | boolean | | |
null_frac | real | | |
avg_width | integer | | |
n_distinct | real | | |
most_common_vals | anyarray | | |
most_common_freqs | real[] | | |
histogram_bounds | anyarray | | |
correlation | real | | |
most_common_elems | anyarray | | |
most_common_elem_freqs | real[] | | |
elem_count_histogram | real[] | | |
range_length_histogram | anyarray | | |
range_empty_frac | real | | |
range_bounds_histogram | anyarray | | |
从PostgreSQL17.0的视图结构中,我们可以看到新增了3个列range_length_histogram、range_empty_frac、range_bounds_histogram字段,其中range_length_histogram、range_bounds_histogram字段类型是anyarray的数组
新增字段含义
range_length_histogram
范围类型列的非空和非null范围值长度的直方图。(非范围类型为空)
range_empty_frac
值为空范围的列项的比例。(非范围类型为空)
range_bounds_histogram
非空和非null范围值的下界和上界直方图(非范围类型为空)。这两个直方图表示为一个范围数组,其下界表示下界的直方图,上界表示上界的直方图。
操作
创建测试用例表
CREATE TABLE test(r int4range);
INSERT INTO test
SELECT int4range((random()*10)::integer,(10+random()*10)::integer)
FROM generate_series(1,10000);
SET default_statistics_target = 10;
ANALYZE test;
查看范围类型的统计信息
testdb=# SELECT range_length_histogram,range_empty_frac,
testdb-# range_bounds_histogram
testdb-# FROM pg_stats
testdb-# WHERE tablename = 'test' \gx
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
range_length_histogram | {1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,14,14,14,14,14,14,15,15,15,15,15,15,16,16,16,17,17,17,18,18,19,20}
range_empty_frac | 0.0027
range_bounds_histogram | {"[0,10)","[0,10)","[0,10)","[0,10)","[0,10)","[1,10)","[1,11)","[1,11)","[1,11)","[1,11)","[1,11)","[1,11)","[1,11)","[1,11)","[1,11)","[1,11)","[2,12)","[2,12)","[2,12)","[2,12)","[2,12)","[2,12)","[2,12)","[2,12)","[2,12)","[2,13)","[3,13)","[3,13)","[3,13)","[3,13)","[3,13)","[3,13)","[3,13)","[3,13)","[3,13)","[3,13)","[4,14)","[4,14)","[4,14)","[4,14)","[4,14)","[4,14)","[4,14)","[4,14)","[4,14)","[4,14)","[5,15)","[5,15)","[5,15)","[5,15)","[5,15)","[5,15)","[5,15)","[5,15)","[5,15)","[5,15)","[6,16)","[6,16)","[6,16)","[6,16)","[6,16)","[6,16)","[6,16)","[6,16)","[6,16)","[6,16)","[7,17)","[7,17)","[7,17)","[7,17)","[7,17)","[7,17)","[7,17)","[7,17)","[7,17)","[7,17)","[8,18)","[8,18)","[8,18)","[8,18)","[8,18)","[8,18)","[8,18)","[8,18)","[8,18)","[8,18)","[9,19)","[9,19)","[9,19)","[9,19)","[9,19)","[9,19)","[9,19)","[9,19)","[9,19)","[9,20)","[10,20)","[10,20)","[10,20)","[10,20)","[10,20)"}
总结
本次我们只是简单了解一下pg_stats新增了范围类型的直方图信息列的功能,但是对于这些字段具体怎么用,以及是优化器是如何利用这些统计信息提升查询的性能,还需深入的研究。如果自己想深入研究,可以自行查看下面的文章,也许会有一些启发。
扩展阅读
如果想深入研究PostgreSQL的统计信息及直方图的可以查看内容
https://postgrespro.com/blog/pgsql/5969296
– / END / –
可以通过下面的方式联系我
如果这篇文章为你带来了灵感或启发,就请帮忙点赞、收藏、转发;如果文章中不严谨或者错漏之处,请及时评论指正。非常感谢!




