BTREE VS BRIN：PostgreSQL数据仓库

PostgreSQL考试认证中心 2023-05-23

478

索引是良好性能的关键。然而，人们经常问：有没有替代 btree 索引的替代方案？我们可以在PostgreSQL中做更小的索引吗?我们可以更快地在PostgreSQL中创建索引吗？我们如何在数据仓库中编制索引？

本博客将回答所有这些问题，并显示哪些选项可用于为数据仓库中的表编制索引。我们将专注于btree（又名B树）索引和BRIN索引之间的史诗般的战斗。

PostgreSQL 中的索引类型

与许多其他关系数据库不同，PostgreSQL支持不止一种索引类型。原因是并非所有索引都适合相同类型的操作。适用于索引名称的索引不一定适用于 GIS 数据，反之亦然。这就是为什么人们可以选择在PostgreSQL中使用哪个索引。

我们如何确定PostgreSQL中实际存在哪些类型的索引？以下是它的工作原理：

test=# SELECT * FROM pg_am;
 oid  | amname |      amhandler       | amtype
------+--------+----------------------+--------
    2 | heap   | heap_tableam_handler | t
  403 | btree  | bthandler            | i
  405 | hash   | hashhandler          | i
  783 | gist   | gisthandler          | i
 2742 | gin    | ginhandler           | i
 4000 | spgist | spghandler           | i
 3580 | brin   | brinhandler          | i
(7 rows)

pg_am系统表显示了您的系统支持哪些类型的索引。请注意，索引实际上可以作为扩展加载，这意味着您看到的列表不一定是常量，您可能会看到更多的条目。

加载示例数据

在我们深入研究btree和BRIN索引的细节之前，加载一些示例数据是有意义的。要做到这一点，最简单的方法是使用generate_series。这是瑞士军刀类型的函数，大多数PostgreSQL顾问基本上都使用它来生成用于测试的巨大表：

test=# CREATE TABLE t_demo (id_sorted int8, id_random int8);
CREATE TABLE
test=# INSERT INTO t_demo
       SELECT id, hashtext(id::text)
       FROM generate_series(1, 5000000000) AS id;
INSERT 0 5000000000

在本例中，我们创建了 5 亿行，生成了一个 206 GB 的表。这里我们有两列：第一列是升序。它包含的数字从1亿到5亿不等.

test=# \d+
List of relations
 Schema | Name   | Type  | Owner | Persistence | Access method | Size   | Description
--------+--------+-------+-------+-------------+---------------+--------+-------------
 public | t_demo | table | hs    | permanent   | heap          | 206 GB |
(1 row)

第二列包含一个哈希值。看一看：

test=# SELECT * FROM t_demo LIMIT 10 OFFSET 50;
 id_sorted | id_random
-----------+-------------
        51 |   342243648
        52 | -1711900017
        53 |   213436769
        54 |  2112769913
        55 | -1318351987
        56 |   -19937162
        57 |  -299365904
        58 | -1228416573
        59 |    93548776
        60 |  1491665190      
(10 rows)

这里的重点是哈希值几乎是随机的，应该均匀分布。它让我们有机会看到各种索引类型（如 btree 和 BRIN）在这两个非常常见的用例中的行为。

创建索引：btree 的性能问题

让我们创建一个简单的btree。在第一步中，我们使用PostgreSQL默认配置（我所做的唯一更改是将max_wal_size设置为100 GB——所有其他设置都是默认设置

创建一个标准的btree索引将花费我们35分钟：

test=# CREATE INDEX idx_id ON t_demo (id_sorted);
CREATE INDEX
Time: 2109651,552 ms (35:09,652)

然而，我们可以做得更好。如果我们再次降低索引并将maintenance_work_mem提升到1GB，如果我们增加PostgreSQL允许使用的工作进程数量，我们可以加快速度：

test=# SHOW maintenance_work_mem ;
 maintenance_work_mem
----------------------
 64MB
(1 row)


Time: 0,170 ms
test=# SET maintenance_work_mem TO '1 GB';
SET
Time: 0,181 ms
test=# SET max_parallel_maintenance_workers TO 4;
SET

现在让我们再次创建相同的 btree 索引，看看会发生什么

[hs@hanspc user_test]$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system--- ------cpu-----
r b swpd   free   buff cache    si  so bi       bo      in    cs   us sy id wa st
5 0 450048 235408 1800 25689472 0   2  2964     6838    21    41   9  2  88 1  0
5 0 450048 261876 1800 25650596 0   0  1225912  8       19425 8512 39 7  54 0  0
7 0 450048 257548 1800 25660116 0   0  941458   277776  19766 7638 32 7  53 7  0
1 5 450048 232476 1800 25690880 0   0  857176   411110  17791 7603 34 7  50 9  0
5 0 450048 260136 1800 25663332 0   0  756074   485380  16005 7811 28 6  53 13 0
3 4 450048 242340 1800 25679688 0   0  722454   666192  16785 8842 26 10 49

在查看vmstat时，我们可以看到PostgreSQL开始每秒读取数百MB，并开始一次向磁盘溢出数百MB/秒。由于我们无法对记忆中的一切进行排序，这正是我们所期望的。

在第一阶段，我们确切地看到了进度监控基础设施所揭示的情况：

test=# \x
Expanded display is on.
test=# SELECT * FROM pg_stat_progress_create_index;
-[ RECORD 1 ]------+-------------------------------
pid                | 23147
datid              | 24576
datname            | test
relid              | 24583
index_relid        | 0
command            | CREATE INDEX
phase              | building index: scanning table
lockers_total      | 0
lockers_done       | 0
current_locker_pid | 0
blocks_total       | 27027028
blocks_done        | 8577632
tuples_total       | 0
tuples_done        | 0
partitions_total   | 0
partitions_done    | 0

PostgreSQL 将首先扫描表并准备数据进行排序。
在工艺表中，我们可以看到所有所需的内核都在全速工作以实现这一目标：

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
38830 hs 20 0 594336 219780 12152 R 99,0 0,7 0:06.65 postgres
38831 hs 20 0 594336 219748 12120 R 99,0 0,7 0:06.65 postgres
38833 hs 20 0 594336 219920 12288 R 98,7 0,7 0:06.64 postgres
38832 hs 20 0 594336 219916 12284 R 98,3 0,7 0:06.63 postgres
23147 hs 20 0 514964 276408 149056 R 97,7 0,8 106:48.69 postgres

完成此步骤后，PostgreSQL可以开始对这些部分数据集进行排序。请记住，表太大，无法在内存中发生这种情况，因此我们不得不溢出到磁盘：

-[ RECORD 1 ]-------+------------------------------------
pid                 | 23147
datid               | 24576
datname             | test
relid               | 24583
index_relid         | 0
command             | CREATE INDEX
phase               | building index: sorting live tuples
lockers_total       | 0
lockers_done        | 0
current_locker_pid  | 0
blocks_total        | 27027028
blocks_done         | 27027028
tuples_total        | 0
tuples_done         | 0
partitions_total    | 0
partitions_done     | 0

在此阶段之后，PostgreSQL将数据添加到树中并将索引写入磁盘：

-[ RECORD 1 ]------+---------------------------------------
pid                | 23147
datid              | 24576
datname            | test
relid              | 24583
index_relid        | 0
command            | CREATE INDEX
phase              | building index: loading tuples in tree
lockers_total      | 0
lockers_done       | 0
current_locker_pid | 0
blocks_total       | 0
blocks_done        | 0
tuples_total       | 5000000000
tuples_done        | 2005424786
partitions_total   | 0
partitions_done    | 0

我们成功地将索引过程的性能提高了近一倍，这是一项非常伟大的成就：

test=# CREATE INDEX idx_id ON t_demo (id_sorted);
CREATE INDEX
Time: 1212601,300 ms (20:12,601)

如果你想了解更多关于索引的信息，我也可以推荐我的另一篇博客文章，讨论PostgreSQL中更快的索引创建。

这里值得注意的是我们刚刚创建的索引的大小：

test=# \di+
List of relations
 Schema | Name   | Type  | Owner | Table  | Persistence | Access method | Size   | Description
--------+--------+-------+-------+--------+-------------+---------------+--------+-------------
 public | idx_id | index | hs    | t_demo | permanent   | btree         | 105 GB |
(1 row)

105 GB是相当巨大的。但是，它基本上证实了以下经验法则：通常索引每个索引条目需要大约 25 个字节——假设我们没有任何重复条目。

创建第二个索引：

test=# CREATE INDEX idx_random ON t_demo (id_random);
CREATE INDEX
Time: 1731088,437 ms (28:51,088)

在本例中，我们对随机列进行了索引。由于我们的输入流不再排序，因此创建此索引需要更长的时间。

使用树运行查询

运行查询并查看到目前为止 btree 索引做了什么

test=# SELECT count(*)
FROM t_demo
WHERE id_sorted BETWEEN 10000000 AND 20000000;
   count
----------
 10000001
(1 row)


Time: 358,804 ms

我们在这里看到的是惊人的性能。我们设法在短短 10 毫秒内提取了 358 万行。让我们深入了解执行计划，看看会发生什么.

test=# explain (analyze, buffers) SELECT count(*)
       FROM  t_demo
       WHERE id_sorted BETWEEN 10000000 AND 20000000;
                            QUERY PLAN              
---------------------------------------------------------------------
 Finalize Aggregate (cost=296509.46..296509.47 rows=1 width=8)
                   (actual time=496.450..500.609 rows=1 loops=1)
                   Buffers: shared hit=11 read=27325
    ->  Gather   (cost=296509.24..296509.45 rows=2 width=8)
                (actual time=496.379..500.603 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=11 read=27325
         ->  Partial Aggregate  (cost=295509.24..295509.25 rows=1 width=8)
                                (actual time=492.621..492.622 rows=1 loops=3)
               Buffers: shared hit=11 read=27325
               ->  Parallel Index Only Scan using idx_id on t_demo 
                        (cost=0.58..284966.32 rows=4217169 width=0)
                        (actual time=0.074..340.666 rows=3333334 loops=3)
                     Index Cond: ((id_sorted >= 10000000)
                            AND (id_sorted <= 20000000))
                     Heap Fetches: 0
                     Buffers: shared hit=11 read=27325
 Planning:
   Buffers: shared hit=7 read=3
 Planning Time: 0.233 ms
 Execution Time: 500.662 ms
(16 rows)

以上是仅并行索引扫描。实际上这是个好消息，因为这意味着扫描索引就足够了。无需从基础表中提取数据。我们只需要确保行的存在（在这种情况下，这是通过PostgreSQL提示位完成的）。

现在我们将运行相同的查询，但这次我们将使用随机列。
请注意，我稍微缩小了范围，以确保我们也获得 10 万行。因此，结果集的大小大致相同：

test=# explain SELECT count(*)
       FROM  t_demo
       WHERE id_random BETWEEN 10000000 AND 19000000;
                            QUERY PLAN
-------------------------------------------------------------------
 Aggregate  (cost=22871994.27..22871994.28 rows=1 width=8)
   ->  Index Only Scan using idx_random on t_demo 
            (cost=0.58..22846435.29 rows=10223592 width=0)
         Index Cond: ((id_random >= 10000000)
            AND (id_random <= 19000000))
(3 rows)

使用多个内核的并行查询已消失。但是，情况更糟...查看查询期间发生的情况：

[hs@hanspc user_test]$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1 1377280 228168   1032 27286216    0    0 120584   314 17933 30276  2  5 86  7  0
 1  0 1377280 231608   1032 27286224    0    0 121102  1069 17839 30854  2  3 88  7  0
 1  1 1377280 251400   1032 27270360    0    0 121168   869 17669 30177  1  4 88  7  0
 0  1 1377280 263096   1032 27254396    0    0 121674   514 17780 30695  2  2 88  7  0
 0  1 1377280 236788   1032 27275712    0    0 121630   260 18403 31970  2  2 88  7  0

哇，我们看到 120 MB/秒的持续 I/O，这意味着可怕的运行时。
请注意，数据量是相同的 - 所有有变化的是磁盘上数据的分布：

test=# SELECT count(*)
       FROM t_demo
       WHERE id_random BETWEEN 10000000 AND 19000000;
  count
----------
 10481155
(1 row)
Time: 322747,613 ms (05:22,748)

5分钟。正如我们在这里看到的，我们的运行时已经爆炸了。我们可以找出原因：

test=# explain (analyze, buffers) SELECT count(*)
       FROM  t_demo
       WHERE id_random BETWEEN 10000000 AND 19000000;
                           QUERY PLAN                                                  
-------------------------------------------------------------------
 Aggregate  (cost=22871994.27..22871994.28 rows=1 width=8)
            (actual time=292381.841..292381.843 rows=1 loops=1)
            Buffers: shared hit=6611938 read=3884860
   ->  Index Only Scan using idx_random on t_demo
        (cost=0.58..22846435.29 rows=10223592 width=0)
        (actual time=0.515..291336.908 rows=10481155 loops=1)
         Index Cond: ((id_random >= 10000000)
                  AND (id_random <= 19000000))
         Heap Fetches: 3866504
         Buffers: shared hit=6611938 read=3884860
 Planning Time: 1.080 ms
 Execution Time: 292381.934 ms
(8 rows)


Time: 292384,903 ms (04:52,385)

数据遍布各地。因此，我们需要数百万个 8k 页面来处理查询。在这些块中，我们只需要一个数据子集。这会导致大量的运行时间。

BRIN指数来救援？

BRIN指数通常被视为仓储应用的救星。但是，您不应该对此非常确定。

首先，我们将创建一个索引：

test=# CREATE INDEX idx_brin ON t_demo USING brin (id_sorted);
CREATE INDEX
Time: 710549,637 ms (11:50,550)

索引制作得非常快。同样非常有说服力的是BRIN指数的规模：

test=# SELECT pg_size_pretty(pg_relation_size('idx_brin'));
 pg_size_pretty
----------------
 7064 kB
(1 row)

它只有 7 MB，为我们节省了惊人的 10 GB 磁盘。在索引创建过程中，我们还可以看到该过程看起来略有不同：

test=# SELECT * FROM pg_stat_progress_create_index;
-[ RECORD 1 ]------+---------------
pid                | 23147
datid              | 24576
datname            | test
relid              | 24583
index_relid        | 0
command            | CREATE INDEX
phase              | building index
lockers_total      | 0
lockers_done       | 0
current_locker_pid | 0
blocks_total       | 27027028
blocks_done        | 16898015
tuples_total       | 0
tuples_done        | 0
partitions_total   | 0
partitions_done    | 0

BRIN 不像正常的 btree 索引创建那样广泛。这很重要，因为它可以减少临时 I/O 并加快索引创建速度。

比较查询：

test=# explain (analyze, buffers) SELECT count(*)
       FROM  t_demo
       WHERE id_sorted BETWEEN 10000000 AND 20000000;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Finalize Aggregate (cost=49930967.53..49930967.54 rows=1 width=8)
                    (actual time=664.605..671.519 rows=1 loops=1)
                    Buffers: shared hit=311 read=55025
    ->  Gather  (cost=49930967.32..49930967.53 rows=2 width=8)
                (actual time=664.387..671.502 rows=3 loops=1)
          Workers Planned: 2
          Workers Launched: 2
          Buffers: shared hit=311 read=55025
          ->  Partial Aggregate (cost=49929967.32..49929967.33 rows=1 width=8)
                                (actual time=653.953..653.955 rows=1 loops=3)
                Buffers: shared hit=311 read=55025
                ->  Parallel Bitmap Heap Scan on t_demo 
                        (cost=5390.21..49920339.39 rows=3851169 width=0)
                        (actual time=56.944..489.563 rows=3333334 loops=3)
                      Recheck Cond: ((id_sorted >= 10000000)
                                 AND (id_sorted <= 20000000))
                      Rows Removed by Index Recheck: 5546
                      Heap Blocks: lossy=19046
                      Buffers: shared hit=311 read=55025
                      ->  Bitmap Index Scan on idx_brin 
                                (cost=0.00..3079.51 rows=9258865 width=0)
                                (actual time=65.293..65.294 rows=541440 loops=1)
                            Index Cond: ((id_sorted >= 10000000)
                                     AND (id_sorted <= 20000000))
                            Buffers: shared hit=311 read=881
  Planning:
    Buffers: shared hit=1
  Planning Time: 0.210 ms
  Execution Time: 671.623 ms
 (20 rows

btree只使用了27000个块就成功地运行了查询——现在我们需要大约55000个块——这意味着更多的运行时间。

虽然BRIN索引通常是此类用例的一个好主意，但它并不总是灵丹妙药。它确实需要更少的空间，而且可以更快地建造。但is并不是在任何情况下都能神奇地解决所有案件。

请记住，我并不是说BRIN不好——与btree指数相比，这通常是一个加速。我想说的是，你应该比较各种设置，并决定什么是最好的。
此外：请记住，相关性确实很重要。

原文链接

https://www.cybertec-postgresql.com/en/btree-vs-brin-2-options-for-indexing-in-postgresql-data-warehouses/

PCP 认证专家报名

文章转载自PostgreSQL考试认证中心，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。