类PostgreSQL数据库执行计划解读-part3

原创 ziyoo0830 2025-08-27

189

执行计划解决-part3

在系列之前的文章中，我写了如何解释 explain analyze 输出的单行，它的结构，以及后来描述了所有基本数据获取操作（explain 树中的节点）。

今天，我们将转向更复杂的操作。

`Function Scan` 函数扫描

示例：

$ explain analyze select * from generate_Series(1,10) i;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.012..0.013 rows=10 loops=1)
 Total runtime: 0.034 ms
(2 rows)

通常来说，它非常简单，本不需要特别说明，但由于接下来的示例中会用到它，因此还是简单介绍一下。

函数扫描（Function Scan）是一个非常简单的节点——它用于执行一个返回结果集的函数，也就是说，它不会执行像 lower() 这样返回单个值的函数，而是执行那些至少可能返回多行或多列的函数。当函数返回数据后，这些行会逐级返回给执行计划树中 Function Scan 上层的节点，或者如果 Function Scan 是执行计划的顶层节点，则直接返回给客户端。

它可能包含的唯一额外逻辑是能够对返回的行进行过滤，例如以下情况：

$ explain analyze select * from generate_Series(1,10) i where i < 3;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series i  (cost=0.00..12.50 rows=333 width=4) (actual time=0.012..0.014 rows=2 loops=1)
   Filter: (i < 3)
   Rows Removed by Filter: 8
 Total runtime: 0.030 ms
(4 rows)

`Sort` 排序

这看起来很容易理解——sort得到记录并以某种方式返回排序后的结果。

示例：

$ explain analyze select * from pg_class order by relname;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Sort  (cost=22.88..23.61 rows=292 width=203) (actual time=0.230..0.253 rows=295 loops=1)
   Sort Key: relname
   Sort Method: quicksort  Memory: 103kB
   ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=203) (actual time=0.007..0.048 rows=295 loops=1)
 Total runtime: 0.326 ms
(5 rows)

虽然这个操作看似简单，但其内部却包含了一些巧妙的逻辑。首先，如果排序所需内存超过 work_mem 的限制，它就会自动切换到基于磁盘的排序方式：

$ explain analyze select random() as x from generate_series(1,14000) i order by x;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=62.33..64.83 rows=1000 width=0) (actual time=16.713..18.090 rows=14000 loops=1)
   Sort Key: (random())
   Sort Method: quicksort  Memory: 998kB
   ->  Function Scan on generate_series i  (cost=0.00..12.50 rows=1000 width=0) (actual time=2.036..4.533 rows=14000 loops=1)
 Total runtime: 18.942 ms
(5 rows)
 
$ explain analyze select random() as x from generate_series(1,15000) i order by x;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=62.33..64.83 rows=1000 width=0) (actual time=27.052..28.780 rows=15000 loops=1)
   Sort Key: (random())
   Sort Method: external merge  Disk: 264kB
   ->  Function Scan on generate_series i  (cost=0.00..12.50 rows=1000 width=0) (actual time=2.171..4.894 rows=15000 loops=1)
 Total runtime: 29.767 ms
(5 rows)

请注意Sort Method:

为了处理这类情况，PostgreSQL 会使用存储在 $PGDATA/base/pgsql_tmp/ 目录下的临时文件。这些文件在不再需要时会被自动删除。

另一个特点是，如果 Sort 节点是由 Limit 操作调用的，它可以改变其工作方式，例如以下情况：

$ explain analyze select * from pg_class order by relfilenode limit 5;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15.77..15.78 rows=5 width=203) (actual time=0.119..0.120 rows=5 loops=1)
   ->  Sort  (cost=15.77..16.50 rows=292 width=203) (actual time=0.118..0.118 rows=5 loops=1)
         Sort Key: relfilenode
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=203) (actual time=0.005..0.047 rows=295 loops=1)
 Total runtime: 0.161 ms
(6 rows)

通常情况下，要对一个数据集进行排序，需要将其全部数据处理完毕。但数据库知道，如果你只需要少量的行，那么它就不必对整个数据集进行完整排序，而只需找出最前面的若干个值即可。

用大 O 表示法来说，普通排序的时间复杂度是 O(m * log(m))，而 Top-N 排序的时间复杂度是 O(m * log(n))，其中 m 是表中的行数，n 是要返回的行数。最重要的是，这种排序方式也占用更少的内存（毕竟它不需要构建完整的已排序行数据集，只需要维护少数几行），因此更不容易超出内存限制，从而避免使用缓慢的磁盘临时文件。

`Limit` 限制

我之前多次使用了 LIMIT，因为它非常简单，但现在让我们完整地描述一下它的工作原理。

Limit 操作会运行其子操作，并仅返回该子操作结果中的前 N 行。通常情况下，一旦获取到足够的行数，Limit 就会停止其子操作的进一步执行。但在某些情况下（例如在 pl/pgsql 函数中），子操作可能在返回第一行时就已经完成了全部处理，因此无法提前中止。

简单示例：

$ explain analyze select * from pg_class;
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=203) (actual time=0.008..0.047 rows=295 loops=1)
 Total runtime: 0.096 ms
(2 rows)
 
$ explain analyze select * from pg_class limit 2;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.07 rows=2 width=203) (actual time=0.009..0.010 rows=2 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=203) (actual time=0.008..0.009 rows=2 loops=1)
 Total runtime: 0.045 ms
(3 rows)

如你所见，在第二种情况下使用 LIMIT，导致底层的顺序扫描（Seq Scan）在找到两行数据后立即停止了后续工作。

`HashAggregate`

这个操作基本上在你使用 GROUP BY 以及某些聚合函数（如 sum()、avg()、min()、max()等）时都会被用到。

示例：

$ explain analyze select relkind, count(*) from pg_Class group by relkind;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=12.38..12.42 rows=4 width=1) (actual time=0.223..0.224 rows=5 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=1) (actual time=0.008..0.053 rows=295 loops=1)
 Total runtime: 0.273 ms
(3 rows)

HashAggregate 的工作方式大致如下：对于它接收到的每一行数据，首先提取出 GROUP BY 的键（在本例中是 relkind），然后在哈希表（即关联数组或字典）中，根据该键将当前行放入对应的桶（bucket）中。

在所有行都被处理完毕后，它会遍历哈希表，为每个键值返回一行结果，并在需要时进行相应的计算（如求和、最小值、平均值等）。

重要的是要理解，HashAggregate 必须扫描所有行才能返回单行。

现在，如果你理解了这个过程，你应该能意识到一个潜在的问题：如果有数百万行数据怎么办？生成的哈希表可能会太大，无法完全存放在内存中。这时，work_mem 再次发挥作用。如果生成的哈希表超出内存限制，它就会“溢出”到磁盘上（同样是存储在 $PGDATA/base/pgsql_tmp 目录中）。

这意味着，如果一个执行计划中同时包含 HashAggregate 和 Sort 操作，那么它们最多可以各自使用 work_mem 的内存空间，因此总共可能消耗高达 2 倍的 work_mem。而生成这种执行计划的情况其实很常见，例如：

$ explain analyze select relkind, count(*) from pg_Class group by relkind order by relkind;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Sort  (cost=12.46..12.47 rows=4 width=1) (actual time=0.260..0.261 rows=5 loops=1)
   Sort Key: relkind
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=12.38..12.42 rows=4 width=1) (actual time=0.221..0.222 rows=5 loops=1)
         ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=1) (actual time=0.006..0.044 rows=295 loops=1)
 Total runtime: 0.312 ms
(6 rows)

实际上，单个查询可能使用远超 work_mem 限制的内存总量，因为 work_mem 是每个操作的内存使用上限。因此，如果你的查询涉及成百上千个HashAggregate、Sort 或其他使用 work_mem 的操作，总的内存消耗可能会变得非常高。

`Hash Join / Hash`

既然我们已经讨论了 HashAggregate，接下来很自然地就要谈到 Hash Join（哈希连接）。

这个操作与我们之前讨论过的所有其他操作不同，它包含两个子操作。其中一个子操作始终是Hash（哈希），另一个则可以是其他类型的节点。

哈希连接（Hash Join）正如其名，用于连接两个记录集。例如，就像这里的：

$ explain analyze select * from pg_class c join pg_namespace n on c.relnamespace = n.oid;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.14..16.07 rows=292 width=316) (actual time=0.036..0.343 rows=295 loops=1)
   Hash Cond: (c.relnamespace = n.oid)
   ->  Seq Scan on pg_class c  (cost=0.00..10.92 rows=292 width=203) (actual time=0.007..0.044 rows=295 loops=1)
   ->  Hash  (cost=1.06..1.06 rows=6 width=117) (actual time=0.012..0.012 rows=6 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on pg_namespace n  (cost=0.00..1.06 rows=6 width=117) (actual time=0.004..0.005 rows=6 loops=1)
 Total runtime: 0.462 ms
(7 rows)

其工作原理如下：首先，Hash Join 调用 Hash 子操作，而 Hash 又会调用其下层的操作（在本例中是对 pg_namespace 表的顺序扫描）。然后，Hash 操作会基于连接所用的键（在本例中是 pg_namespace 表的 OID 列），将源数据中的行构建为一个哈希表（即内存或磁盘上的哈希结构/关联数组/字典），以便后续快速查找。如果数据量较小，哈希表将保存在内存中；如果数据过大，则会写入磁盘。

当然，对于给定的连接键，可能会有多行数据与之对应（虽然在本例中不会，因为我使用的是主键进行连接，主键具有唯一性；但一般情况下，单个哈希键对应多行数据是完全可能的）。

所以，用 Perl 的表示法，Hash 的输出类似于：

{
    '123' => [ { data for row with OID = 123 }, ],
    '256' => [ { data for row with OID = 256 }, ],
    ...
}

然后，Hash Join 执行第二个子操作（在本例中是对 pg_class 表的顺序扫描），并对该操作返回的每一行执行以下步骤：

检查连接键（在我们的例子中是 pg_class.relnamespace）是否在 Hash 操作返回的哈希表中
如果不是——来自子操作的行将被忽略（不会返回）
如果是——Hash Join 从哈希表中获取行，并根据一侧的行和哈希表中的所有行，生成输出行

需要注意的是，两侧的操作都只会执行一次（在本例中，这两个操作都是顺序扫描），但第一个（由 Hash 调用的）必须返回所有行，并将这些行存储在哈希表中；而第二个操作则是逐行处理的，并且如果某行在另一侧的哈希表中不存在，则该行会被跳过。

当然，由于这两个子扫描可以是任意类型的操作，因此它们可以包含过滤条件、索引扫描，或者你能想到的其他任何操作。

关于 Hash Join 和 Hash 操作的最后一点说明：与 Sort 和 HashAggregate 一样，Hash 操作最多可使用 work_mem 大小的内存。当哈希表所需内存超过该限制时，数据库会将部分数据写入磁盘临时文件，以避免内存耗尽。

`Nested Loop` 嵌套循环

既然我们谈论连接，就必须讨论嵌套循环。例如：

$ explain analyze select a.* from pg_class c join pg_attribute a on c.oid = a.attrelid where c.relname in ( 'pg_class', 'pg_namespace' );
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.28..52.32 rows=16 width=203) (actual time=0.057..0.134 rows=46 loops=1)
   ->  Seq Scan on pg_class c  (cost=0.00..11.65 rows=2 width=4) (actual time=0.043..0.080 rows=2 loops=1)
         Filter: (relname = ANY ('{pg_class,pg_namespace}'::name[]))
         Rows Removed by Filter: 291
   ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.28..20.25 rows=8 width=203) (actual time=0.007..0.015 rows=23 loops=2)
         Index Cond: (attrelid = c.oid)
 Total runtime: 0.182 ms

这是一个非常有趣的计划，因为它可以多次运行指定的操作。

与哈希连接（Hash Join）类似，嵌套循环（Nested Loop）也有两个子节点。首先，它执行第一个操作（在我们的示例中是顺序扫描），通常它会先运行第一个子节点；然后，对于该操作返回的每一行（在本例中为 2 行），都会执行一次第二个操作（在本例中是对 pg_attribute 表的索引扫描）。

你可能会注意到，索引扫描的实际执行信息中包含loops=2。这意味着该操作被执行了两次，而其他数值（如行数、时间等）则是所有执行次数的平均值。

让我们查看这个执行计划:

Nested Loop  (cost=0.00..10715.90 rows=26284 width=4449) (actual time=0.054..291.131 rows=26284 loops=1)
  ->  Index Scan using books_index_title on books  (cost=0.00..3306.28 rows=26284 width=3357) (actual time=0.033..50.773 rows=26284 loops=1)
  ->  Index Scan using categories_pkey on categories  (cost=0.00..0.27 rows=1 width=1092) (actual time=0.002..0.003 rows=1 loops=26284)
        Index Cond: (categories.id = books.category_id)
Total runtime: 312.212 ms

注意，分类（categories）索引扫描的单次实际耗时仅为 0.002 到 0.003 毫秒。但该节点的总耗时却达到了 78.852 毫秒，原因是这个索引扫描被执行了超过 26,000 次。

所以，处理过程如下：

嵌套循环对连接的一侧运行一次，我们称之为A。
对于A中的每一行，它会运行第二个操作（我们称之为B）
如果B没有返回任何行——来自A的数据将被忽略
如果B返回了行，对于它返回的每一行，Nested Loop 会基于来自A的当前行和来自B的当前行返回新行

`Merge Join` 合并连接

另一种连接数据的方法称为归并连接（Merge Join）。当参与连接的两个数据集已经按连接键排序，或者可以低成本地进行排序时，就会使用这种方法。

通过使用子查询来强制排序数据后再进行连接：

$ explain analyze select * from
    ( select oid, * from pg_class order by oid) as c
    join
    ( select * from pg_attribute a order by attrelid) as a
    on c.oid = a.attrelid;
                                                                              QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=23.16..268.01 rows=2273 width=410) (actual time=0.658..3.779 rows=2274 loops=1)
   Merge Cond: (pg_class.oid = a.attrelid)
   ->  Sort  (cost=22.88..23.61 rows=292 width=207) (actual time=0.624..0.655 rows=293 loops=1)
         Sort Key: pg_class.oid
         Sort Method: quicksort  Memory: 102kB
         ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=207) (actual time=0.011..0.211 rows=293 loops=1)
   ->  Materialize  (cost=0.28..212.34 rows=2273 width=203) (actual time=0.028..1.264 rows=2274 loops=1)
         ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.28..183.92 rows=2273 width=203) (actual time=0.015..0.752 rows=2274 loops=1)
 Total runtime: 4.009 ms
(9 rows)

与其它连接方式一样，归并连接（Merge Join）也会执行两个子操作（在本例中是 Sort 和 Materialize）。由于这两个子操作返回的数据都已经排序，且排序顺序与连接所用的键一致，数据库可以同时扫描这两个子操作的结果集，并简单地逐行比对连接键（如 ID）是否匹配，从而高效地完成连接。

流程是这样的：

如果右侧的连接列与左侧的连接列相同：
- 返回基于当前右侧和左侧行的新的连接行
- 从右侧获取下一行（或者，如果没有更多行，则从左侧获取）
- 进入步骤 1
如果右侧的连接列比左侧的连接列更小：
- 从右侧获取下一行（如果没有更多行，则结束处理）
- 进入步骤 1
如果右侧的连接列比左侧的连接列更大：
- 从左侧获取下一行（如果没有更多行，则结束处理）
- 回到步骤 1

`Hash Join / Nested Loop / Merge Join` 修饰符

在上述所有示例中，我都展示了连接操作只有在从连接的两侧都获取行时才会返回行。

但情况并非总是如此。我们可能有 LEFT/RIGHT/FULL 外连接。还有所谓的反连接。

在左/右连接的情况下，操作名称会变为：

Hash Left Join 哈希左连接
Hash Right Join 哈希右连接
Merge Left Join 合并左连接
Merge Right Join 合并右连接
Nested Loop Left Join 嵌套循环左连接

没有 Nested Loop Right Join，因为 Nested Loop 总是以左侧作为循环的基础。因此，使用 RIGHT JOIN 的连接会在内部转换为 LEFT JOIN，以便 Nested Loop 能够工作。

在所有这些情况下，逻辑都很简单——我们有连接的两个表，即左表和右表。当某个表在连接中被提及（例如在 LEFT JOIN 中提到左表），那么即使另一侧没有匹配的行，连接也会返回一行结果（其中不匹配的部分以 NULL 填充）。

这一切都发生在类似这样的查询中：

select * from a left join b on ...

(or right join). （或右连接）

对于 Hash Join/Merge Join 或 Nested Loop 的其他信息都是相同的，只是生成输出行的逻辑略有不同。

还有一种称为 Full Join 的版本，其操作名称为：

Hash Full Join 哈希全连接
Merge Full Join 合并全连接

在这种情况下，无论两侧的数据是否存在（只要一侧有数据），连接操作都会生成新的输出行。这种情况发生在：

select * from a full join b ...

当然所有处理都与之前相同。

还存在所谓的反连接。它们的操作名称看起来像：

Hash Anti Join 哈希反连接
Merge Anti Join 合并反连接
Nested Loop Anti Join 嵌套循环反连接

在这些情况下，连接只有在右侧找不到任何行时才会发出行。这在执行WHERE not exists ()或left join … where right_table.column is null等操作时很有用。

$ explain analyze select * from pg_class c where not exists (select * from pg_attribute a where a.attrelid = c.oid and a.attnum = 10);
                                                                             QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=62.27..78.66 rows=250 width=203) (actual time=0.145..0.448 rows=251 loops=1)
   Hash Cond: (c.oid = a.attrelid)
   ->  Seq Scan on pg_class c  (cost=0.00..10.92 rows=292 width=207) (actual time=0.009..0.195 rows=293 loops=1)
   ->  Hash  (cost=61.75..61.75 rows=42 width=4) (actual time=0.123..0.123 rows=42 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 2kB
         ->  Index Only Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.28..61.75 rows=42 width=4) (actual time=0.021..0.109 rows=42 loops=1)
               Index Cond: (attnum = 10)
               Heap Fetches: 0
 Total runtime: 0.521 ms
(9 rows)

在这里，数据库执行了右侧（对 pg_attribute 的索引扫描），然后进行哈希处理，接着执行左侧（对 pg_class 的顺序扫描），只返回在哈希表中没有对应 pg_class.oid 项的行。

`Materialize`

psql 有许多内部命令。其中一个是\dTS——它列出了所有系统数据类型。内部\dTS 执行此查询：

SELECT n.nspname as "Schema",
  pg_catalog.format_type(t.oid, NULL) AS "Name",
  pg_catalog.obj_description(t.oid, 'pg_type') as "Description"
FROM pg_catalog.pg_type t
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
WHERE (t.typrelid = 0 OR (SELECT c.relkind = 'c' FROM pg_catalog.pg_class c WHERE c.oid = t.typrelid))
  AND NOT EXISTS(SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid = t.typelem AND el.typarray = t.oid)
  AND pg_catalog.pg_type_is_visible(t.oid)
ORDER BY 1, 2;

它的计划是：

                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2783.00..2783.16 rows=65 width=68) (actual time=3.883..3.888 rows=87 loops=1)
   Sort Key: n.nspname, (format_type(t.oid, NULL::integer))
   Sort Method: quicksort  Memory: 39kB
   ->  Nested Loop Left Join  (cost=16.32..2781.04 rows=65 width=68) (actual time=0.601..3.657 rows=87 loops=1)
         Join Filter: (n.oid = t.typnamespace)
         Rows Removed by Join Filter: 435
         ->  Hash Anti Join  (cost=16.32..2757.70 rows=65 width=8) (actual time=0.264..0.981 rows=87 loops=1)
               Hash Cond: ((t.typelem = el.oid) AND (t.oid = el.typarray))
               ->  Seq Scan on pg_type t  (cost=0.00..2740.26 rows=81 width=12) (actual time=0.012..0.662 rows=157 loops=1)
                     Filter: (pg_type_is_visible(oid) AND ((typrelid = 0::oid) OR (SubPlan 1)))
                     Rows Removed by Filter: 185
                     SubPlan 1
                       ->  Index Scan using pg_class_oid_index on pg_class c  (cost=0.15..8.17 rows=1 width=1) (actual time=0.002..0.002 rows=1 loops=98)
                             Index Cond: (oid = t.typrelid)
               ->  Hash  (cost=11.33..11.33 rows=333 width=8) (actual time=0.241..0.241 rows=342 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 14kB
                     ->  Seq Scan on pg_type el  (cost=0.00..11.33 rows=333 width=8) (actual time=0.002..0.130 rows=342 loops=1)
         ->  Materialize  (cost=0.00..1.09 rows=6 width=68) (actual time=0.000..0.001 rows=6 loops=87)
               ->  Seq Scan on pg_namespace n  (cost=0.00..1.06 rows=6 width=68) (actual time=0.002..0.003 rows=6 loops=1)
 Total runtime: 3.959 ms

为了方便查看，我还将这个计划上传到了 [za6 | explain.depesz.com](https://explain.depesz.com/s/za6)

请注意，那里的操作#9 是 Materialize。为什么会这样？

Materialize 被 Nested Loop Left Join 调用——操作#2。我们知道 Nested Loop 会导致给定操作多次执行，在此情况下是87 次。

连接的右侧是 Seq Scan on pg_namespace。因此，理论上数据库应该对 pg_namespace 执行 87 次 Sequential Scan。考虑到该表的每次 Seq Scan 耗时 0.003ms，我们可以预期总耗时约为 0.25ms。

但数据库更聪明。它意识到，一次扫描整个表，并在内存中构建所有行的表示，成本更低。这样，下次查询时，它就不必再扫描表、检查可见性信息、解析数据页了。它只需从内存中获取数据即可。

正因为如此，整个操作的总耗时——包括一次性读取表数据、构建数据的内存表示形式，以及对该内存结构进行 87 次扫描——仅为 0.087 毫秒。

你可能会问：好的，但之前那个归并连接为什么要使用 Materialize 呢——它不也只是进行了一次扫描吗？让我们回顾一下执行计划：

$ explain analyze select * from
    ( select oid, * from pg_class order by oid) as c
    join
    ( select * from pg_attribute a order by attrelid) as a
    on c.oid = a.attrelid;
                                                                              QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=23.16..268.01 rows=2273 width=410) (actual time=0.658..3.779 rows=2274 loops=1)
   Merge Cond: (pg_class.oid = a.attrelid)
   ->  Sort  (cost=22.88..23.61 rows=292 width=207) (actual time=0.624..0.655 rows=293 loops=1)
         Sort Key: pg_class.oid
         Sort Method: quicksort  Memory: 102kB
         ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=207) (actual time=0.011..0.211 rows=293 loops=1)
   ->  Materialize  (cost=0.28..212.34 rows=2273 width=203) (actual time=0.028..1.264 rows=2274 loops=1)
         ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.28..183.92 rows=2273 width=203) (actual time=0.015..0.752 rows=2274 loops=1)
 Total runtime: 4.009 ms
(9 rows)

是的，它确实只执行了一次。但问题是，归并连接（Merge Join）的数据源必须满足多个条件。其中一些条件比较明显（例如数据必须是已排序的），而另一些则不那么明显，更多是技术层面的要求（例如数据必须支持前后双向滚动访问）。

正因为这些不太明显的技术要求，数据库有时不得不将来自数据源（在本例中是索引扫描）的数据进行物化（Materialize），以便在使用时具备所有必要的特性，例如支持双向扫描和重复访问等。

简单来说——Materialize 会从其底层操作获取数据，并将其存储在内存中（或部分在内存中），以便能够更快地访问，或提供底层操作所不具备的额外功能（例如支持回溯、重复扫描等）。

今天就到这里了。

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

文章被以下合辑收录

执行计划解读（共6篇）

小白入门执行计划解读

类PostgreSQL数据库执行计划解读-part3

执行计划解决-part3

Function Scan 函数扫描

Sort 排序

Limit 限制

HashAggregate

Hash Join / Hash

Nested Loop 嵌套循环

Merge Join 合并连接

Hash Join / Nested Loop / Merge Join 修饰符

Materialize