PostgreSQL支持如下几种连接方式

原创苏东坡 2025-07-03

215

PostgreSQL支持如下几种连接方式：

Nested Loop Join
Hash Join
Merge Join

postgres=#  create table blogtable1(id1 int, id2 int);
CREATE TABLE
postgres=# create table blogtable2(id1 int, id2 int); 
CREATE TABLE
postgres=# insert into blogtable1 values(generate_series(1,10000),3);
INSERT 0 10000
postgres=# insert into blogtable2 values(generate_series(1,1000),3);
INSERT 0 1000
postgres=# analyze;
ANALYZE

Nested Loop Join

这个方式和mysql的SNLJ一样的原理

NLJ (Nested Loop Join)是一种最简单的连接算法，每条外部关系的记录与每条内部关系的记录相匹配。条件为A.ID < B.ID的A与B之间的连接可以表示为

For each tuple r in A
        For each tuple s in B
                If (r.ID < s.ID)
                    Emit output tuple (r,s)

Nested Loop Join(NLJ)是最常用的连接方法，它几乎可以用于任何类型、任何数据集的连接子句。由于该算法需要扫描外表和内表的所有元组，因此Nested Loop Join被认为是开销最大的连接操作。

根据上面的表和数据，下面的查询会使用Nested Loop Join，如下所示：

postgres=# explain select * from blogtable1 bt1, blogtable2 bt2 where bt1.id1 < bt2.id1;
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..150162.50 rows=3333333 width=16)
   Join Filter: (bt1.id1 < bt2.id1)
   ->  Seq Scan on blogtable1 bt1  (cost=0.00..145.00 rows=10000 width=8)
   ->  Materialize  (cost=0.00..20.00 rows=1000 width=8)
         ->  Seq Scan on blogtable2 bt2  (cost=0.00..15.00 rows=1000 width=8)
(5 rows)

因为连接子句是“<”，所以这里唯一可能的连接方法即是Nested Loop Join。注意这里出现了一个新类型的节点：Materialize，该节点作为中间结果缓存，用于替代多次获取一个表中的所有元组，Materialize会将第一次获取的结果集存储在内存中，下一次请求获取元组时将直接从内存中获取，而不是再次从表中数据页内获取。

有些查询可能没有连接子句，在这种情况下，连接的唯一选择就是Nested Loop Join(NLJ)。例如，根据之前的数据考虑以下查询：

postgres=#  explain select * from blogtable1, blogtable2;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Nested Loop  (cost=0.00..125162.50 rows=10000000 width=16)
   ->  Seq Scan on blogtable1  (cost=0.00..145.00 rows=10000 width=8)
   ->  Materialize  (cost=0.00..20.00 rows=1000 width=8)
         ->  Seq Scan on blogtable2  (cost=0.00..15.00 rows=1000 width=8)
(4 rows)

Hash Join

构建阶段：使用内表记录构建一个哈希表，散列键根据连接子句计算得到（1）遍历内表B的每一条元组r

（2）将r.id插入到hashtab中

探测阶段：外表记录将基于连接子句键进行散列，以在散列中找到匹配的条目

（1）遍历外表A的每一条元组s，进行散列

（2）如果s.id=r.id,输出元组（r,s）

hash表取决于work_mem;

postgres=# explain select * from blogtable1 bt1, blogtable2 bt2 where bt1.id1 = bt2.id1;
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Hash Join  (cost=27.50..220.00 rows=1000 width=16)
   Hash Cond: (bt1.id1 = bt2.id1)
   ->  Seq Scan on blogtable1 bt1  (cost=0.00..145.00 rows=10000 width=8)
   ->  Hash  (cost=15.00..15.00 rows=1000 width=8)
         ->  Seq Scan on blogtable2 bt2  (cost=0.00..15.00 rows=1000 width=8)
(5 rows)

这里，基于表blogtable2创建了哈希表，因为它是一个较小的表，所以需要的内存也较少，这样整个哈希表页可以完全放入内存中。

Merge join

如果两个表上都创建了索引，那么走hash join 的示例查询可能会使用merge join这是因为由于索引，表中的数据可以按顺序检索。

Merje join是一种，每条外表的记录与每条内表的记录相匹配，直到有可能匹配到连接子句为止的算法。此连接算法仅在两个表都已排序并且连接子句操作符为“=”时才使用。连接条件为A.ID = B.ID，那么表A与B之间的连接可以表示为：

postgres=# 
postgres=# create index idx1 on blogtable1(id1);
CREATE INDEX
postgres=# create index idx2 on blogtable2(id1);
CREATE INDEX
postgres=# explain select * from blogtable1 bt1, blogtable2 bt2 where bt1.id1 = bt2.id1;
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Merge Join  (cost=0.56..90.36 rows=1000 width=16)
   Merge Cond: (bt1.id1 = bt2.id1)
   ->  Index Scan using idx1 on blogtable1 bt1  (cost=0.29..318.29 rows=10000 width=8)
   ->  Index Scan using idx2 on blogtable2 bt2  (cost=0.28..43.27 rows=1000 width=8)
(4 rows)

两个表都使用索引扫描而不是顺序扫描，因为这两个表都将返回已排序的记录。

配置join方式

postgres=# select name,setting from pg_settings where name like '%enable%';
              name              | setting 
--------------------------------+---------
 enable_bitmapscan              | on
 enable_gathermerge             | on
 enable_hashagg                 | on
 enable_hashjoin                | on
 enable_indexonlyscan           | on
 enable_indexscan               | on
 enable_material                | on
 enable_mergejoin               | on
 enable_nestloop                | on
 enable_parallel_append         | on
 enable_parallel_hash           | on
 enable_partition_pruning       | on
 enable_partitionwise_aggregate | off
 enable_partitionwise_join      | off
 enable_seqscan                 | on
 enable_sort                    | on
 enable_tidscan                 | on
(17 rows)

PostgreSQL支持多种与规划器相关的配置，这些配置可用于提示查询优化器不要选择某些特定类型的连接方法。如果优化器选择的连接方法不是最优的，那么可以关闭这些配置参数，以强制查询优化器选择不同类型的连接方法。默认情况下，所有这些配置参数都是开启的。下面是对于特定连接方法的规划器配置参数：

enable nestloop：对应Nested Loop Join
enable hashjoin：对应Hash Join
enable mergejoin：对应Merge Join

set enable_seqscan to off;
alter user postgres set enable_seqscan to off;

pg 墨力计划

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

PostgreSQL支持如下几种连接方式

评论