PG数据库中的HINT

原创唐祖亮 2020-05-07

5115

在日常中会遇到SQL使用的执行计划并不是自己想所要的执行计划这种情况，因此我们会使用HINT来改变以及固定SQL的执行计划，而在PGSQL中就有有以下三种常用的提示
1、扫描方式
2、表连接方式
3、表连接顺序
下面我们就来大致了解下这些提示的用法

1、创建测试表并插入数据

create table t1 (id int, t int, name varchar(255));
create table t2 (id int , salary int);
create table t3 (id int , age int);

insert into t1 values (1,200,‘jack’);
insert into t1 values (2,300,‘tom’);
insert into t1 values (3,400,‘john’);
insert into t2 values (1,40000);
insert into t2 values (2,38000);
insert into t2 values (3,18000);
insert into t3 values (3,38);
insert into t3 values (2,55);
insert into t3 values (1,12);

create index idx_t1_id on t1(id);
create index idx_t2_id on t2(id);
create index idx_t3_id on t3(id);

2、改变扫描方式的HINT

改变扫描方式的HINT有以下几种：

下面对进行测试，先查看原始的执行计划如下：

explain (analyze,buffers) select * from t1 where id=2;

testdb=> explain (analyze,buffers) select * from t1 where id=2;
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..1.04 rows=1 width=524) (actual time=0.006..0.007 rows=1 loops=1)
   Filter: (id = 2)
   Rows Removed by Filter: 2
   Buffers: shared hit=1
 Planning Time: 0.053 ms
 Execution Time: 0.016 ms
(6 rows)

从上面的执行计划中可以看到是对t1表的全表扫描，接下来加入使用索引扫描的HINT让其使用索引扫描

explain (analyze,buffers) select /*+ indexscan(t1) */ * from t1 where id=2;

testdb=> explain (analyze,buffers) select /*+ indexscan(t1) */ * from t1 where id=2;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t1_id on t1  (cost=0.13..8.15 rows=1 width=524) (actual time=0.077..0.079 rows=1 loops=1)
   Index Cond: (id = 2)
   Buffers: shared hit=1 read=1
 Planning Time: 0.214 ms
 Execution Time: 0.096 ms
(5 rows)

在t1表的t列创建一个索引，测试HINT是否可以指定索引

testdb=> create index idx_t1_t on t1(t);
CREATE INDEX

testdb=> explain (analyze,buffers) select /*+ indexscan(t1 idx_t1_t) */ * from t1 where id=2 and t=400;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t1_t on t1  (cost=0.13..8.15 rows=1 width=524) (actual time=0.074..0.075 rows=0 loops=1)
   Index Cond: (t = 400)
   Filter: (id = 2)
   Rows Removed by Filter: 1
   Buffers: shared hit=1 read=1
 Planning Time: 0.156 ms
 Execution Time: 0.087 ms
(7 rows)

从上面的执行计划可以看到确实是使用了HINT中所指定的索引，
说明HINT是可以指定索引的。

3、改变连接方式的HINT

改变连接方式的HINT有以下几种：

用t1、t2两张表做内连接查原始执行计划

testdb=> explain analyze select * from t1,t2 where t1.id=t2.id and t1.id>1;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.07..13.00 rows=1 width=532) (actual time=0.020..0.022 rows=2 loops=1)
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on t1  (cost=0.00..11.75 rows=47 width=524) (actual time=0.007..0.008 rows=2 loops=1)
         Filter: (id > 1)
         Rows Removed by Filter: 1
   ->  Hash  (cost=1.03..1.03 rows=3 width=8) (actual time=0.005..0.006 rows=3 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t2  (cost=0.00..1.03 rows=3 width=8) (actual time=0.002..0.003 rows=3 loops=1)
 Planning Time: 0.154 ms
 Execution Time: 0.040 ms
(10 rows)

从执行计划可以看到两张表使用的是HASH连接，
现在使用HINT改变连接方式分别为MEREG和NESTLOOP。

testdb=> explain analyze select /*+ mergejoin(t1 t2) */* from t1,t2 where t1.id=t2.id and t1.id>1;
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=14.11..14.37 rows=1 width=532) (actual time=0.023..0.025 rows=2 loops=1)
   Merge Cond: (t1.id = t2.id)
   ->  Sort  (cost=13.06..13.17 rows=47 width=524) (actual time=0.014..0.014 rows=2 loops=1)
         Sort Key: t1.id
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on t1  (cost=0.00..11.75 rows=47 width=524) (actual time=0.009..0.010 rows=2 loops=1)
               Filter: (id > 1)
               Rows Removed by Filter: 1
   ->  Sort  (cost=1.05..1.06 rows=3 width=8) (actual time=0.005..0.006 rows=3 loops=1)
         Sort Key: t2.id
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on t2  (cost=0.00..1.03 rows=3 width=8) (actual time=0.002..0.003 rows=3 loops=1)
 Planning Time: 0.212 ms
 Execution Time: 0.047 ms
(14 rows)


testdb=> explain analyze select /*+ nestloop(t2 t1) */* from t1,t2 where t1.id=t2.id and t1.id>1;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..14.90 rows=1 width=532) (actual time=0.017..0.021 rows=2 loops=1)
   Join Filter: (t1.id = t2.id)
   Rows Removed by Join Filter: 4
   ->  Seq Scan on t1  (cost=0.00..11.75 rows=47 width=524) (actual time=0.009..0.010 rows=2 loops=1)
         Filter: (id > 1)
         Rows Removed by Filter: 1
   ->  Materialize  (cost=0.00..1.04 rows=3 width=8) (actual time=0.002..0.003 rows=3 loops=2)
         ->  Seq Scan on t2  (cost=0.00..1.03 rows=3 width=8) (actual time=0.002..0.003 rows=3 loops=1)
 Planning Time: 0.319 ms
 Execution Time: 0.041 ms
(10 rows)

可以看到成功的对连接方式进行了更改，
在使用NESTLOOP连接的提示时把t2表写在前面并没有对执行计划中的表连接顺序产生影响。

4、改变表连接顺序的HINT

用t1、t2、t3三张表做内连接查原始执行计划

testdb=> explain analyze select * from t1,t2,t3 where t1.id=t2.id and t2.id= t3.id;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.20..13.75 rows=1 width=540) (actual time=0.036..0.044 rows=3 loops=1)
   ->  Hash Join  (cost=1.07..13.02 rows=3 width=532) (actual time=0.024..0.027 rows=3 loops=1)
         Hash Cond: (t1.id = t2.id)
         ->  Seq Scan on t1  (cost=0.00..11.40 rows=140 width=524) (actual time=0.005..0.005 rows=3 loops=1)
         ->  Hash  (cost=1.03..1.03 rows=3 width=8) (actual time=0.006..0.007 rows=3 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Seq Scan on t2  (cost=0.00..1.03 rows=3 width=8) (actual time=0.003..0.003 rows=3 loops=1)
   ->  Index Scan using idx_t3_id on t3  (cost=0.13..0.23 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=3)
         Index Cond: (id = t1.id)
 Planning Time: 0.417 ms
 Execution Time: 0.070 ms
(11 rows)

从上面的执行计划中可以看到，执行顺序是先对t1表全表扫描然后生成哈希表，生成的哈希表和t2表的全表扫描结果进行哈希连接，最后和t3表使用索引过滤后的条件进行嵌套循环连接。现在使用HINT让其改变连接顺序

testdb=> explain analyze select /*+ leading(t2 t3 ) */* from t1,t2,t3 where t1.id=t2.id and t2.id= t3.id;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=10000000001.07..10000000020.19 rows=1 width=540) (actual time=0.026..0.033 rows=3 loops=1)
   Join Filter: (t2.id = t1.id)
   Rows Removed by Join Filter: 6
   ->  Hash Join  (cost=1.07..2.14 rows=3 width=16) (actual time=0.021..0.023 rows=3 loops=1)
         Hash Cond: (t2.id = t3.id)
         ->  Seq Scan on t2  (cost=0.00..1.03 rows=3 width=8) (actual time=0.007..0.007 rows=3 loops=1)
         ->  Hash  (cost=1.03..1.03 rows=3 width=8) (actual time=0.006..0.006 rows=3 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Seq Scan on t3  (cost=0.00..1.03 rows=3 width=8) (actual time=0.002..0.003 rows=3 loops=1)
   ->  Materialize  (cost=0.00..12.10 rows=140 width=524) (actual time=0.001..0.002 rows=3 loops=3)
         ->  Seq Scan on t1  (cost=0.00..11.40 rows=140 width=524) (actual time=0.002..0.003 rows=3 loops=1)
 Planning Time: 0.320 ms
 Execution Time: 0.060 ms
(13 rows)

加入了 /*+ leading(t2 t3 ) */这个提示之后执行计划中表的连接顺序变成了t2->t3->t1。

5、最后测试三种类型的HINT组合使用

还是三张表的关联，查看其执行计划：

testdb=> explain analyze select * from t1,t2,t3 where t1.id=t2.id and t2.id= t3.id;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.20..13.75 rows=1 width=540) (actual time=0.036..0.044 rows=3 loops=1)
   ->  Hash Join  (cost=1.07..13.02 rows=3 width=532) (actual time=0.024..0.027 rows=3 loops=1)
         Hash Cond: (t1.id = t2.id)
         ->  Seq Scan on t1  (cost=0.00..11.40 rows=140 width=524) (actual time=0.005..0.005 rows=3 loops=1)
         ->  Hash  (cost=1.03..1.03 rows=3 width=8) (actual time=0.006..0.007 rows=3 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Seq Scan on t2  (cost=0.00..1.03 rows=3 width=8) (actual time=0.003..0.003 rows=3 loops=1)
   ->  Index Scan using idx_t3_id on t3  (cost=0.13..0.23 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=3)
         Index Cond: (id = t1.id)
 Planning Time: 0.417 ms
 Execution Time: 0.070 ms
(11 rows)

现在加入HINT让t1表作为驱动表使用嵌套的方式与t2表连接，然后两张表的结果集再以排序合并的方式与t3表连接；

testdb=> explain analyze select /*+ nestloop(t1 t2) mergejoin(t1 t2 t3) indexscan(t2) leading(t1 t2 t3) */* from t1,t2,t3 where t1.id=t2.id and t2.id= t3.id ;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=30.96..31.00 rows=1 width=540) (actual time=0.088..0.091 rows=3 loops=1)
   Merge Cond: (t1.id = t3.id)
   ->  Sort  (cost=29.91..29.91 rows=3 width=532) (actual time=0.075..0.075 rows=3 loops=1)
         Sort Key: t1.id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=0.13..29.88 rows=3 width=532) (actual time=0.062..0.070 rows=3 loops=1)
               Join Filter: (t1.id = t2.id)
               Rows Removed by Join Filter: 6
               ->  Seq Scan on t1  (cost=0.00..11.40 rows=140 width=524) (actual time=0.007..0.008 rows=3 loops=1)
               ->  Materialize  (cost=0.13..12.19 rows=3 width=8) (actual time=0.017..0.018 rows=3 loops=3)
                     ->  Index Scan using idx_t2_id on t2  (cost=0.13..12.18 rows=3 width=8) (actual time=0.049..0.051 rows=3 loops=1)
   ->  Sort  (cost=1.05..1.06 rows=3 width=8) (actual time=0.010..0.010 rows=3 loops=1)
         Sort Key: t3.id
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on t3  (cost=0.00..1.03 rows=3 width=8) (actual time=0.002..0.003 rows=3 loops=1)
 Planning Time: 0.308 ms
 Execution Time: 0.121 ms
(17 rows)

通过执行计划和SQL的文本可以看到，三种类型的HINT都在同一条SQL中使用上了，并且达到了想要的效果。
更多详细使用方法可以去下面的地址了解
http://pghintplan.osdn.jp/pg_hint_plan.html#restrictions

数据库 postgresql

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者