PostgreSQL绑定变量窥视

PostgreSQL中文社区 2020-02-04

1486

作者介绍

唐成: 网名 osdba，《PostgreSQL修炼之道：从小工到专家》的作者，中启乘数科技公司联合创始人，从业20年，从事PostgreSQL数据库超过10年，拥有十几年数据库、操作系统、存储领域的工作经验，历任过网易研究院技术专家、阿里巴巴高级数据库专家，从事过阿里巴巴PostgreSQL、Greenplum数据库的架构设计和运维。做过数个百TB以上的Greenplum集群的维护和扩容工作，解决过很多PostgreSQL、Greenplum方面的疑难杂症。

1. 绑定变量窥视的原理说明

Oracle DBA都知道，绑定变量窥视功能是Oracle数据库的一个特性，自ORACLE9i版本开始引入，是可以通过参数数“_optim_peek_user_binds”来控制是否开启，默认是开启，即为TRUE。这就意味着，第一次以变量的方式执行某类SQL时，会生成第一个执行计划，后续执行该类SQL语句，即使变量的传入值不同，但因变量窥视的作用，依然会沿用第一次SQL语句执行时生成的执行计划，这种特性非常适用于业务表数据分布比较均匀的场景，执行计划比较稳定。但对于数据分布严重倾斜的业务表，可能会出现错误的执行计划，在极端情况下，会引发严重的性能问题。

当”_optim_peek_user_binds”参数设置为FALSE，即将绑定变量窥视参数特性禁用。那么已经执行过的某类值的执行计划将不会发生变化，一旦传入某个新值时，优化器会自动根据被访问对象的统计、直方图等信息，产生它认为效率最高、成本最低的执行计划。也就是说，在特性关闭的情况下，该类SQL语句可能会产生更优的执行计划。

所以为了让系统的性能不至于大起大落，在很多用户那里会关闭绑定变量窥视的功能。

那么PostgreSQL数据库在绑定变量的执行计划这一块的行为是什么呢？

PostgreSQL数据库的行为有一些复杂：

· 当前5次执行时，每次都会根据实际传入的实际绑定变量新生成执行计划进行执行，即每次都是硬解析，同时会记录这5次的执行计划；

· 当第6次开始执行时，会生成一个通用的执行计划（generic plan），同时与前5次的执行计划进行比较，如果比较的结果是通用执行计划不比前5次的执行计划差，以后就会固定把这个通用的执行计划固定下来，这之后即使传入的值发生变化后，执行计划也不再变化。这就相当于Oracle打开了绑定变量窥视的功能。

· 当然，当第6次开始执行时，如果通用的执行计划（generic plan）比前5次的某一个执行计划差，则以后则每次都重新生成执行计划，即以后永远都是硬解析了。

从上面原理可以看出，PostgreSQL数据库能否不走硬解析，与前5次执行时传入的实际值有很大的关系，可以想象如果前5次执行时都是一个固定的值，第6次执行时的通用执行计划与前5次又一样，这时执行计划就会固定，如果以后传进来的值可以生成更好的执行计划，也不会生成了，这时可能会导致比较大的问题，这与Oracle打开了绑定变量窥视产生了一样的问题。只是因为PostgreSQL因为有先执行5次，然后第6次比较的机制，让这个问题出现的概率低了很多，但实际上还是会出现的。

当然，如果每次都是重新生成执行计划，对于高并发，会降低一些性能。实际上，对于一些重要的系统，每次重新生成执行计划，会更好一些，因为这种方式防止了系统的性能大起大落。

目前，网上很少有文章介绍这个原理，即使有也是把这个原理介绍的不清楚。

2. 实际测试

2.1 造测试表和数据

下面我们实际测试，来更深的理解这个原理。

1.create table test01(id serial, t text);
2.insert into test01(t) select 'tang' from generate_series(1, 1000000);
3.insert into test01(t) select 'osdba' from generate_series(1, 2);
4.CREATE INDEX idx_test01_t ON test01(t);
5.analyze test01;

上面的例子中我们创建了不均匀的数据，即为“tang”的数据是100万，而为“osdba”的数据是2条。

如果我们按常量来查，执行计划会走正确：

1.osdba=# explain SELECT count(*) FROM test01 WHERE t = 'tang';
2.                                        QUERY PLAN
3.------------------------------------------------------------------------------------------
4. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
5.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
6.         Workers Planned: 2
7.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
8.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
9.                     Filter: (t = 'tang'::text)
10.(6 rows)
11.
12.Time: 1.532 ms
13.
14.osdba=# explain SELECT count(*) FROM test01 WHERE t = 'osdba';
15.                                      QUERY PLAN
16.--------------------------------------------------------------------------------------
17. Aggregate  (cost=4.45..4.46 rows=1 width=8)
18.   ->  Index Only Scan using idx_test01_t on test01  (cost=0.42..4.44 rows=1 width=0)
19.         Index Cond: (t = 'osdba'::text)
20.(3 rows)
21.
22.Time: 1.484 ms

上面可以看到，当按“tang”来查是，走的是全表扫描，而按“osdba”查询时走的是索引，说明执行计划都是正确的。

2.2 按绑定变量的第一次测试

下面我们按绑定变量的方式执行：

1.PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;

下面具体看：

1.osdba-mac:~ osdba$ psql
2.psql (10.5)
3.Type "help" for help.
4.
5.osdba=# PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;
6.PREPARE
7.osdba=# explain EXECUTE myplan('tang');
8.                                        QUERY PLAN
9.------------------------------------------------------------------------------------------
10. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
11.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
12.         Workers Planned: 2
13.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
14.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
15.                     Filter: (t = 'tang'::text)
16.(6 rows)
17.
18.osdba=# explain EXECUTE myplan('tang');
19.                                        QUERY PLAN
20.------------------------------------------------------------------------------------------
21. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
22.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
23.         Workers Planned: 2
24.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
25.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
26.                     Filter: (t = 'tang'::text)
27.(6 rows)
28.
29.osdba=# explain EXECUTE myplan('tang');
30.                                        QUERY PLAN
31.------------------------------------------------------------------------------------------
32. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
33.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
34.         Workers Planned: 2
35.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
36.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
37.                     Filter: (t = 'tang'::text)
38.(6 rows)
39.
40.osdba=# explain EXECUTE myplan('tang');
41.                                        QUERY PLAN
42.------------------------------------------------------------------------------------------
43. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
44.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
45.         Workers Planned: 2
46.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
47.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
48.                     Filter: (t = 'tang'::text)
49.(6 rows)
50.
51.osdba=# explain EXECUTE myplan('tang');
52.                                        QUERY PLAN
53.------------------------------------------------------------------------------------------
54. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
55.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
56.         Workers Planned: 2
57.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
58.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
59.                     Filter: (t = 'tang'::text)
60.(6 rows)
61.
62.osdba=# explain EXECUTE myplan('tang');
63.                                        QUERY PLAN
64.------------------------------------------------------------------------------------------
65. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
66.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
67.         Workers Planned: 2
68.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
69.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
70.                     Filter: (t = $1)
71.(6 rows)

注意上面执行中的第6次的执行计划和第5次的执行计划发生了变化，前5次都是“Filter: (t = ‘tang’::text)”，而第6次变成了“Filter: (t = $1)”，这说明执行计划变成了通用执行计划，这时我们把传进去的值改成“osdba”，发现也会是走全表扫描了，不会走索引了，这时的执行计划就错了：

1.osdba=# explain analyze EXECUTE myplan('osdba');
2.                                                               QUERY PLAN
3.----------------------------------------------------------------------------------------------------------------------------------------
4. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8) (actual time=114.069..114.069 rows=1 loops=1)
5.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8) (actual time=113.957..114.865 rows=3 loops=1)
6.         Workers Planned: 2
7.         Workers Launched: 2
8.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8) (actual time=106.088..106.088 rows=1 loops=3)
9.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0) (actual time=106.072..106.072 rows=1 loops=3)
10.                     Filter: (t = $1)
11.                     Rows Removed by Filter: 333333
12. Planning time: 0.035 ms
13. Execution time: 115.044 ms
14.(10 rows)

2.3 按绑定变量的第二次测试

前面的测试时，我们前5次执行时传进去的值都是“tang”，我们这一次让前5次中四次传进去的值是“tang”，有一次是“osdba”:

1.osdba-mac:~ osdba$ psql
2.psql (10.5)
3.Type "help" for help.
4.
5.osdba=# PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;
6.PREPARE
7.osdba=# explain EXECUTE myplan('tang');
8.                                        QUERY PLAN
9.------------------------------------------------------------------------------------------
10. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
11.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
12.         Workers Planned: 2
13.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
14.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
15.                     Filter: (t = 'tang'::text)
16.(6 rows)
17.
18.osdba=# explain EXECUTE myplan('tang');
19.                                        QUERY PLAN
20.------------------------------------------------------------------------------------------
21. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
22.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
23.         Workers Planned: 2
24.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
25.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
26.                     Filter: (t = 'tang'::text)
27.(6 rows)
28.
29.osdba=# explain EXECUTE myplan('tang');
30.                                        QUERY PLAN
31.------------------------------------------------------------------------------------------
32. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
33.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
34.         Workers Planned: 2
35.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
36.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
37.                     Filter: (t = 'tang'::text)
38.(6 rows)
39.
40.osdba=# explain EXECUTE myplan('tang');
41.                                        QUERY PLAN
42.------------------------------------------------------------------------------------------
43. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
44.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
45.         Workers Planned: 2
46.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
47.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
48.                     Filter: (t = 'tang'::text)
49.(6 rows)
50.
51.osdba=# explain EXECUTE myplan('osdba');
52.                                      QUERY PLAN
53.--------------------------------------------------------------------------------------
54. Aggregate  (cost=4.45..4.46 rows=1 width=8)
55.   ->  Index Only Scan using idx_test01_t on test01  (cost=0.42..4.44 rows=1 width=0)
56.         Index Cond: (t = 'osdba'::text)
57.(3 rows)

后面我们无论再怎么执行固定的值，发现每次都是生成新的执行计划了：

1.osdba=# explain EXECUTE myplan('tang');
2.                                        QUERY PLAN
3.------------------------------------------------------------------------------------------
4. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
5.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
6.         Workers Planned: 2
7.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
8.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
9.                     Filter: (t = 'tang'::text)
10.(6 rows)
11.
12.osdba=# explain EXECUTE myplan('tang');
13.                                        QUERY PLAN
14.------------------------------------------------------------------------------------------
15. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
16.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
17.         Workers Planned: 2
18.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
19.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
20.                     Filter: (t = 'tang'::text)
21.(6 rows)
22....
23....
24....
25.
26.osdba=# explain EXECUTE myplan('osdba');
27.                                      QUERY PLAN
28.--------------------------------------------------------------------------------------
29. Aggregate  (cost=4.45..4.46 rows=1 width=8)
30.   ->  Index Only Scan using idx_test01_t on test01  (cost=0.42..4.44 rows=1 width=0)
31.         Index Cond: (t = 'osdba'::text)
32.(3 rows)
33....
34....
35.
36.osdba=# explain EXECUTE myplan('tang');
37.                                        QUERY PLAN
38.------------------------------------------------------------------------------------------
39. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
40.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
41.         Workers Planned: 2
42.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
43.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
44.                     Filter: (t = 'tang'::text)
45.(6 rows)

上面的演示是前5次中前四次传进去的值是“tang”，最后一次是“osdba”，实际上只要前五次中，只要任意有1次或多次传进去的是“osdba”，不一定要求最后一次是“osdba”时，都不会走通用的执行计划，这个结果大家可以测试。

这就验证了我们前面的理论。

3. PostgreSQL 12的plan_cache_mode配置参数

在PostgreSQL11及一下的版本中，因为绑定变量窥视，虽然比Oracle出现的概率低，但还是有一定的概率导致执行计划走错。那么在PostgreSQL中是否也有类似Oracle的隐含参数把绑定变量窥视关掉的功能？答案是PostgreSQL12提供了这个功能。

在PostgreSQL 12提供了plan_cache_mode配置参数，可以取以下三个值：

· auto: 这时默认值，即默认情况下与PostgreSQL11及以下版本相同的行为。

· force_custom_plan: 相当于关闭绑定变量窥视，永远进行硬解析。

· force_generic_plan: 走通用的固定执行计划（generic plan）

所以对于一些非常重要的系统，可以把“plan_cache_mode”配置成“force_custom_plan”，避免执行计划的错误，如下所示：

1.osdba-mac:pgdata12 osdba$ psql
2.psql (12.1)
3.Type "help" for help.
4.
5.osdba=# PREPARE myplan(text) AS SELECT count(*) FROM test01 WHERE t = $1;
6.PREPARE
7.osdba=# set plan_cache_mode to force_custom_plan;
8.SET
9.osdba=# explain EXECUTE myplan('osdba');
10.                                      QUERY PLAN
11.--------------------------------------------------------------------------------------
12. Aggregate  (cost=4.45..4.46 rows=1 width=8)
13.   ->  Index Only Scan using idx_test01_t on test01  (cost=0.42..4.44 rows=1 width=0)
14.         Index Cond: (t = 'osdba'::text)
15.(3 rows)
16.
17.osdba=# explain EXECUTE myplan('tang');
18.                                        QUERY PLAN
19.------------------------------------------------------------------------------------------
20. Finalize Aggregate  (cost=12656.23..12656.24 rows=1 width=8)
21.   ->  Gather  (cost=12656.01..12656.22 rows=2 width=8)
22.         Workers Planned: 2
23.         ->  Partial Aggregate  (cost=11656.01..11656.02 rows=1 width=8)
24.               ->  Parallel Seq Scan on test01  (cost=0.00..10614.34 rows=416668 width=0)
25.                     Filter: (t = 'tang'::text)
26.(6 rows)

postgresql

文章转载自PostgreSQL中文社区，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。