Just-In-Time Compilation (JIT)

PostgreSQL Chapter of the Open Source Software Alliance (开源软件联盟PostgreSQL分会), 2022-03-25


Reposted from the WeChat official account: PostgreSQL数据库工作学习随笔

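Just-in-Time (JIT) compilation is the process of turning some form of interpreted program evaluation into a native program, and doing so at run time. For example, instead of using general-purpose code that can evaluate arbitrary SQL expressions to evaluate a particular SQL predicate such as WHERE a.col = 3, it is possible to generate a function that is specific to that expression and can be executed natively by the CPU, yielding a speedup.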
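The difference between generic evaluation and a specialized function can be illustrated with a toy analogy in Python. This is only a sketch of the idea, not how PostgreSQL or LLVM work internally: the tuple-based expression format and the names eval_generic and specialize are hypothetical, and a Python closure stands in for generated native code.

```python
import operator

# Hypothetical expression format: (operator symbol, column name, constant).
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def eval_generic(expr, row):
    """Generic path: decode the expression and look up the operator on every call."""
    op, column, constant = expr
    return OPS[op](row[column], constant)


def specialize(expr):
    """Specialized path: decode the expression once and return a dedicated function."""
    op, column, constant = expr
    if op == "=":
        # Dedicated code for equality; nothing is looked up per row.
        return lambda row: row[column] == constant
    fn = OPS[op]
    return lambda row: fn(row[column], constant)


predicate = ("=", "col", 3)          # stands in for WHERE a.col = 3
col_equals_3 = specialize(predicate)

rows = [{"col": 3}, {"col": 5}]
matches = [row for row in rows if col_equals_3(row)]
print(matches)
```

Both paths return the same answers. The specialized one simply moves the decoding work out of the per-row loop, which is the kind of saving JIT compilation aims for on a much larger scale.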

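PostgreSQL has built-in support for JIT compilation using LLVM when PostgreSQL is built with --with-llvm.

Operations accelerated by JIT

Currently, PostgreSQL's JIT implementation supports accelerating expression evaluation and tuple deforming. More operations may be accelerated this way in the future.

Expression evaluation is used to evaluate WHERE clauses, target lists, aggregates, and projections. It is accelerated by generating code specific to each case.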

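Tuple deforming is the process of transforming an on-disk tuple into its in-memory representation. It is accelerated by creating a function specific to the table layout and the number of columns to be extracted.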
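As a rough illustration of why a layout-specific function helps, the sketch below contrasts a generic deforming loop with one built in advance for a known layout and column count. It is a simplified assumption throughout: the LAYOUT descriptors and both function names are hypothetical, only fixed-width integer columns are handled, and NULLs, alignment padding, and variable-length values (all of which real tuple deforming must deal with) are ignored.

```python
import struct

# Hypothetical column descriptors: (name, struct format code), fixed-width only.
LAYOUT = [("x", "i"), ("y", "i"), ("z", "i")]


def deform_generic(raw, layout, natts):
    """Walk the descriptors one by one, recomputing offsets on every call."""
    values, offset = [], 0
    for _, code in layout[:natts]:
        fmt = "<" + code
        values.append(struct.unpack_from(fmt, raw, offset)[0])
        offset += struct.calcsize(fmt)
    return values


def make_deformer(layout, natts):
    """Precompute one unpacker for this layout and this number of columns."""
    unpacker = struct.Struct("<" + "".join(code for _, code in layout[:natts]))

    def deform(raw):
        return list(unpacker.unpack_from(raw, 0))

    return deform


raw_tuple = struct.pack("<iii", 7, 42, 99)    # a stand-in for an on-disk row
deform_first_two = make_deformer(LAYOUT, 2)   # only x and y are needed
print(deform_generic(raw_tuple, LAYOUT, 2), deform_first_two(raw_tuple))
```

The specialized function does the layout analysis once and then reuses it for every row, which mirrors the description above of a function specific to the table layout and the number of columns extracted.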

Configuring JIT

--with-llvm

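Builds with support for LLVM-based JIT compilation. This requires the LLVM library to be installed. The minimum required version of LLVM is currently 3.9.

llvm-config is used to find the required compilation options. configure searches your PATH for llvm-config, and then for llvm-config-$major-$minor for every supported version. If that does not find the desired program, use LLVM_CONFIG to specify the path to the correct llvm-config. For example: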

./configure ... --with-llvm LLVM_CONFIG='/path/to/llvm/bin/llvm-config'

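LLVM support requires a compatible clang compiler (specified, if necessary, using the CLANG environment variable) and a working C++ compiler (specified, if necessary, using the CXX environment variable).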

[root@k8s2 postgresql-12.10]# yum install llvm5.0 llvm5.0-devel clang
[root@k8s2 postgresql-12.10]# ./configure --prefix=/opt/pg12 --with-llvm LLVM_CONFIG='/usr/bin/llvm-config-5.0-64'
[root@k8s2 postgresql-12.10]# make world
[root@k8s2 postgresql-12.10]# make install-world
[pg12@k8s2 ~]$ initdb -D /opt/pg12/pgdata --data-checksums -A scram-sha-256 -W
[pg12@k8s2 ~]$ pg_ctl -D /opt/pg12/pgdata -l logfile start

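When to use JIT

JIT compilation mainly benefits long-running, CPU-bound queries. For short queries, the overhead of performing JIT compilation is often higher than the time it saves.

To decide whether JIT compilation should be used, the total estimated cost of a query is used.

The estimated cost of the query is compared with the setting of jit_above_cost. If the cost is higher, JIT compilation is performed. Two further decisions are then needed.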

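First, if the estimated cost exceeds the setting of jit_inline_above_cost, short functions and operators used in the query are inlined. Second, if the estimated cost exceeds the setting of jit_optimize_above_cost, expensive optimizations are applied to improve the generated code. Each of these options increases the JIT compilation overhead, but can reduce query execution time considerably.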
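The three cost comparisons described above can be sketched as a small Python function. This is a model of the documented behavior rather than PostgreSQL's actual planner code: the JitSettings class and jit_decision function are hypothetical names, the default values are the ones shipped in postgresql.conf for PostgreSQL 12, and treating a negative threshold as "disabled" follows the PostgreSQL documentation's statement that -1 disables the corresponding step.

```python
from dataclasses import dataclass


@dataclass
class JitSettings:
    """Hypothetical container for the JIT-related settings (PostgreSQL 12 defaults)."""
    jit: bool = True
    jit_above_cost: float = 100000
    jit_inline_above_cost: float = 500000
    jit_optimize_above_cost: float = 500000


def jit_decision(total_cost, settings=None):
    """Return which JIT steps would apply to a query with this estimated total cost."""
    s = settings if settings is not None else JitSettings()

    def exceeds(threshold):
        # A negative threshold is taken to mean the step is disabled.
        return threshold >= 0 and total_cost > threshold

    perform = s.jit and exceeds(s.jit_above_cost)
    return {
        "perform": perform,
        # Inlining and expensive optimization are only considered once JIT is used at all.
        "inline": perform and exceeds(s.jit_inline_above_cost),
        "optimize": perform and exceeds(s.jit_optimize_above_cost),
    }


print(jit_decision(50000))      # cheap query: no JIT
print(jit_decision(200000))     # JIT, but no inlining or expensive optimization
print(jit_decision(531688.60))  # all three thresholds exceeded
```

With the default thresholds, a plan with an estimated total cost of 531688.60 exceeds all three settings, so JIT compilation, inlining, and expensive optimization all apply.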

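The configuration variable jit determines whether JIT compilation is enabled or disabled. If it is enabled, the configuration variables jit_above_cost, jit_inline_above_cost, and jit_optimize_above_cost determine whether JIT compilation is performed for a query and how much effort is spent on it.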

[pg12@k8s2 ~]$ cat /opt/pg12/pgdata/postgresql.conf |grep jit
#jit_above_cost = 100000                # perform JIT compilation if available
#jit_inline_above_cost = 500000         # inline small functions if query is
#jit_optimize_above_cost = 500000       # use expensive JIT optimizations if
#jit = on                               # allow JIT compilation
#jit_provider = 'llvmjit'               # JIT library to use
[pg12@k8s2 ~]$

Testing

Create the test table:

postgres=# CREATE TABLE t_jit AS SELECT (random()*10000)::int AS x, (random()*100000)::int AS y,(random()*1000000)::int AS z FROM generate_series(1, 50000000) AS id;

View the execution plan with JIT disabled:

postgres=# set jit to off;
SET
postgres=# explain (analyze , buffers , costs , timing) select count(*) from t_jit;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=531688.59..531688.60 rows=1 width=8) (actual time=6404.323..6410.829 rows=1 loops=1)
   Buffers: shared hit=16194 read=254077
   ->  Gather  (cost=531688.38..531688.58 rows=2 width=8) (actual time=6404.131..6410.818 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=16194 read=254077
         ->  Partial Aggregate  (cost=530688.38..530688.39 rows=1 width=8) (actual time=6363.242..6363.243 rows=1 loops=3)
               Buffers: shared hit=16194 read=254077
               ->  Parallel Seq Scan on t_jit  (cost=0.00..478604.90 rows=20833390 width=0) (actual time=0.091..4547.032 rows=16666667 loops=3)
                     Buffers: shared hit=16194 read=254077
 Planning Time: 0.198 ms
 Execution Time: 6410.929 ms
(12 rows)

postgres=#

View the execution plan with JIT enabled:

postgres=# set jit to on;
SET
postgres=# explain (analyze , buffers , costs , timing) select count(*) from t_jit;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=531688.59..531688.60 rows=1 width=8) (actual time=4927.375..4933.406 rows=1 loops=1)
   Buffers: shared hit=16208 read=254063
   ->  Gather  (cost=531688.38..531688.58 rows=2 width=8) (actual time=4927.085..4933.384 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=16208 read=254063
         ->  Partial Aggregate  (cost=530688.38..530688.39 rows=1 width=8) (actual time=4906.512..4906.513 rows=1 loops=3)
               Buffers: shared hit=16208 read=254063
               ->  Parallel Seq Scan on t_jit  (cost=0.00..478604.90 rows=20833390 width=0) (actual time=0.047..3677.157 rows=16666667 loops=3)
                     Buffers: shared hit=16208 read=254063
 Planning Time: 0.082 ms
 JIT:
   Functions: 8
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 1.535 ms, Inlining 0.397 ms, Optimization 18.182 ms, Emission 21.841 ms, Total 41.955 ms
 Execution Time: 4933.971 ms
(16 rows)

postgres=#

In this case the query ran more than 20% faster than before (4933.971 ms with JIT versus 6410.929 ms without), which is already a good result.
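The comparison can be checked with a little arithmetic on the figures reported in the two plans. The short Python sketch below only restates those numbers. The variable names are made up for illustration, and since each plan is a single run, the result is an indication rather than a benchmark.

```python
# Figures copied from the two EXPLAIN ANALYZE outputs.
time_jit_off_ms = 6410.929   # Execution Time with jit = off
time_jit_on_ms = 4933.971    # Execution Time with jit = on
jit_total_ms = 41.955        # "Total" in the JIT Timing line

saved_ms = time_jit_off_ms - time_jit_on_ms
reduction_pct = 100 * saved_ms / time_jit_off_ms
overhead_pct = 100 * jit_total_ms / time_jit_on_ms

print(f"time saved: {saved_ms:.3f} ms ({reduction_pct:.1f}% less execution time)")
print(f"JIT compilation overhead: {overhead_pct:.2f}% of the JIT-enabled run")
```

The JIT compilation time is small compared with the time saved, which matches the earlier point that long-running, CPU-bound queries are the ones that benefit.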