知识点滴 -- 人大金仓KingbaseES 分区表哈希智能算法

原创金仓数据库 2023-08-15

5749

对于两个分区大表的连接，其性能与hash table的大小直接相关。如果hash table数据量能够被work_mem缓存，那性能是最优的。因此，如果能够利用分区的特点按分区进行hash join，那对于性能是非常大的提升。KingbaseES 有两个参数 enable_partitionwise_join ， enable_partitionwise_aggregate 分别用于控制分区智能连接和聚集，默认这两个参数是 off 的。开启这两个参数，对于采用hash join 或 hash aggregate 的执行方式有性能提升。

以下我们举例来看该参数对于执行计划的影响：

一、分区哈希连接

1、准备数据

create table t1(id1 integer,name1 text) partition by hash(id1) partitions 4;
insert into t1 select generate_series(1,10000000),'abc'||generate_series(1,10000000);

create table t2(id2 integer,name1 text) partition by hash(id2) partitions 4;
insert into t2 select generate_series(1,3000000),'abc'||generate_series(1,3000000);

2、查看执行计划

不使用智能连接：把所有分区的数据取回，以整个表为单位进行hash 及 hash join。因为innertable hash操作是整表，需要的内存更大。下例红色部分 Batches : 4，说明hash 的部分结果（3个batch）被临时存放在。

test=# show enable_partitionwise_join ;
 enable_partitionwise_join
---------------------------
 off
(1 行记录)

test=# explain analyze select count(*) from t1,t2 where id1=id2;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5625429721.00..5625429721.01 rows=1 width=8) (actual time=8090.917..8090.922 rows=1 loops=1)
   ->  Hash Join  (cost=110526.00..5250429721.00 rows=150000000000 width=0) (actual time=1032.591..7943.064 rows=3000000 loops=1)
         Hash Cond: (t1_p0.id1 = t2_p0.id2)
         ->  Append  (cost=0.00..204350.00 rows=10000000 width=4) (actual time=0.022..1666.767 rows=10000000 loops=1)
               ->  Seq Scan on t1_p0  (cost=0.00..38575.41 rows=2499241 width=4) (actual time=0.022..284.595 rows=2499241 loops=1)
               ->  Seq Scan on t1_p1  (cost=0.00..38593.88 rows=2500388 width=4) (actual time=0.010..282.833 rows=2500388 loops=1)
               ->  Seq Scan on t1_p2  (cost=0.00..38568.65 rows=2498765 width=4) (actual time=0.013..282.186 rows=2498765 loops=1)
               ->  Seq Scan on t1_p3  (cost=0.00..38612.06 rows=2501606 width=4) (actual time=0.011..275.778 rows=2501606 loops=1)
         ->  Hash  (cost=61307.00..61307.00 rows=3000000 width=4) (actual time=1029.502..1029.504 rows=3000000 loops=1)
               Buckets: 2097152  Batches: 4  Memory Usage: 42727kB
               ->  Append  (cost=0.00..61307.00 rows=3000000 width=4) (actual time=0.019..452.904 rows=3000000 loops=1)
                     ->  Seq Scan on t2_p0  (cost=0.00..11560.96 rows=748996 width=4) (actual time=0.018..72.361 rows=748996 loops=1)
                     ->  Seq Scan on t2_p1  (cost=0.00..11591.35 rows=750935 width=4) (actual time=0.014..72.146 rows=750935 loops=1)
                     ->  Seq Scan on t2_p2  (cost=0.00..11560.59 rows=748959 width=4) (actual time=0.017..74.721 rows=748959 loops=1)
                     ->  Seq Scan on t2_p3  (cost=0.00..11594.10 rows=751110 width=4) (actual time=0.010..73.672 rows=751110 loops=1)
 Planning Time: 0.107 ms
 Execution Time: 8091.972 ms
(17 行记录)

使用智能连接：因为两张表的分区模式是一样的（hash 分区，分区列的数据类型与分区数量相同），能够保证相同的数据hash到对应的分区。因此，只要对应的分区进行连接就可以。由于单个分区的数据量更少，更有可能使用内存排序。本例红色部分batches : 1，也就是所有的hash 的结果都在内存。

test=# set enable_partitionwise_join=on;
SET
test=# explain analyze select count(*) from t1,t2 where id1=id2;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=328157.00..328157.01 rows=1 width=8) (actual time=5805.663..5805.669 rows=1 loops=1)
   ->  Append  (cost=20923.41..320657.00 rows=3000000 width=0) (actual time=269.617..5665.055 rows=3000000 loops=1)
         ->  Hash Join  (cost=20923.41..76360.93 rows=748996 width=0) (actual time=269.616..1409.434 rows=748996 loops=1)
               Hash Cond: (t1_p0.id1 = t2_p0.id2)
               ->  Seq Scan on t1_p0  (cost=0.00..38575.41 rows=2499241 width=4) (actual time=0.045..294.969 rows=2499241 loops=1)
               ->  Hash  (cost=11560.96..11560.96 rows=748996 width=4) (actual time=265.047..265.048 rows=748996 loops=1)
                     Buckets: 1048576  Batches: 1  Memory Usage: 34524kB
                     ->  Seq Scan on t2_p0  (cost=0.00..11560.96 rows=748996 width=4) (actual time=0.097..81.051 rows=748996 loops=1)
         ->  Hash Join  (cost=20978.04..76457.72 rows=750935 width=0) (actual time=237.966..1375.976 rows=750935 loops=1)
               Hash Cond: (t1_p1.id1 = t2_p1.id2)
               ->  Seq Scan on t1_p1  (cost=0.00..38593.88 rows=2500388 width=4) (actual time=0.030..286.183 rows=2500388 loops=1)
               ->  Hash  (cost=11591.35..11591.35 rows=750935 width=4) (actual time=235.646..235.646 rows=750935 loops=1)
                     Buckets: 1048576  Batches: 1  Memory Usage: 34593kB
                     ->  Seq Scan on t2_p1  (cost=0.00..11591.35 rows=750935 width=4) (actual time=0.025..69.625 rows=750935 loops=1)
         ->  Hash Join  (cost=20922.58..76351.19 rows=748959 width=0) (actual time=245.015..1339.569 rows=748959 loops=1)
               Hash Cond: (t1_p2.id1 = t2_p2.id2)
               ->  Seq Scan on t1_p2  (cost=0.00..38568.65 rows=2498765 width=4) (actual time=0.028..281.130 rows=2498765 loops=1)
               ->  Hash  (cost=11560.59..11560.59 rows=748959 width=4) (actual time=242.801..242.801 rows=748959 loops=1)
                     Buckets: 1048576  Batches: 1  Memory Usage: 34523kB
                     ->  Seq Scan on t2_p2  (cost=0.00..11560.59 rows=748959 width=4) (actual time=0.024..70.538 rows=748959 loops=1)
         ->  Hash Join  (cost=20982.97..76487.16 rows=751110 width=0) (actual time=255.132..1365.909 rows=751110 loops=1)
               Hash Cond: (t1_p3.id1 = t2_p3.id2)
               ->  Seq Scan on t1_p3  (cost=0.00..38612.06 rows=2501606 width=4) (actual time=0.042..288.140 rows=2501606 loops=1)
               ->  Hash  (cost=11594.10..11594.10 rows=751110 width=4) (actual time=252.233..252.233 rows=751110 loops=1)
                     Buckets: 1048576  Batches: 1  Memory Usage: 34599kB
                     ->  Seq Scan on t2_p3  (cost=0.00..11594.10 rows=751110 width=4) (actual time=0.060..71.905 rows=751110 loops=1)
 Planning Time: 0.148 ms
 Execution Time: 5812.038 ms
(28 行记录)

3、分区哈希连接前提

关联表的hash分区模式必须完全相同：列类型（完全相同，integer 与 bigint 实测无法使用哈希智能连接）、长度必须相同、hash 分区数量必须相同。
对于hash 分区表，分区数量越多，每个分区数据量越少，在连接时，work_mem 能容纳单个分区数据的概率越高，越可能采用内存hash。

二、智能哈希聚集

1、构建测试数据

create table t1(id1 integer,name1 text) partition by hash(id1) partitions 4;
insert into t1 select generate_series(1,100000),'abc'||generate_series(1,100000);
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;

2、非智能聚集的场景

test=# set enable_partitionwise_aggregate=off;
SET
test=# explain analyze select id1 , count(*) from t1 group by id1;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=325567.00..326568.30 rows=100130 width=12) (actual time=6533.552..6567.385 rows=100000 loops=1)
   Group Key: t1_p0.id1
   ->  Append  (cost=0.00..261567.00 rows=12800000 width=4) (actual time=0.358..2707.037 rows=12800000 loops=1)
         ->  Seq Scan on t1_p0  (cost=0.00..49640.28 rows=3216128 width=4) (actual time=0.357..422.919 rows=3216128 loops=1)
         ->  Seq Scan on t1_p1  (cost=0.00..49347.84 rows=3197184 width=4) (actual time=0.037..457.075 rows=3197184 loops=1)
         ->  Seq Scan on t1_p2  (cost=0.00..49334.88 rows=3196288 width=4) (actual time=0.054..430.594 rows=3196288 loops=1)
         ->  Seq Scan on t1_p3  (cost=0.00..49244.00 rows=3190400 width=4) (actual time=0.056..510.354 rows=3190400 loops=1)
 Planning Time: 0.111 ms
 Execution Time: 6571.492 ms
(9 行记录)

3、智能聚集的场景

test=# set enable_partitionwise_aggregate=on;
SET
test=# explain analyze select id1 , count(*) from t1 group by id1;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Append  (cost=65720.92..263069.22 rows=100148 width=12) (actual time=892.267..3502.275 rows=100000 loops=1)
   ->  HashAggregate  (cost=65720.92..65971.54 rows=25062 width=12) (actual time=892.266..898.120 rows=25126 loops=1)
         Group Key: t1_p0.id1
         ->  Seq Scan on t1_p0  (cost=0.00..49640.28 rows=3216128 width=4) (actual time=0.086..343.870 rows=3216128 loops=1)
   ->  HashAggregate  (cost=65333.76..65585.86 rows=25210 width=12) (actual time=828.774..833.766 rows=24978 loops=1)
         Group Key: t1_p1.id1
         ->  Seq Scan on t1_p1  (cost=0.00..49347.84 rows=3197184 width=4) (actual time=0.041..325.084 rows=3197184 loops=1)
   ->  HashAggregate  (cost=65316.32..65566.86 rows=25054 width=12) (actual time=835.283..842.753 rows=24971 loops=1)
         Group Key: t1_p2.id1
         ->  Seq Scan on t1_p2  (cost=0.00..49334.88 rows=3196288 width=4) (actual time=0.089..317.460 rows=3196288 loops=1)
   ->  HashAggregate  (cost=65196.00..65444.22 rows=24822 width=12) (actual time=918.379..922.312 rows=24925 loops=1)
         Group Key: t1_p3.id1
         ->  Seq Scan on t1_p3  (cost=0.00..49244.00 rows=3190400 width=4) (actual time=0.069..348.231 rows=3190400 loops=1)
 Planning Time: 0.081 ms
 Execution Time: 3505.180 ms
(15 行记录)