SQL优化 -- KingbaseES 两表关联 Update的两种写法与性能差异

原创金仓数据库 2023-07-24

3195

熟悉 Oracle 的人都知道，对于两表的关联更新，其执行计划主要是类似 subplan 方式。KingbaseES 和 Postgresql 对于update 支持两种方式：subplan or join，对应语法上采用两种不同的写法。

下面以例子的形式展示两种写法及性能上的差异。这些例子同时通过KingbaseES V8R6和 Postgresql 12.3 环境验证。

一、准备测试数据

create table t1(id1 integer,name1 varchar(200));
create table t2(id2 integer,name2 varchar(200));

insert into t1 select * from (select generate_series(1,1000000),repeat('a',50)) as a order by random();
insert into t2 select * from (select generate_series(1,1000000),repeat('b',50)) as a order by random();

create index ind_t1_id1 on t1(id1);
create index ind_t2_id2 on t2(id2);

analyze t1;
analyze t2;

二、性能测试

1、语法一

逐条处理 t1 表的每条记录，对于t1表的每条记录，都需要访问t2表。

test=# explain analyze update t1 set name1=(select name2 from t2 where id1=id2);
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Update on t1  (cost=0.00..8462810.00 rows=1000000 width=428) (actual time=13072.720..13072.721 rows=0 loops=1)
   ->  Seq Scan on t1  (cost=0.00..8462810.00 rows=1000000 width=428) (actual time=0.035..6620.732 rows=1000000 loops=1)
         SubPlan 1
           ->  Index Scan using ind_t2_id2 on t2  (cost=0.42..8.44 rows=1 width=51) (actual time=0.006..0.006 rows=1 loops=1000000)
                 Index Cond: (id2 = t1.id1)
 Planning Time: 0.116 ms
 Execution Time: 13072.780 ms
(7 rows)

2、语法二

采用hash join，大批量的update 效率更高。

test=# explain analyze update t1 set name1=name2 from t2 where id1=id2;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Update on t1  (cost=37693.00..98122.00 rows=1000000 width=67) (actual time=8197.309..8197.312 rows=0 loops=1)
   ->  Hash Join  (cost=37693.00..98122.00 rows=1000000 width=67) (actual time=349.817..1633.896 rows=1000000 loops=1)
         Hash Cond: (t2.id2 = t1.id1)
         ->  Seq Scan on t2  (cost=0.00..20310.00 rows=1000000 width=61) (actual time=0.021..191.730 rows=1000000 loops=1)
         ->  Hash  (cost=20310.00..20310.00 rows=1000000 width=10) (actual time=348.798..348.798 rows=1000000 loops=1)
               Buckets: 131072  Batches: 16  Memory Usage: 3594kB
               ->  Seq Scan on t1  (cost=0.00..20310.00 rows=1000000 width=10) (actual time=0.034..153.882 rows=1000000 loops=1)
 Planning Time: 0.780 ms
 Execution Time: 8197.543 ms

3、结论

对于大批量数据update，基于hash join 的update方法效率上要高效很多。

PS：不管哪种写法，KingbaseES 都不会去额外判断 update 前后值是否一致，也就是说，不管update前后值是否相同，update操作都会进行。Oracle 也是同样机制。

三、Oracle Update语句的执行计划

SQL> update t1 set name1=(select name2 from t2 where id1=id2);
1234030 rows updated.

Execution Plan
----------------------------------------------------------
Plan hash value: 2429794062
---------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                     |            |  1234K|    45M|  6172K (20)| 00:04:02 |
|   1 |  UPDATE                              | T1         |       |       |            |          |
|   2 |   TABLE ACCESS FULL                  | T1         |  1234K|    45M|  2062   (1)| 00:00:01 |
|   3 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2         |     1 |    39 |     4   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN                  | IND_T2_ID2 |     1 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

从执行计划可以看出，SQL 语句采取的是类似 KingbaseES subplan 的连接方式，对于主表的每一行，子表都要访问一次。

PS：Oracle 还有一种内联视图的更新，可以走join 的方式，但要求关联的两张表必须有主键。类似SQL 如下：

update (select t1.object_name t1_object_name ,t2.object_name t2_object_name from t1 ,t2 where t1.object_id=t2.object_id ) set t1_object_name=t2_object_name;

人大金仓