熟悉 Oracle 的人都知道,对于两表的关联更新,其执行计划主要是类似 subplan 方式。KingbaseES 和 Postgresql 对于update 支持两种方式:subplan or join,对应语法上采用两种不同的写法。
下面以例子的形式展示两种写法及性能上的差异。这些例子同时通过KingbaseES V8R6和 Postgresql 12.3 环境验证。
一、准备测试数据
create table t1(id1 integer,name1 varchar(200));
create table t2(id2 integer,name2 varchar(200));
insert into t1 select * from (select generate_series(1,1000000),repeat('a',50)) as a order by random();
insert into t2 select * from (select generate_series(1,1000000),repeat('b',50)) as a order by random();
create index ind_t1_id1 on t1(id1);
create index ind_t2_id2 on t2(id2);
analyze t1;
analyze t2;二、性能测试
1、语法一
逐条处理 t1 表的每条记录,对于t1表的每条记录,都需要访问t2表。
test=# explain analyze update t1 set name1=(select name2 from t2 where id1=id2);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Update on t1 (cost=0.00..8462810.00 rows=1000000 width=428) (actual time=13072.720..13072.721 rows=0 loops=1)
-> Seq Scan on t1 (cost=0.00..8462810.00 rows=1000000 width=428) (actual time=0.035..6620.732 rows=1000000 loops=1)
SubPlan 1
-> Index Scan using ind_t2_id2 on t2 (cost=0.42..8.44 rows=1 width=51) (actual time=0.006..0.006 rows=1 loops=1000000)
Index Cond: (id2 = t1.id1)
Planning Time: 0.116 ms
Execution Time: 13072.780 ms
(7 rows)2、语法二
采用hash join,大批量的update 效率更高。
test=# explain analyze update t1 set name1=name2 from t2 where id1=id2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Update on t1 (cost=37693.00..98122.00 rows=1000000 width=67) (actual time=8197.309..8197.312 rows=0 loops=1)
-> Hash Join (cost=37693.00..98122.00 rows=1000000 width=67) (actual time=349.817..1633.896 rows=1000000 loops=1)
Hash Cond: (t2.id2 = t1.id1)
-> Seq Scan on t2 (cost=0.00..20310.00 rows=1000000 width=61) (actual time=0.021..191.730 rows=1000000 loops=1)
-> Hash (cost=20310.00..20310.00 rows=1000000 width=10) (actual time=348.798..348.798 rows=1000000 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 3594kB
-> Seq Scan on t1 (cost=0.00..20310.00 rows=1000000 width=10) (actual time=0.034..153.882 rows=1000000 loops=1)
Planning Time: 0.780 ms
Execution Time: 8197.543 ms3、结论
对于大批量数据update,基于hash join 的update方法效率上要高效很多。
PS:不管哪种写法,KingbaseES 都不会去额外判断 update 前后值是否一致,也就是说,不管update前后值是否相同,update操作都会进行。Oracle 也是同样机制。
三、Oracle Update语句的执行计划
SQL> update t1 set name1=(select name2 from t2 where id1=id2); 1234030 rows updated. Execution Plan ---------------------------------------------------------- Plan hash value: 2429794062 --------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------- | 0 | UPDATE STATEMENT | | 1234K| 45M| 6172K (20)| 00:04:02 | | 1 | UPDATE | T1 | | | | | | 2 | TABLE ACCESS FULL | T1 | 1234K| 45M| 2062 (1)| 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID BATCHED| T2 | 1 | 39 | 4 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | IND_T2_ID2 | 1 | | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------
从执行计划可以看出,SQL 语句采取的是类似 KingbaseES subplan 的连接方式,对于主表的每一行,子表都要访问一次。
PS:Oracle 还有一种内联视图的更新,可以走join 的方式,但要求关联的两张表必须有主键。类似SQL 如下:
update (select t1.object_name t1_object_name ,t2.object_name t2_object_name from t1 ,t2 where t1.object_id=t2.object_id ) set t1_object_name=t2_object_name;
最后修改时间:2024-08-13 14:00:42
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




