
译者 林锦森 · 沃趣科技数据库技术专家
出品 沃趣科技

我提出了“子操作按先后顺序执行,递归向下”的解释执行计划的基本策略,但是这并不是一个完美的规则,即使遵循基本的“子操作按先后顺序执行”,也很容易对执行过程中发生的事情产生错误判断。
子查询更新
update t1 set
n1 = (
select max(mod100)
from t2
where t2.id = t1.id
),
n2 = (
select max(trunc100)
from t3
where t3.id = t1.id
)
where
id between 101 and 200
;
这个语句有三个直观的步骤。第一我们需要找到更新的行,对每一行,我们执行t2的子查询,之后是t3的子查询。所以执行计划如下所示:
---------------------------------------------------------------------------------------
| Id | Operation |Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 101 | 1212 | 610 (34)| 00:00:04 |
| 1 | UPDATE |T1 | | | | |
|* 2 | INDEX RANGE SCAN | T1_I1 | 101 | 1212 | 2 (0)|00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 7 | | |
| 4 | FIRST ROW | | 1 | 7 | 2 (0)| 00:00:01|
|* 5 | INDEX RANGE SCAN (MIN/MAX)|T2_I1 | 1 | 7 | 2 (0)| 00:00:01 |
| 6 | SORT AGGREGATE | | 1 | 7 | | |
| 7 | FIRST ROW | | 1 | 7 | 2 (0)| 00:00:01|
|* 8 | INDEX RANGE SCAN (MIN/MAX)|T3_I1 | 1 | 7 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID">=101 AND"ID"<=200)
5 - access("T2"."ID"=:B1)
8 - access("T3"."ID"=:B1)
就像你看到的,这个计划遵循了“子操作按先后顺序执行”。第1行的更新操作有三个子操作:第2,3和6行。第一个子操作INDEX RANGE SCAN(索引范围扫描)帮助我们找到需要更新的行的rowid,第二个子操作生成的子计划(3-5行)表示子查询更新列n1,第三个子操作生成的子计划(6-8行)表示子查询更新列n2。
标量子查询
select
n1, n2,
(
select max(mod100)
from t2
where t2.id = t1.id
) new_n1,
(
select max(trunc100)
from t3
where t3.id = t1.id
) new_n2
from
t1
where
t1.id between 101 and 200
;
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 101 | 1212 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | FIRST ROW | | 1 | 7 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN (MIN/MAX)| T2_I1 | 1 | 7 | 2 (0)| 00:00:01 |
| 4 | SORT AGGREGATE | | 1 | 7 | | |
| 5 | FIRST ROW | | 1 | 7 | 2 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN (MIN/MAX)| T3_I1 | 1 | 7 | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | T1 | 101 | 1212 | 4 (0)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | T1_I1 | 101 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T2"."ID"=:B1)
6 - access("T3"."ID"=:B1)
8 - access("T1"."ID">=101 AND "T1"."ID"<=200)
这个计划中可以看出,第0行的select语句有3个子查询(1,4,7行),但是,当我们直观感觉肯定会发生的跟实际发生的子操作的顺序做比较,我们会看到,最后一个子操作代表了驱动查询语句的起点。当你语句的select列表中有标量子查询时,最后一个子操作会被当作第一个子操作进行调用,剩下的表示标量子查询的子操作,会按照顺序被调用。
BUG演示
select
n1,
decode(mod(n1,4),
0, (
select max(mod100)
from t2
where t2.id = t1.id
),
(
select max(trunc100)
from t3
where t3.id = t1.id
)
)
from
t1
where
t1.id between 101 and 200
;
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 101 | 808 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | FIRST ROW | | 1 | 7 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN (MIN/MAX) | T2_I1 | 1 | 7 | 2 (0)| 00:00:01 |
| 4 | SORT AGGREGATE | | 1 | 7 | | |
| 5 | FIRST ROW | | 1 | 7 | 2 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN (MIN/MAX)| T3_I1 | 1 | 7 | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | T1 | 101 | 808 | 4 (0)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | T1_I1 | 101 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T2"."ID"=:B1)
6 - access("T3"."ID"=:B1)
8 - access("T1"."ID">=101 AND "T1"."ID"<=200)
这个查询语句中,如果n1是4的倍数则返回t2中的值,否则返回t3的值。我们有理由相信这两个子查询在某些方面是“相同的”,但是当我们查看执行计划时,情况就不一样了。
Rows (1st) Row Source Operation
---------- ---------------------------------------------------
25 SORT AGGREGATE (cr=11 pr=0 pw=0 time=126 us)
25 FIRST ROW (cr=11 pr=0 pw=0 time=83 us cost=2 size=7 card=1)
25 INDEX RANGE SCAN (MIN/MAX) T2_I1 (cr=11 pr=0 pw=0 time=74 us cost=2 size=7 card=1)
75 SORT AGGREGATE (cr=11 pr=0 pw=0 time=241 us)
75 FIRST ROW (cr=11 pr=0 pw=0 time=166 us cost=2 size=7 card=1)
75 INDEX RANGE SCAN (MIN/MAX) T3_I1 (cr=11 pr=0 pw=0 time=140 us cost=2 size=7 card=1)
100 TABLE ACCESS BY INDEX ROWID T1 (cr=13 pr=0 pw=0 time=82 us cost=4 size=808 card=101)
100 INDEX RANGE SCAN T1_I1 (cr=6 pr=0 pw=0 time=654 us cost=2 size=0 card=101)
从这个输出中可以看到,这两个子查询在select语句中是相等的,就像前面的select语句中一样。trace文件中不包含depth信息;它的STAT行只包含id跟parent_id,所以tkprod必须获取深度并给我们提供了正确的执行计划。
子查询分解
with sq2 as (
select /*+ materialize */
t2.id, max(t2.mod100) new_n1
from t2
where t2.id between 101 and 200
group by t2.id
),
sq3 as (
select /*+ materialize */
t3.id, max(t3.trunc100) new_n2
from t3
where t3.id between 101 and 200
group by t3.id
)
select
t1.n1, t1.n2,
sq2.new_n1,
sq3.new_n2
from
t1, sq2, sq3
where
t1.id between 101 and 200
and sq2.id(+) = t1.id
and sq3.id(+) = t1.id
;
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 101 | 6464 | 12 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP | | | | |
| 3 | SORT GROUP BY NOSORT | | 101 | 707 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | T2_I1 | 101 | 707 | 2 (0)| 00:00:01 |
| 5 | LOAD AS SELECT | SYS_TEMP | | | | |
| 6 | SORT GROUP BY NOSORT | | 101 | 707 | 2 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | T3_I1 | 101 | 707 | 2 (0)| 00:00:01 |
|* 8 | HASH JOIN OUTER | | 101 | 6464 | 8 (0)| 00:00:01 |
|* 9 | HASH JOIN OUTER | | 101 | 3838 | 6 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| T1 | 101 | 1212 | 4 (0)| 00:00:01 |
|*11 | INDEX RANGE SCAN | T1_I1 | 101 | | 2 (0)| 00:00:01 |
|*12 | VIEW | | 101 | 2626 | 2 (0)| 00:00:01 |
| 13 | TABLE ACCESS FULL | SYS_TEMP | 101 | 707 | 2 (0)| 00:00:01 |
|*14 | VIEW | | 101 | 2626 | 2 (0)| 00:00:01 |
| 15 | TABLE ACCESS FULL | SYS_TEMP | 101 | 707 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T2"."ID">=101 AND "T2"."ID"<=200)
7 - access("T3"."ID">=101 AND "T3"."ID"<=200)
8 - access("SQ3"."ID"(+)="T1"."ID")
9 - access("SQ2"."ID"(+)="T1"."ID")
11 - access("T1"."ID">=101 AND "T1"."ID"<=200)
12 - filter("SQ2"."ID"(+)>=101 AND "SQ2"."ID"(+)<=200)
14 - filter("SQ3"."ID"(+)>=101 AND "SQ3"."ID"(+)<=200)
我在子查询分解中使用hint/*+ materialize*/强制Oracle建立了两张内部全局临时表,表中包含t2和t3中我们所需要的结果,之后剩下的代码就是将t1和两个结果集做外连接。实际上我可以去掉hint,Oracle会将“分解的子查询”进行内联复制,从而产生一组类似与两个聚合结果集进行outer hash join的结果集,并保存在会话内存的工作区中。我使用materialize选项只是为了显示带有物化子查询的计划。如果我们将执行计划简化下,那么我们会看到如下:
-----------------------------------------------------------------------------------------
| Id | Operation |Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 101 | 6464| 12 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT |SYS_TEMP | | | | |
| 5 | LOAD AS SELECT |SYS_TEMP | | | | |
|* 8 | HASH JOIN OUTER | | 101 | 6464| 8 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
我想说这并不是我想看到的顺序,我更想看到第8行(hash join)作为第0行(select statement)的一个子操作。我们可以看到计划中三个主要的阶段都是temptable transormation的子操作,我们先创建了两张临时表,之后做hash join。如果我们展开第2行,我们看到的是聚合表t2;如果我们展开第5行,我们看到的是聚合表t3;如果我们展开第8行,我们看到的是t1和两个临时表之间的hashjoin。
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
|* 8 | HASH JOIN OUTER | | 101 | 6464 | 8 (0)| 00:00:01 |
|* 9 | HASH JOIN OUTER | | 101 | 3838 | 6 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| T1 | 101 | 1212 | 4 (0)| 00:00:01 |
|*12 | VIEW | | 101 | 2626 | 2 (0)| 00:00:01 |
|*14 | VIEW | | 101 | 2626 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
除了计划中执行查询被向右移动了一点外,在我们真正执行查询前可能会生成很多数据,就像我们计划中在select列表中的标量子查询一样,在子查询中仍然遵循基本规则,可以简单地通过折叠多余的文本进行分析,直到可以看到计划中每个操作的第一个操作为止。
结 论
| 译者简介
林锦森·沃趣科技数据库技术专家
沃趣科技数据库工程师,多年从事Oracle数据库,较丰富的故障处理、性能调优、数据迁移及备份恢复经验。
相关链接
MySQL行级别并行复制能并行应用多少个binlog group?
MySQL高可用工具Orchestrator系列二:复制拓扑的发现
MySQL高可用工具Orchestrator系列一:单节点模式安装
Oracle RAC Cache Fusion系列十八:Oracle RAC Statisticsand Wait Events

更多干货,欢迎来撩~




