原文作者:Jonathan lewis
原文地址:https://jonathanlewis.wordpress.com/2020/04/29/execution-plans-3/
译文如下:
几天前,我们讨论了一个执行计划,该计划显示出一个反联接(它是对“not exists”子查询和一个半连接的转换后的结果)其处理子查询的方式的一些变化,这里甚至引起一些欺骗。乍一看就好像它是转换“exists”子查询的结果。
为了更清晰的剖析问题,我将查询通过一组表进行模拟,以便允许我重新生成执行计划,也为了我可以找到“最终的代码块” (outline_leafs)。为了达到我们的目标,这里我发布了创建这些表的脚本,以及生成的三个执行计划。
rem
rem Script: anti_semi.sql
rem Author: Jonathan Lewis
rem Dated: Apr 2020
rem
rem Last tested
rem 19.3.0.0
rem 12.1.0.2
rem
create table ip_spells
partition by range (admission_dttm) interval (numtoyminterval(1,'MONTH')) (
partition p_start values less than (to_date('01-Jan-2020'))
)
as
with g as (
select rownum id from dual
connect by level <= 1e4 -- > avoid wordpress format issue
)
select
rownum spell_id,
to_char(mod(rownum,75),'FM00') admission_method_code,
to_date('01-jan_2020') + rownum / 1000 admission_dttm,
to_char(mod(rownum,57),'FM00') administrative_category_code,
lpad(rownum,10) v1,
rpad('x',100) padding
from
g,g
where
rownum <= 365000
;
alter table ip_spells add constraint sp_pk primary key(spell_id);
create table ip_episodes
as
select
spell_id,
spell_id episode_id,
1+mod(rownum,6) episode_order,
to_char(mod(rownum,125)+ 500,'FM999') main_specialty_code,
v1,
padding
from
ip_spells
order by
dbms_random.value
;
alter table ip_episodes add constraint ep_pk primary key(episode_id);
alter table ip_episodes add constraint ep_fk_sp foreign key (spell_id) references ip_spells;
create index idx02_ip_episodes on ip_episodes(spell_id);
create table ip_diagnosis
as
select
episode_id,
chr(mod(rownum,25) + 65) ||
to_char(dbms_random.value(30,512),'FM999') diagnosis_code,
mod(rownum,5) diagnosis_sequence,
lpad(rownum,10) v1,
rpad('x',100) padding
from
(select rownum id from dual connect by level <= 5),
ip_episodes
;
alter table ip_diagnosis add constraint di_fk_ep foreign key(episode_id) references ip_episodes;
create index idx01_ip_diagnosis on ip_diagnosis(episode_id);
最初的ip_spells表是范围分区的,基于日期的 date列作为分区列。我还决定从一年开始以每天1000行的速度生成一年的数据,间隔分区表按月进行分区,以获得少量的分区。
最初的计划是创建ip_episodes表是一个与ip_spells表数据结构类似的表,所以我使用原始表ip_spells的副本生成ip_episodes表,然后给它适当的主外键约束和索引。
最后,ip_diagnosis表看起来像ip_episodes表中的数据平均对应5行数据,因此我将ip_episodes连接到常用的“ connect by”生成的5行集合中,从而生成了它。
我只有很少的数据集,并且大部分索引都是按表中顺序,基于很好的聚簇因子生成的。所以如果我的数据和统计信息没有直接产生原始查询的执行计划,我也不会感到惊讶。
这是原始查询(仅作为提醒),也是我没有HINT的第一个计划(运行12.1.0.2):
select
*
from
ip_spells a
where
not exists (
select
1
from
ip_episodes e
inner join
ip_diagnosis d
on
d.episode_id = e.episode_id
where
a.spell_id = e.spell_id
and (
substr(d.diagnosis_code,1,1) = ‘C’
or substr(d.diagnosis_code,1,3) between ‘D37’ and ‘D48’
or substr(d.diagnosis_code,1,1) = ‘V’
or d.diagnosis_code = ‘Z511’
)
)
and exists (
select
1
from
ip_episodes e
left join
ip_diagnosis d
on
d.episode_id = e.episode_id
where
a.spell_id = e.spell_id
and (
e.episode_order = ‘1’
and e.main_specialty_code not in (‘501’,‘560’,‘610’)
and d.diagnosis_sequence = 1
and substr(d.diagnosis_code,1,1) <> ‘O’
)
)
and substr(a.admission_method_code,1,1) = ‘2’ – 1% selectivity on substr()
and a.admission_dttm >= to_date(‘01-jan-2011’, ‘dd-mon-yyyy’)
and a.administrative_category_code = ‘01’ – 1 / 57 by definition;
Plan hash value: 1492475845
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 131 | 1299 (5)| 00:00:06 | | |
|* 1 | FILTER | | | | | | | |
| 2 | PARTITION RANGE ALL | | 64 | 8384 | 935 (6)| 00:00:04 | 1 |1048575|
|* 3 | TABLE ACCESS FULL | IP_SPELLS | 64 | 8384 | 935 (6)| 00:00:04 | 1 |1048575|
| 4 | NESTED LOOPS SEMI | | 1 | 20 | 11 (0)| 00:00:01 | | |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES | 1 | 10 | 4 (0)| 00:00:01 | | |
|* 6 | INDEX RANGE SCAN | IDX02_IP_EPISODES | 1 | | 3 (0)| 00:00:01 | | |
|* 7 | TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS | 40931 | 399K| 7 (0)| 00:00:01 | | |
|* 8 | INDEX RANGE SCAN | IDX01_IP_DIAGNOSIS | 5 | | 2 (0)| 00:00:01 | | |
| 9 | NESTED LOOPS SEMI | | 1 | 30 | 11 (0)| 00:00:01 | | |
|* 10 | TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES | 1 | 17 | 4 (0)| 00:00:01 | | |
|* 11 | INDEX RANGE SCAN | IDX02_IP_EPISODES | 1 | | 3 (0)| 00:00:01 | | |
|* 12 | TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS | 18250 | 231K| 7 (0)| 00:00:01 | | |
|* 13 | INDEX RANGE SCAN | IDX01_IP_DIAGNOSIS | 5 | | 2 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------------
如您所见,两个子查询都作为FILTER 子查询运行。如果要包含查询块信息,您会看到操作4到8是(已转换的)“not exists”子查询,操作9到13是已转换的“exists”子查询,而操作1到3代表了主查询。
我需要看到未嵌套的“not exists”子查询并转换为HASH反连接,因此我的第一次HINT尝试是向该子查询添加/+unnest/提示,生成以下计划:
Plan hash value: 147447036
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 133 | 1652 (4)| 00:00:07 | | |
|* 1 | FILTER | | | | | | | |
| 2 | NESTED LOOPS ANTI | | 1 | 133 | 1641 (4)| 00:00:07 | | |
| 3 | PARTITION RANGE ALL | | 64 | 8384 | 935 (6)| 00:00:04 | 1 |1048575|
|* 4 | TABLE ACCESS FULL | IP_SPELLS | 64 | 8384 | 935 (6)| 00:00:04 | 1 |1048575|
| 5 | VIEW PUSHED PREDICATE | VW_SQ_1 | 1 | 2 | 11 (0)| 00:00:01 | | |
| 6 | NESTED LOOPS SEMI | | 1 | 20 | 11 (0)| 00:00:01 | | |
| 7 | TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES | 1 | 10 | 4 (0)| 00:00:01 | | |
|* 8 | INDEX RANGE SCAN | IDX02_IP_EPISODES | 1 | | 3 (0)| 00:00:01 | | |
|* 9 | TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS | 40931 | 399K| 7 (0)| 00:00:01 | | |
|* 10 | INDEX RANGE SCAN | IDX01_IP_DIAGNOSIS | 5 | | 2 (0)| 00:00:01 | | |
| 11 | NESTED LOOPS SEMI | | 1 | 30 | 11 (0)| 00:00:01 | | |
|* 12 | TABLE ACCESS BY INDEX ROWID BATCHED | IP_EPISODES | 1 | 17 | 4 (0)| 00:00:01 | | |
|* 13 | INDEX RANGE SCAN | IDX02_IP_EPISODES | 1 | | 3 (0)| 00:00:01 | | |
|* 14 | TABLE ACCESS BY INDEX ROWID BATCHED | IP_DIAGNOSIS | 18250 | 231K| 7 (0)| 00:00:01 | | |
|* 15 | INDEX RANGE SCAN | IDX01_IP_DIAGNOSIS | 5 | | 2 (0)| 00:00:01 | | |
------------------------------------------------------------------------------------------------------------------------------
您可以在操作5中看到名称为vw_sq_1的未嵌套子查询;但是我们得到的是步骤2中的nested loop anti,所以我们也能看到在步骤5中的”pushed predicate”部分。而实际上我们想要的是一个hash join anti。
如果我在本例中发出查询块信息,您仍然会看到3个“final”查询块。操作11到15是“exists”子查询;操作5到10是优化器展开对“not exists”子查询产生的不可合并视图的查询块;操作1到4表示整个父查询块。
这个例子突出了一个细节,这个细节在前面关于查询及其计划的讨论中很容易遗漏。计划中的某些操作看起来好像可以与两个查询块名称相关联—它们是其最上面一行的查询块,以及使用它们的查询块。
以该计划为例,操作5显然是操作5到10的查询块的起点,而操作11显然是操作11到15的查询块的起点。我们折叠查询块以便了解查询操作的总体结构,我们得到以下计划-在主查询块中,我们将“ Rowsource 2”和“ Rowsource 3”视为简单数据源,并且很高兴看到操作5和11具有与操作1到4相同的查询块名称。
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 133 | 1652 (4)| 00:00:07 | | |
|* 1 | FILTER | | | | | | | |
| 2 | NESTED LOOPS ANTI | | 1 | 133 | 1641 (4)| 00:00:07 | | |
| 3 | PARTITION RANGE ALL | | 64 | 8384 | 935 (6)| 00:00:04 | 1 |1048575|
|* 4 | TABLE ACCESS FULL | IP_SPELLS | 64 | 8384 | 935 (6)| 00:00:04 | 1 |1048575|
| 5 | Rowsource 2 : VIEW PUSHED PREDICATE | VW_SQ_1 | 1 | 2 | 11 (0)| 00:00:01 | | |
| 11 | Rowsource 3 : NESTED LOOPS SEMI | | 1 | 30 | 11 (0)| 00:00:01 | | |
---------------------------------------------------------------------------------------------------------------------------
为了达到我们想要的效果,我们还有一点工作要做:/+unnest/HINT已经让我们实现了我们想要的计划——所以(从push predicate选项得到HINT)为了从嵌套循环反连接到HASH反连接,我决定尝试将“not exists”子查询中的提示更改为/+unnest no_push_pred/–并且这是一个计划,其结果是:
Plan hash value: 2721384176
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 144 | 6633 (11)| 00:00:26 | | |
|* 1 | FILTER | | | | | | | |
|* 2 | HASH JOIN ANTI | | 1 | 144 | 6622 (11)| 00:00:26 | | |
| 3 | PARTITION RANGE ALL | | 64 | 8384 | 935 (6)| 00:00:04 | 1 |1048575|
|* 4 | TABLE ACCESS FULL | IP_SPELLS | 64 | 8384 | 935 (6)| 00:00:04 | 1 |1048575|
| 5 | VIEW | VW_SQ_1 | 40931 | 519K| 5685 (12)| 00:00:23 | | |
|* 6 | HASH JOIN | | 40931 | 799K| 5685 (12)| 00:00:23 | | |
|* 7 | TABLE ACCESS FULL | IP_DIAGNOSIS | 40931 | 399K| 4761 (13)| 00:00:19 | | |
| 8 | TABLE ACCESS FULL | IP_EPISODES | 365K| 3564K| 906 (5)| 00:00:04 | | |
| 9 | NESTED LOOPS SEMI | | 1 | 30 | 11 (0)| 00:00:01 | | |
|* 10 | TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES | 1 | 17 | 4 (0)| 00:00:01 | | |
|* 11 | INDEX RANGE SCAN | IDX02_IP_EPISODES | 1 | | 3 (0)| 00:00:01 | | |
|* 12 | TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS | 18250 | 231K| 7 (0)| 00:00:01 | | |
|* 13 | INDEX RANGE SCAN | IDX01_IP_DIAGNOSIS | 5 | | 2 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------------
看起来它是正确的,它有正确的结构,而且(非常重要的是——虽然不是100%的保证)它有与最初发布的计划相同的计划哈希值。因此,在这一点上,我觉得它展示的最终查询块名称将反映出原始计划中使用的那些块名。
当然,在生产系统中,您不能只是在查询中添加一些HINT,目标是希望这些HINT足以稳定计划。以下是当我将两个HINT添加到查询时Outline Data中出现的完整HINT信息:
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
PARTIAL_JOIN(@"SEL$2B0A33EA" "D"@"SEL$2")
USE_NL(@"SEL$2B0A33EA" "D"@"SEL$2")
LEADING(@"SEL$2B0A33EA" "E"@"SEL$3" "D"@"SEL$2")
BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$2B0A33EA" "D"@"SEL$2")
INDEX_RS_ASC(@"SEL$2B0A33EA" "D"@"SEL$2" ("IP_DIAGNOSIS"."EPISODE_ID"))
BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$2B0A33EA" "E"@"SEL$3")
INDEX_RS_ASC(@"SEL$2B0A33EA" "E"@"SEL$3" ("IP_EPISODES"."SPELL_ID"))
USE_HASH(@"SEL$8D33959D" "E"@"SEL$1")
LEADING(@"SEL$8D33959D" "D"@"SEL$1" "E"@"SEL$1")
FULL(@"SEL$8D33959D" "E"@"SEL$1")
FULL(@"SEL$8D33959D" "D"@"SEL$1")
PQ_FILTER(@"SEL$2B969259" SERIAL)
USE_HASH(@"SEL$2B969259" "VW_SQ_1"@"SEL$F49409E0")
LEADING(@"SEL$2B969259" "A"@"SEL$4" "VW_SQ_1"@"SEL$F49409E0")
NO_ACCESS(@"SEL$2B969259" "VW_SQ_1"@"SEL$F49409E0")
FULL(@"SEL$2B969259" "A"@"SEL$4")
OUTLINE(@"SEL$2")
OUTLINE(@"SEL$3")
OUTLINE(@"SEL$4")
OUTLINE(@"SEL$1")
OUTLINE(@"SEL$5")
MERGE(@"SEL$2")
OUTLINE(@"SEL$64EAE176")
OUTLINE(@"SEL$6")
OUTLINE(@"SEL$F49409E0")
MERGE(@"SEL$1")
OUTLINE(@"SEL$8C3A16E3")
MERGE(@"SEL$64EAE176")
OUTLINE(@"SEL$4B169FC8")
UNNEST(@"SEL$8C3A16E3")
OUTLINE_LEAF(@"SEL$2B969259")
OUTLINE_LEAF(@"SEL$8D33959D")
OUTER_JOIN_TO_INNER(@"SEL$4B169FC8" "D"@"SEL$2")
OUTLINE_LEAF(@"SEL$2B0A33EA")
ALL_ROWS
DB_VERSION('12.1.0.2')
OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/
当我在查询文本中输入两个HINT并在19.3下重新运行测试时,FILTER 操作消失了,“exists”子查询也展开(变为了视图vw_sq_2),变成了嵌套循环半联接。有了40条完整的HINT后,从12.1.0.2开始的计划又出现了。
对于手动HINT测试,我确实有一个替代策略。具有嵌套循环反联接的计划显示了以下查询块信息:
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$2B969259
4 - SEL$2B969259 / A@SEL$4
5 - SEL$D276C01C / VW_SQ_1@SEL$F49409E0
6 - SEL$D276C01C
7 - SEL$D276C01C / E@SEL$1
8 - SEL$D276C01C / E@SEL$1
9 - SEL$D276C01C / D@SEL$1
10 - SEL$D276C01C / D@SEL$1
11 - SEL$2B0A33EA
12 - SEL$2B0A33EA / E@SEL$3
13 - SEL$2B0A33EA / E@SEL$3
这促使我告诉ORACLE在查询块SEL$2B969259中进行A@SEL4 和 VW_SQ_1@SELF49409E0做一个HASH连接(按该顺序),在查询的开头添加一组简单的HINT,同时在“not exists”子查询中保留/+unnest/提示。
select
/*+
leading (@sel$2b969259 a@sel$4 vw_sq_1@sel$f49409e0)
use_hash(@sel$2b969259 vw_sq_1@sel$f49409e0)
no_swap_join_inputs(@sel$2b969259 vw_sq_1@sel$f49409e0)
*/
*
from
同样,这产生了所需的执行路径。
通过查看查询文本来重现执行计划并不总是那么容易–有时实际计划取决于了解诸如唯一性约束,外键约束和非空约束。但是,如果您可以动手使用SQL创建涉及的对象,通常可以通过HINT来重新创建计划,这将使您发现如何更改HINT以修改执行计划。。




