oracle查询转换中的子查询展开
下面是借用崔华的《基于oracle的SQL优化》一书中对查询转换和子查询展开的描述。
Oracle里的査询转换(Query Transfbrmation),又称为査询改写(Query Rewrite),它是Oracle在解析目标 SQL的过程中的重要一步,其含义是指Oracle在解析目标SQL时可能会对其做等价改写,目的是为了能更高效地执行目标SQL,即Oracle可能会将目标SQL改写成语义上完全等价但执行效率却更高的形式。
子査询展开(Subquery Unnesting)是优化器处理带子査询的目标SQL的一种优化手段,它是指优化器不再将目标SQL中的子査询当作一个独立的处理单元来单独执行,而是将该子査询转换为它自身和外部査询之间等价的表连接。这种等价表连接转换要么是将子査询拆开(即将该子査询中的表、视图从子査询中拿出来, 然后和外部査询中的表、视图做表连接),要么是不拆开但是会把该子査询转换为一个内嵌视图(Inline View), 然后再和外部査询中的表、视图做表连接。Oracle会确保子査询展开所对应的等价表连接转换的正确性,即转换后的SQL和原SQL在语义上一定是等价的。当然,不是所有的子査询都能做子査询展开,有些子査询是不能做这种等价表连接转换的,这种情况 下Oracle就不会对其做子査询展开,也就是说此时Oracle还是会将该子査询当作一个独立的处理单元来单独 执行。另外,在Oracle 10g及其以后的版本中,对于那种不拆开子査询但是会把该子査询转换成一个内嵌视 圏的子査询展开(即子査询展开的第二种情形),只有当经过子査询展开后的等价改写SQL的成本值小于原 SQL的成本值时,Oracle才会对原SQL执行子査询展开。
Oracle数据库里子査询前的where条件如果是如下这些条件之一,那么这种类型的目标SQL在满足了一 定的条件后就可以做子査询展开:
• SINGLE-ROW (即=、<、>、<=、>=和<>)
• EXISTS
• NOT EXISTS
• IN
• NOT IN
• ANY
• ALL
如果一个子査询前的where条件是SINGLE-ROW条件,则意味着该子査询的返回结果至多只会返回一条记录;
如果该子査询前的where条件是除SINGLE-ROW条件之外的上述其他类型的条件,则该子査询的返回 结果就可以包含多条记录。
比如下面这个SQL在子查询部分返回结果超过一行。这时候执行SQL会报错,即单行子查询返回多行。
select *
from emp
where deptno < (select deptno from dept where deptno < 30);
ERROR at line 1: ORA-01427:
single-row subquery returns more than one row
下面看一个子查询展开的实例。
以下测试环境数据库版本为11.2.0.1
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
创建测试表
create table t1 as select * from dba_objects;
create table t2 as select * from dba_objects where object_id<30000;
create table t3 as select * from dba_objects where object_id<20000;
在T1表的object_id列上创建索引
create index ind_t1_oobject_id on t1(object_id);
编写一个如下的子查询SQL:
select *
from t1
where object_id in (select object_id from t2 where data_object_id < 30);
该SQL的子查询部分为(select object_id from t2 where data_object_id<30),而且WHERE条件使用的是IN,
实际上该SQL和下面使用ANY和EXISTS的两个SQL是等价的。
select *
from t1
where object_id = any (select object_id from t2 where data_object_id < 30);
select *
from t1
where exists (select 1
from t2
where data_object_id < 30
and t1.object_id = t2.object_id);
在使用IN的SQL中加入no_unnest(不使用子查询展开)的提示,然后执行SQL查看其执行计划。
select *
from t2
where object_id in (select /*+ no_unnest */
object_id
from t1
where data_object_id < 30);
Elapsed: 00:00:00.10
Execution Plan
----------------------------------------------------------
Plan hash value: 865261136
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 94 | 37310 (1)| 00:07:28 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | T2 | 29514 | 2709K| 118 (1)| 00:00:02 |
|* 3 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 7 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IND_T1_OOBJECT_ID | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T1" "T1" WHERE "OBJECT_ID"=:B1 AND
"DATA_OBJECT_ID"<30))
3 - filter("DATA_OBJECT_ID"<30)
4 - access("OBJECT_ID"=:B1)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
59899 consistent gets
0 physical reads
0 redo size
3376 bytes sent via SQL*Net to client
545 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
43 rows processed
从上面的执行计划可以看到在ID1步骤的过滤是最后执行的,而该步骤的谓词是
filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM “T2” “T2” WHERE
:B1<30 AND “OBJECT_ID”=:B2 AND “DATA_OBJECT_ID”<30))
这正是SQL中的子查询部分(select object_id from t2 where data_object_id<30),
子查询在这里只起到了过滤全表扫描表t1后得到29514条数据的目的。
这说明:不能做子査询展开的子査询通常在目标SQL的执行计划的最后一步才会被执行,并且会走FILTER
类型的执行计划。
全表扫描表T1后得到的结果集的Cardinality为29514,并且object_id列的值是具有唯一性的,
这意味着优化器要以驱动査询条件““OBJECT_ID”=:B1 AND “OBJECT_ID”<30”去执行29514次上述子査询,这显然是不
合理的,也直接导致了上述SQL的执行时间为0.10 s,耗费的逻辑读为59899,成本为37310。
接下来把NO_UNNEST的hint去掉再执行查看执行计划。
select *
from t2
where object_id in (select object_id from t1 where data_object_id < 30);
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
Plan hash value: 3761397674
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 404 | 408 (1)| 00:00:05 |
|* 1 | HASH JOIN RIGHT SEMI| | 4 | 404 | 408 (1)| 00:00:05 |
|* 2 | TABLE ACCESS FULL | T1 | 3 | 21 | 290 (1)| 00:00:04 |
| 3 | TABLE ACCESS FULL | T2 | 29514 | 2709K| 117 (0)| 00:00:02 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("OBJECT_ID"="OBJECT_ID")
2 - filter("DATA_OBJECT_ID"<30)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1457 consistent gets
0 physical reads
0 redo size
3376 bytes sent via SQL*Net to client
545 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
43 rows processed
可以看到在去掉不做子查询展开的HITN之后,执行计划完全和之前不一样了。这里把子查询展开了把T1表从子查询中拿了出来,
并且和外部表T2做了哈希半连接。ORACLE这里已经对SQL做了子查询展开,这样做的好处就是现在只需要对T1、T2表各做一次全表扫描就行了,
不必再根据全表扫描T2表返回的数据去多次执行子查询,导致对T1表的29514次扫描了。
子查询展开的好处也能从上面的执行计划中看出来,执行时间从开始的0.1秒下降到了0.02秒,逻辑读从59899下降到了1457,
成本从37310降低到了408。
当做子查询展开时,Oracle经常会把那些外部where条件为“EXISTS”、“IN”或“=ANY”的子査询
转换为对应的半连接(Semi Join)。为了方便,这里我们用“tl.x semi= t2.y”来表示表T1和表T2做半连接,
且T1是驱动表,T2是被驱动表,半连接条件为tl.x=t2.y。
上述“tl.x semi= t2.y”的含义是:
只要在表T2中找到一条记录满足tl.x=t2.y,则马上停止搜索表T2,并且直接返回表T1中满足条件tl.x=t2.y
的记录。也就是说,即使表T2中满足半连接条件tl.x=t2.y的记录有多条,Oracle也只会返回表T2中第一条
满足tl.x=t2.y所对应的表T1中的记录。
对于上面的子查询SQL:
select *
from t2
where object_id in (select object_id from t1 where data_object_id < 30);
当Oracle对其做子查询展开之后是如下形式:
select *
from t1, t2
where t1.object_id semi = t2.object_id
and t1.data_object_id < 30;
但是针对那些外部where条件为EXISTS. IN或ANY的子査询,Oracle并不一定只将其
转换为对应的半连接,在满足一定的条件时,转换后的表连接类型也可以是内连接。
在上面的子查询SQL中,由于表T2中的OBJECT_ID列上的值不存在重复值(列为主键或唯一约束等情况),
这时SQL在做子查询展开时也可以改写成如下的内连接形式:
select *
from t1, t2
where t1.object_id = t2.object_id
and t1.data_object_id < 30;
而与半连接对应的就是反连接(anti join),下面看一个反连接的SQL:
select *
from t2
where object_id not in
(select object_id from t1 where data_object_id < 30);
如果表T2中的OBJECT_ID列上不存在空值,则和下面使用<>ALL和NOT EXISTS的两个SQL语义是等价的。
select *
from t2
where object_id <> all
(select object_id from t1 where data_object_id < 30);
select *
from t2
where not exists (select object_id
from t1
where data_object_id < 30
and t1.object_id = t2.object_id);
执行上面的子查询SQL查看其执行计划:
Elapsed: 00:00:00.62
Execution Plan
----------------------------------------------------------
Plan hash value: 35395643
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 29465 | 2906K| 408 (1)| 00:00:05 |
|* 1 | HASH JOIN RIGHT ANTI NA| | 29465 | 2906K| 408 (1)| 00:00:05 |
|* 2 | TABLE ACCESS FULL | T1 | 48 | 336 | 290 (1)| 00:00:04 |
| 3 | TABLE ACCESS FULL | T2 | 29514 | 2709K| 117 (0)| 00:00:02 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("OBJECT_ID"="OBJECT_ID")
2 - filter("DATA_OBJECT_ID"<30)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3387 consistent gets
0 physical reads
0 redo size
1494314 bytes sent via SQL*Net to client
22127 bytes received via SQL*Net from client
1966 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
29471 rows processed
从上述显示内容可以看到,显然Oracle这里已经对上述SQL做了子査询展开,即Oracle这里已经把子査
询select object_id from t1 where data_object_id<30拆开了,将子査询中的表 T1 拿了出来,并
且和外部査询中的表T2做了哈希右反连接。
当做子査询展开时,Oracle经常会把那些外部where条件为NOT EXISTS、NOT IN或o ALL的子査询转
换成对应的反连接。为了方便,这里我们用“tl.xanri=t2.yw来表示表T1和表T2做反连接,且T1是驱动表,
T2是被驱动表,反连接条件为tl.x=t2.y。
上述“tl.x如=t2.y”的含义是:
只要表T2中有满足条件tl.x=t2.y的记录存在,则表T1中满足条件tl.x=t2.y的记录就会被丢弃,最后返回的记
录就是表T1中那些不满足条件tl.x=t2.y的记录。
那么对于上面NO IN子查询SQL:
select *
from t2
where object_id not in
(select object_id from t1 where data_object_id < 30);
ORACLE做子查询展开之后改写为了如下形式:
select *
from t1, t2
where t1.object_id anit = t2.object_id
and t1.data_object_id < 30;
接下来看子查询第二种形式展开的案例:
select *
from t2
where object_id in (select t1.object_id
from t1, t3
where t1.object_id = t3.object_id
and t1.data_object_id < 30);
Elapsed: 00:00:00.09
Execution Plan
----------------------------------------------------------
Plan hash value: 2258248648
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 48 | 5136 | 486 (1)| 00:00:06 |
|* 1 | HASH JOIN RIGHT SEMI| | 48 | 5136 | 486 (1)| 00:00:06 |
| 2 | VIEW | VW_NSO_1 | 48 | 624 | 368 (1)| 00:00:05 |
|* 3 | HASH JOIN | | 48 | 960 | 368 (1)| 00:00:05 |
|* 4 | TABLE ACCESS FULL| T1 | 48 | 336 | 290 (1)| 00:00:04 |
| 5 | TABLE ACCESS FULL| T3 | 19614 | 249K| 77 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T2 | 29514 | 2709K| 117 (0)| 00:00:02 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("OBJECT_ID"="OBJECT_ID")
3 - access("T1"."OBJECT_ID"="T3"."OBJECT_ID")
4 - filter("T1"."DATA_OBJECT_ID"<30)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
54 recursive calls
0 db block gets
2378 consistent gets
0 physical reads
0 redo size
3376 bytes sent via SQL*Net to client
546 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
43 rows processed
可以从上面的执行计划中看到,oracle对该SQL确实做了子查询展开,但是首先是把T1表和T3表连接的结果集作为内嵌视图,
然后再和外面的T2表做了哈希右半连接。这里不能把子查询直接拆开是为了保留T1表和T3表的连接顺序,直接展开不能保证展开后的SQL语义上
与原SQL完全等价。这就是子查询展开的第二种形式:不拆开子査询,但是会把它转换为一个内嵌视图,然后再和外部査询中的表、视图做表连接。
子查询能否做子查询展开取决于以下两个条件:
•子査询展开所对应的等价改写SQL和原SQL在语义上一定要是完全等价的。如果改写后的SQL和原
SQL并不一定能保持语义上的完全等价,这种类型的子査询就不能做子査询展开。
•对于不拆开子査询但是会把它转换为一个内嵌视图的子査询展开,只有当经过子査询展开后的等价改
写SQL的成本值小于原SQL的成本值时,Oracle才会对目标SQL执行子査询展开。
在满足一种子查询展开的时候,即使展开后的成本高于不展开的成本,oracle还是会使用子查询展开。
下面看一个实例,这是一个简单的子查询SQL,现执行该SQL查看其执行计划;
select *
from emp
where deptno in (select deptno from dept where loc = 'CHICAGO');
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 844388907
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 540 | 6 (17)| 00:00:01 |
| 1 | MERGE JOIN | | 5 | 540 | 6 (17)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 21 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | PK_DEPT | 4 | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 14 | 1218 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 14 | 1218 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("LOC"='CHICAGO')
4 - access("DEPTNO"="DEPTNO")
filter("DEPTNO"="DEPTNO")
......省略部分内容
从执行计划上来看确实是做了子查询展开,并且用的是MERGE的方式对两张表进行的内连接,
在这里展开为内连接是因为dept表的deptno列上存在主键,也就是说该列不存在重复值和空值,
因此不需要用半连接的方式去重,而是直接使用内连接的方式。从上面的执行计划看到该SQL做子查询
展开后的成本是,下面使用提示不让其做子查询展开看看成本值为多少。
select *
from emp
where deptno in (select /*+ no_unnest */
deptno
from dept
where loc = 'CHICAGO');
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2809975276
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 1218 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | EMP | 14 | 1218 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 21 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 1 | | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "DEPT" "DEPT" WHERE
"DEPTNO"=:B1 AND "LOC"='CHICAGO'))
3 - filter("LOC"='CHICAGO')
4 - access("DEPTNO"=:B1)
......省略部分内容
可以看到未做子查询展开的成本为4,是低于做子查询展开的成本的,但是oracle还是在没有提示的情况下选择了做
子查询展开,这就是第一种子查询展开的一个缺点:对于子査询展开的第一种情形,即使是在Oracle 10g及其以后的版本中,
Oracle也不会考虑子査询展开的成本,并始终会对目标SQL做子査询展开。




