暂无图片
暂无图片
1
暂无图片
暂无图片
暂无图片

oracle查询转换中的子查询展开

原创 唐祖亮 2020-05-24
1767

oracle查询转换中的子查询展开

下面是借用崔华的《基于oracle的SQL优化》一书中对查询转换和子查询展开的描述。

Oracle里的査询转换(Query Transfbrmation),又称为査询改写(Query Rewrite),它是Oracle在解析目标 SQL的过程中的重要一步,其含义是指Oracle在解析目标SQL时可能会对其做等价改写,目的是为了能更高效地执行目标SQL,即Oracle可能会将目标SQL改写成语义上完全等价但执行效率却更高的形式。
子査询展开(Subquery Unnesting)是优化器处理带子査询的目标SQL的一种优化手段,它是指优化器不再将目标SQL中的子査询当作一个独立的处理单元来单独执行,而是将该子査询转换为它自身和外部査询之间等价的表连接。这种等价表连接转换要么是将子査询拆开(即将该子査询中的表、视图从子査询中拿出来, 然后和外部査询中的表、视图做表连接),要么是不拆开但是会把该子査询转换为一个内嵌视图(Inline View), 然后再和外部査询中的表、视图做表连接。Oracle会确保子査询展开所对应的等价表连接转换的正确性,即转换后的SQL和原SQL在语义上一定是等价的。当然,不是所有的子査询都能做子査询展开,有些子査询是不能做这种等价表连接转换的,这种情况 下Oracle就不会对其做子査询展开,也就是说此时Oracle还是会将该子査询当作一个独立的处理单元来单独 执行。另外,在Oracle 10g及其以后的版本中,对于那种不拆开子査询但是会把该子査询转换成一个内嵌视 圏的子査询展开(即子査询展开的第二种情形),只有当经过子査询展开后的等价改写SQL的成本值小于原 SQL的成本值时,Oracle才会对原SQL执行子査询展开。

Oracle数据库里子査询前的where条件如果是如下这些条件之一,那么这种类型的目标SQL在满足了一 定的条件后就可以做子査询展开:
• SINGLE-ROW (即=、<、>、<=、>=和<>)
• EXISTS
• NOT EXISTS
• IN
• NOT IN
• ANY
• ALL
如果一个子査询前的where条件是SINGLE-ROW条件,则意味着该子査询的返回结果至多只会返回一条记录;
如果该子査询前的where条件是除SINGLE-ROW条件之外的上述其他类型的条件,则该子査询的返回 结果就可以包含多条记录。
比如下面这个SQL在子查询部分返回结果超过一行。这时候执行SQL会报错,即单行子查询返回多行。

select *
  from emp
 where deptno < (select deptno from dept where deptno < 30);

ERROR at line 1: ORA-01427: 
single-row subquery returns more than one row 

下面看一个子查询展开的实例。
以下测试环境数据库版本为11.2.0.1

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE	11.2.0.1.0	Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

创建测试表

create table t1 as select * from dba_objects;
create table t2 as select * from dba_objects where object_id<30000;
create table t3 as select * from dba_objects where object_id<20000;

在T1表的object_id列上创建索引

create index ind_t1_oobject_id on t1(object_id);

编写一个如下的子查询SQL:

select *
  from t1
 where object_id in (select object_id from t2 where data_object_id < 30);

该SQL的子查询部分为(select object_id from t2 where data_object_id<30),而且WHERE条件使用的是IN,
实际上该SQL和下面使用ANY和EXISTS的两个SQL是等价的。

select *
  from t1
 where object_id = any (select object_id from t2 where data_object_id < 30);

select *
  from t1
 where exists (select 1
          from t2
         where data_object_id < 30
           and t1.object_id = t2.object_id);

在使用IN的SQL中加入no_unnest(不使用子查询展开)的提示,然后执行SQL查看其执行计划。

select *
  from t2
 where object_id in (select /*+ no_unnest */
                      object_id
                       from t1
                      where data_object_id < 30);

Elapsed: 00:00:00.10

Execution Plan
----------------------------------------------------------
Plan hash value: 865261136

--------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |     1 |    94 | 37310   (1)| 00:07:28 |
|*  1 |  FILTER                      |                   |       |       |            |          |
|   2 |   TABLE ACCESS FULL          | T2                | 29514 |  2709K|   118   (1)| 00:00:02 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| T1                |     1 |     7 |     2   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | IND_T1_OOBJECT_ID |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T1" "T1" WHERE "OBJECT_ID"=:B1 AND
              "DATA_OBJECT_ID"<30))
   3 - filter("DATA_OBJECT_ID"<30)
   4 - access("OBJECT_ID"=:B1)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      59899  consistent gets
          0  physical reads
          0  redo size
       3376  bytes sent via SQL*Net to client
        545  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         43  rows processed

从上面的执行计划可以看到在ID1步骤的过滤是最后执行的,而该步骤的谓词是
filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM “T2” “T2” WHERE
:B1<30 AND “OBJECT_ID”=:B2 AND “DATA_OBJECT_ID”<30))
这正是SQL中的子查询部分(select object_id from t2 where data_object_id<30),
子查询在这里只起到了过滤全表扫描表t1后得到29514条数据的目的。
这说明:不能做子査询展开的子査询通常在目标SQL的执行计划的最后一步才会被执行,并且会走FILTER
类型的执行计划。

全表扫描表T1后得到的结果集的Cardinality为29514,并且object_id列的值是具有唯一性的,
这意味着优化器要以驱动査询条件““OBJECT_ID”=:B1 AND “OBJECT_ID”<30”去执行29514次上述子査询,这显然是不
合理的,也直接导致了上述SQL的执行时间为0.10 s,耗费的逻辑读为59899,成本为37310。

接下来把NO_UNNEST的hint去掉再执行查看执行计划。

select *
  from t2
 where object_id in (select object_id from t1 where data_object_id < 30);

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
Plan hash value: 3761397674

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     4 |   404 |   408   (1)| 00:00:05 |
|*  1 |  HASH JOIN RIGHT SEMI|      |     4 |   404 |   408   (1)| 00:00:05 |
|*  2 |   TABLE ACCESS FULL  | T1   |     3 |    21 |   290   (1)| 00:00:04 |
|   3 |   TABLE ACCESS FULL  | T2   | 29514 |  2709K|   117   (0)| 00:00:02 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_ID"="OBJECT_ID")
   2 - filter("DATA_OBJECT_ID"<30)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1457  consistent gets
          0  physical reads
          0  redo size
       3376  bytes sent via SQL*Net to client
        545  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         43  rows processed

可以看到在去掉不做子查询展开的HITN之后,执行计划完全和之前不一样了。这里把子查询展开了把T1表从子查询中拿了出来,
并且和外部表T2做了哈希半连接。ORACLE这里已经对SQL做了子查询展开,这样做的好处就是现在只需要对T1、T2表各做一次全表扫描就行了,
不必再根据全表扫描T2表返回的数据去多次执行子查询,导致对T1表的29514次扫描了。
子查询展开的好处也能从上面的执行计划中看出来,执行时间从开始的0.1秒下降到了0.02秒,逻辑读从59899下降到了1457,
成本从37310降低到了408。

当做子查询展开时,Oracle经常会把那些外部where条件为“EXISTS”、“IN”或“=ANY”的子査询
转换为对应的半连接(Semi Join)。为了方便,这里我们用“tl.x semi= t2.y”来表示表T1和表T2做半连接,
且T1是驱动表,T2是被驱动表,半连接条件为tl.x=t2.y。
上述“tl.x semi= t2.y”的含义是:
只要在表T2中找到一条记录满足tl.x=t2.y,则马上停止搜索表T2,并且直接返回表T1中满足条件tl.x=t2.y
的记录。也就是说,即使表T2中满足半连接条件tl.x=t2.y的记录有多条,Oracle也只会返回表T2中第一条
满足tl.x=t2.y所对应的表T1中的记录。

对于上面的子查询SQL:

select *
  from t2
 where object_id in (select object_id from t1 where data_object_id < 30);

当Oracle对其做子查询展开之后是如下形式:

select *
  from t1, t2
 where t1.object_id semi = t2.object_id
   and t1.data_object_id < 30;

但是针对那些外部where条件为EXISTS. IN或ANY的子査询,Oracle并不一定只将其
转换为对应的半连接,在满足一定的条件时,转换后的表连接类型也可以是内连接。
在上面的子查询SQL中,由于表T2中的OBJECT_ID列上的值不存在重复值(列为主键或唯一约束等情况),
这时SQL在做子查询展开时也可以改写成如下的内连接形式:

select *
  from t1, t2
 where t1.object_id = t2.object_id
   and t1.data_object_id < 30;

而与半连接对应的就是反连接(anti join),下面看一个反连接的SQL:

select *
  from t2
 where object_id not in
       (select object_id from t1 where data_object_id < 30);

如果表T2中的OBJECT_ID列上不存在空值,则和下面使用<>ALL和NOT EXISTS的两个SQL语义是等价的。

select *
  from t2
 where object_id <> all
 (select object_id from t1 where data_object_id < 30);

select *
  from t2
 where not exists (select object_id
          from t1
         where data_object_id < 30
           and t1.object_id = t2.object_id);

执行上面的子查询SQL查看其执行计划:

Elapsed: 00:00:00.62

Execution Plan
----------------------------------------------------------
Plan hash value: 35395643

--------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      | 29465 |  2906K|   408   (1)| 00:00:05 |
|*  1 |  HASH JOIN RIGHT ANTI NA|      | 29465 |  2906K|   408   (1)| 00:00:05 |
|*  2 |   TABLE ACCESS FULL     | T1   |    48 |   336 |   290   (1)| 00:00:04 |
|   3 |   TABLE ACCESS FULL     | T2   | 29514 |  2709K|   117   (0)| 00:00:02 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_ID"="OBJECT_ID")
   2 - filter("DATA_OBJECT_ID"<30)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3387  consistent gets
          0  physical reads
          0  redo size
    1494314  bytes sent via SQL*Net to client
      22127  bytes received via SQL*Net from client
       1966  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      29471  rows processed

从上述显示内容可以看到,显然Oracle这里已经对上述SQL做了子査询展开,即Oracle这里已经把子査
询select object_id from t1 where data_object_id<30拆开了,将子査询中的表 T1 拿了出来,并
且和外部査询中的表T2做了哈希右反连接。
当做子査询展开时,Oracle经常会把那些外部where条件为NOT EXISTS、NOT IN或o ALL的子査询转
换成对应的反连接。为了方便,这里我们用“tl.xanri=t2.yw来表示表T1和表T2做反连接,且T1是驱动表,
T2是被驱动表,反连接条件为tl.x=t2.y。
上述“tl.x如=t2.y”的含义是:
只要表T2中有满足条件tl.x=t2.y的记录存在,则表T1中满足条件tl.x=t2.y的记录就会被丢弃,最后返回的记
录就是表T1中那些不满足条件tl.x=t2.y的记录。

那么对于上面NO IN子查询SQL:

select *
  from t2
 where object_id not in
       (select object_id from t1 where data_object_id < 30);

ORACLE做子查询展开之后改写为了如下形式:

select *
  from t1, t2
 where t1.object_id anit = t2.object_id
   and t1.data_object_id < 30;

接下来看子查询第二种形式展开的案例:

select *
  from t2
 where object_id in (select t1.object_id
                       from t1, t3
                      where t1.object_id = t3.object_id
                        and t1.data_object_id < 30);

Elapsed: 00:00:00.09

Execution Plan
----------------------------------------------------------
Plan hash value: 2258248648

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |    48 |  5136 |   486   (1)| 00:00:06 |
|*  1 |  HASH JOIN RIGHT SEMI|          |    48 |  5136 |   486   (1)| 00:00:06 |
|   2 |   VIEW               | VW_NSO_1 |    48 |   624 |   368   (1)| 00:00:05 |
|*  3 |    HASH JOIN         |          |    48 |   960 |   368   (1)| 00:00:05 |
|*  4 |     TABLE ACCESS FULL| T1       |    48 |   336 |   290   (1)| 00:00:04 |
|   5 |     TABLE ACCESS FULL| T3       | 19614 |   249K|    77   (0)| 00:00:01 |
|   6 |   TABLE ACCESS FULL  | T2       | 29514 |  2709K|   117   (0)| 00:00:02 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_ID"="OBJECT_ID")
   3 - access("T1"."OBJECT_ID"="T3"."OBJECT_ID")
   4 - filter("T1"."DATA_OBJECT_ID"<30)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
         54  recursive calls
          0  db block gets
       2378  consistent gets
          0  physical reads
          0  redo size
       3376  bytes sent via SQL*Net to client
        546  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         43  rows processed

可以从上面的执行计划中看到,oracle对该SQL确实做了子查询展开,但是首先是把T1表和T3表连接的结果集作为内嵌视图,
然后再和外面的T2表做了哈希右半连接。这里不能把子查询直接拆开是为了保留T1表和T3表的连接顺序,直接展开不能保证展开后的SQL语义上
与原SQL完全等价。这就是子查询展开的第二种形式:不拆开子査询,但是会把它转换为一个内嵌视图,然后再和外部査询中的表、视图做表连接。

子查询能否做子查询展开取决于以下两个条件:
•子査询展开所对应的等价改写SQL和原SQL在语义上一定要是完全等价的。如果改写后的SQL和原
SQL并不一定能保持语义上的完全等价,这种类型的子査询就不能做子査询展开。
•对于不拆开子査询但是会把它转换为一个内嵌视图的子査询展开,只有当经过子査询展开后的等价改
写SQL的成本值小于原SQL的成本值时,Oracle才会对目标SQL执行子査询展开。

在满足一种子查询展开的时候,即使展开后的成本高于不展开的成本,oracle还是会使用子查询展开。
下面看一个实例,这是一个简单的子查询SQL,现执行该SQL查看其执行计划;

select *
  from emp
 where deptno in (select deptno from dept where loc = 'CHICAGO');

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 844388907

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     5 |   540 |     6  (17)| 00:00:01 |
|   1 |  MERGE JOIN                  |         |     5 |   540 |     6  (17)| 00:00:01 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    21 |     2   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | PK_DEPT |     4 |       |     1   (0)| 00:00:01 |
|*  4 |   SORT JOIN                  |         |    14 |  1218 |     4  (25)| 00:00:01 |
|   5 |    TABLE ACCESS FULL         | EMP     |    14 |  1218 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("LOC"='CHICAGO')
   4 - access("DEPTNO"="DEPTNO")
       filter("DEPTNO"="DEPTNO")
   ......省略部分内容

从执行计划上来看确实是做了子查询展开,并且用的是MERGE的方式对两张表进行的内连接,
在这里展开为内连接是因为dept表的deptno列上存在主键,也就是说该列不存在重复值和空值,
因此不需要用半连接的方式去重,而是直接使用内连接的方式。从上面的执行计划看到该SQL做子查询
展开后的成本是,下面使用提示不让其做子查询展开看看成本值为多少。

select *
  from emp
 where deptno in (select /*+ no_unnest */
                   deptno
                    from dept
                   where loc = 'CHICAGO');

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2809975276

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |    14 |  1218 |     4   (0)| 00:00:01 |
|*  1 |  FILTER                      |         |       |       |            |          |
|   2 |   TABLE ACCESS FULL          | EMP     |    14 |  1218 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    21 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "DEPT" "DEPT" WHERE
              "DEPTNO"=:B1 AND "LOC"='CHICAGO'))
   3 - filter("LOC"='CHICAGO')
   4 - access("DEPTNO"=:B1)
   ......省略部分内容

可以看到未做子查询展开的成本为4,是低于做子查询展开的成本的,但是oracle还是在没有提示的情况下选择了做
子查询展开,这就是第一种子查询展开的一个缺点:对于子査询展开的第一种情形,即使是在Oracle 10g及其以后的版本中,
Oracle也不会考虑子査询展开的成本,并始终会对目标SQL做子査询展开。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论