子查询相关改写
优化器对于子查询一般使用嵌套执行的方式,也就是父查询每生成一行数据后,都需要执行一次子查询。使用这种方式需要多次执行子查询,执行效率很低。对于子查询的优化方式,一般会将其改写为联接操作,可大大提高执行效率,主要优点如下:
可避免子查询多次执行。
优化器可根据统计信息选择更优的联接顺序和联接方法。
子查询的联接条件、过滤条件改写为父查询的条件后,优化器可以进行进一步优化,例如条件下压等。
子查询改写的方式主要包括视图合并、子查询展开和将 ANY/ALL 使用 MAX/MIN 改写等。
视图合并
视图合并是将代表一个视图的子查询合并到包含该视图的查询中,视图合并后,有助于优化器增加联接顺序的选择、访问路径的选择以及进一步做其他改写操作,从而选择更优的执行计划。
OceanBase 数据库支持对 SPJ 视图进行合并。如下示例为将查询 Q1 改写为 Q2:
obclient>CREATE TABLE t1 (c1 INT, c2 INT);
Query OK, 0 rows affected
obclient>CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected
obclient>CREATE TABLE t3 (c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected
Q1:
obclient>SELECT t1.c1, v.c1
FROM t1, (SELECT t2.c1, t3.c2
FROM t2, t3
WHERE t2.c1 = t3.c1) v
WHERE t1.c2 = v.c2;
<==>
Q2:
obclient>SELECT t1.c1, t2.c1
FROM t1, t2, t3
WHERE t2.c1 = t3.c1 AND t1.c2 = t3.c2;
如果 Q1 不进行改写,其联接顺序有以下几种:
t1,v(t2,t3)t1,v(t3,t2)v(t2,t3),t1v(t3,t2),t1
进行视图合并改写后,可选择的联接顺序有:
t1,t2,t3t1,t3,t2t2,t1,t3t2,t3,t1t3,t1,t2t3,t2,t1
由此可以得出,视图合并增加了联接顺序的选择性。对于复杂查询,视图合并后,对路径的选择和可改写的空间均会增大,从而使优化器生成更优的计划。
子查询展开
子查询展开是将 WHERE 条件中子查询提升到父查询中,并作为联接条件与父查询并列进行展开。转换后子查询将不存在,外层父查询中会变成多表联接。
这样改写的好处是优化器在进行路径选择、联接方法和联接排序时都会考虑到子查询中的表,从而可以获得更优的执行计划。涉及的子查询表达式一般有 NOT IN、IN、NOT EXIST、EXIST、ANY 和 ALL。
子查询展开的方式如下:
改写条件使生成的联接语句能够返回与原始语句相同的行。
展开为半联接(Semi Join/Anti Join)
如下例所示,
t2.c2不具有唯一性,改写为 Semi Join,该语句改写后的执行计划如下:obclient> CREATE TABLE t1 (c1 INT, c2 INT); Query OK, 0 rows affected obclient> CREATE TABLE t2 (c1 INT PRIMARY KEY, c2 INT); Query OK, 0 rows affected obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2)\G *************************** 1. row *************************** Query Plan: ======================================= |ID|OPERATOR |NAME|EST. ROWS|COST| --------------------------------------- |0 |HASH SEMI JOIN| |495 |3931| |1 | TABLE SCAN |t1 |1000 |499 | |2 | TABLE SCAN |t2 |1000 |433 | ======================================= Outputs & filters: ------------------------------------- 0 - output([t1.c1], [t1.c2]), filter(nil), equal_conds([t1.c1 = t2.c2]), other_conds(nil) 1 - output([t1.c1], [t1.c2]), filter(nil), access([t1.c1], [t1.c2]), partitions(p0) 2 - output([t2.c2]), filter(nil), access([t2.c2]), partitions(p0)将查询前面的操作符改为
NOT IN后,可改写为 Anti Join,具体计划如下例所示:obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c2 FROM t2)\G *************************** 1. row *************************** Query Plan: ================================================ |ID|OPERATOR |NAME|EST. ROWS|COST | ------------------------------------------------ |0 |NESTED-LOOP ANTI JOIN| |0 |520245| |1 | TABLE SCAN |t1 |1000 |499 | |2 | TABLE SCAN |t2 |22 |517 | ================================================ Outputs & filters: ------------------------------------- 0 - output([t1.c1], [t1.c2]), filter(nil), conds(nil), nl_params_([t1.c1], [(T_OP_IS, t1.c1, NULL, 0)]) 1 - output([t1.c1], [t1.c2], [(T_OP_IS, t1.c1, NULL, 0)]), filter(nil), access([t1.c1], [t1.c2]), partitions(p0) 2 - output([t2.c2]), filter([(T_OP_OR, ? = t2.c2, ?, (T_OP_IS, t2.c2, NULL, 0))]), access([t2.c2]), partitions(p0)子查询展开为内联接
在上述示例的查询 Q1 中,如果将
t2.c2改为t2.c1,由于t2.c1为主键,子查询输出具有唯一性,此时可以直接转换为内联接,如下所示:Q1: obclient> SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c1 FROM t2)\G; <==> Q2: obclient> SELECT t1.* FROM t1, t2 WHERE t1.c1 = t2.c1;Q1 改写后的计划如下所示:
obclient> EXPLAIN SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c1 FROM t2)\G *************************** 1. row *************************** Query Plan: ==================================== |ID|OPERATOR |NAME|EST. ROWS|COST| ------------------------------------ |0 |HASH JOIN | |1980 |3725| |1 | TABLE SCAN|t2 |1000 |411 | |2 | TABLE SCAN|t1 |1000 |499 | ==================================== Outputs & filters: ------------------------------------- 0 - output([t1.c1], [t1.c2]), filter(nil), equal_conds([t1.c1 = t2.c1]), other_conds(nil) 1 - output([t2.c1]), filter(nil), access([t2.c1]), partitions(p0) 2 - output([t1.c1], [t1.c2]), filter(nil), access([t1.c1], [t1.c2]), partitions(p0)对于
NOT IN、IN、NOT EXIST、EXIST、ANY和ALL子查询表达式都可以对应做类似的改写操作。
ANY/ALL 使用 MAX/MIN 改写
对于 ANY/ALL 的子查询,如果子查询中没有 GROUP BY 子句、聚集函数和 HAVING 条件时,以下表达式可以使用聚集函数 MIN/MAX 进行等价转换,其中 col_item 为单独列且有非 NULL 属性:
val > ALL(SELECT col_item ...) <==> val > (SELECT MAX(col_item) ...);
val >= ALL(SELECT col_item ...) <==> val >= (SELECT MAX(col_item) ...);
val < ALL(SELECT col_item ...) <==> val < (SELECT MIN(col_item) ...);
val <= ALL(SELECT col_item ...) <==> val <= (SELECT MIN(col_item) ...);
val > ANY(SELECT col_item ...) <==> val > (SELECT MIN(col_item) ...);
val >= ANY(SELECT col_item ...) <==> val >= (SELECT MIN(col_item) ...);
val < ANY(SELECT col_item ...) <==> val < (SELECT MAX(col_item) ...);
val <= ANY(SELECT col_item ...) <==> val <= (SELECT MAX(col_item) ...);
将子查询更改为含有 MAX/MIN 的子查询后,再结合使用 MAX/MIN 进行改写,可减少改写前对内表的多次扫描,如下例所示:
obclient> SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2);
<==>
obclient> SELECT c1 FROM t1 WHERE c1 > (SELECT MIN(c1) FROM t2);
结合 MAX/MIN 进行改写后,可利用 t2.c1 的主键序将 LIMIT 1 直接下压到 TABLE SCAN,将 MIN 值输出,执行计划如下所示:
obclient> EXPLAIN SELECT c1 FROM t1 WHERE c1 > ANY(SELECT c1 FROM t2)\G
*************************** 1. row ***************************
Query Plan:
===================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
---------------------------------------------------
|0 |SUBPLAN FILTER | |1 |73 |
|1 | TABLE SCAN |t1 |1 |37 |
|2 | SCALAR GROUP BY| |1 |37 |
|3 | SUBPLAN SCAN |subquery_table|1 |37 |
|4 | TABLE SCAN |t2 |1 |36 |
===================================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1]), filter([t1.c1 > ANY(subquery(1))]),
exec_params_(nil), onetime_exprs_(nil), init_plan_idxs_([1])
1 - output([t1.c1]), filter(nil),
access([t1.c1]), partitions(p0)
2 - output([T_FUN_MIN(subquery_table.c1)]), filter(nil),
group(nil), agg_func([T_FUN_MIN(subquery_table.c1)])
3 - output([subquery_table.c1]), filter(nil),
access([subquery_table.c1])
4 - output([t2.c1]), filter(nil),
access([t2.c1]), partitions(p0),
limit(1), offset(nil)




