外联接消除
外联接操作可分为左外联接、右外联接和全外联接。在联接过程中,由于外联接左右顺序不能变换,优化器对联接顺序的选择会受到限制。外联接消除是指将外联接转换成内联接,从而可以提供更多可选择的联接路径,供优化器使用。
如果进行外联接消除,需要存在"空值拒绝条件",即在 WHERE 条件中存在,当内表生成的值为 NULL 时,输出为 FALSE 的条件。
如下例所示:
obclient>SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2;
这是一个外联接,在其输出行中 t2.c2 可能为 NULL。如果加上一个条件 t2.c2 > 5,则通过该条件过滤后,t1.c1 输出不可能为 NULL, 从而可以将外联接转换为内联接。
obclient> SELECT t1.c1, t2.c2 FROM t1 LEFT JOIN t2 ON t1.c2 = t2.c2 WHERE t2.c2 > 5;
<==>
obclient> SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 ON t1.c2 = t2.c2
WHERE t2.c2 > 5;
简化条件改写
HAVING 条件消除
如果查询中没有聚集操作和 GROUP BY,则 HAVING 可以合并到 WHERE 条件中,并将 HAVING 条件删除, 从而可以将 HAVING 条件在 WHERE 条件中统一管理,并做进一步优化。
obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1;
<==>
obclient>SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 AND t1.c2 > 1;
改写后计划如下例所示, t1.c2 > 1 条件被下压到了 TABLE SCAN 层。
obclient> EXPLAIN SELECT * FROM t1, t2 WHERE t1.c1 = t2.c1 HAVING t1.c2 > 1\G
*************************** 1. row ***************************
Query Plan:
=========================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------------
|0 |NESTED-LOOP JOIN| |1 |59 |
|1 | TABLE SCAN |t1 |1 |37 |
|2 | TABLE GET |t2 |1 |36 |
=========================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2], [t2.c1], [t2.c2]), filter(nil),
conds(nil), nl_params_([t1.c1])
1 - output([t1.c1], [t1.c2]), filter([t1.c2 > 1]),
access([t1.c1], [t1.c2]), partitions(p0)
2 - output([t2.c1], [t2.c2]), filter(nil),
access([t2.c1], [t2.c2]), partitions(p0)
等价关系推导
等价关系推导是指利用比较操作符的传递性,推倒出新的条件表达式,从而减少需要处理的行数或者选择到更有效的索引。
OceanBase 数据库可对等值联接进行推导,例如,一张表中有 a 列和 b 列,通过 a = b AND a > 1 可以推导出 a = b AND a > 1 AND b > 1, 如果 b 列上有索引,且 b > 1 在该索引选择率很低,则可以大大提升访问 b 列所在表的性能。
如下例所示,条件 t1.c1 = t2.c2 AND t1.c1 > 2,等价推导后为 t1.c1 = t2.c2 AND t1.c1 > 2 AND t2.c2 > 2,从计划中可以看到 t2.c2 已下压到 TABLE SCAN,并且使用了 t2.c2 对应的索引。
obclient> CREATE TABLE t1(c1 INT PRIMARY KEY, c2 INT);
Query OK, 0 rows affected
obclient> CREATE TABLE t2(c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2(c2));
Query OK, 0 rows affected
/*此命令需运行于 MySQL 模式下*/
obclient> EXPLAIN EXTENDED_NOADDR SELECT t1.c1, t2.c2 FROM t1, t2
WHERE t1.c1 = t2.c2 AND t1.c1 > 2\G
*************************** 1. row ***************************
Query Plan:
==========================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
------------------------------------------
|0 |MERGE JOIN | |5 |78 |
|1 | TABLE SCAN|t2(IDX_c2)|5 |37 |
|2 | TABLE SCAN|t1 |3 |37 |
==========================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t2.c2]), filter(nil),
equal_conds([t1.c1 = t2.c2]), other_conds(nil)
1 - output([t2.c2]), filter(nil),
access([t2.c2]), partitions(p0),
is_index_back=false,
range_key([t2.c2], [t2.c1]), range(2,MAX ; MAX,MAX),
range_cond([t2.c2 > 2])
2 - output([t1.c1]), filter(nil),
access([t1.c1]), partitions(p0),
is_index_back=false,
range_key([t1.c1]), range(2 ; MAX),
range_cond([t1.c1 > 2])
恒真/假消除
对于如下恒真恒假条件可以进行消除:
false and expr= 恒 Falsetrue or expr= 恒 True
如下例所示,对于 WHERE 0 > 1 AND c1 = 3,由于 0 > 1 使得 AND 恒假, 所以该 SQL 不用执行,可直接返回,从而加快查询的执行。
obclient> EXPLAIN EXTENDED_NOADDR SELECT * FROM t1 WHERE 0 > 1 AND c1 = 3\G
*************************** 1. row ***************************
Query Plan:
===================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
-----------------------------------
|0 |TABLE SCAN|t1 |0 |38 |
===================================
Outputs & filters:
-------------------------------------
0 - output([t1.c1], [t1.c2]), filter([0], [t1.c1 = 3]), startup_filter([0]),
access([t1.c1], [t1.c2]), partitions(p0),
is_index_back=false, filter_before_indexback[false,false],
range_key([t1.__pk_increment], [t1.__pk_cluster_id], [t1.__pk_partition_id]),
range(MAX,MAX,MAX ; MIN,MIN,MIN)always false



