非 SPJ 的改写
冗余排序消除
冗余排序消除是指删除 Order Item 中不需要的项,减少排序开销。以下三种情况可进行排序消除:
ORDER BY表达式列表中有重复列,可进行去重后排序。obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c1, c2, c3 ; <==> obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;ORDER BY列中存在WHERE中有单值条件的列,该列排序可删除。obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3; <==> obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c3;如果本层查询有
ORDER BY但是没有LIMIT,且本层查询位于父查询的集合操作中,则ORDER BY可消除。因为对两个有序的集合做UNION操作,其结果是乱序的。但是如果ORDER BY中有LIMIT,则语义是取最大/最小的 N 个,此时不能消除ORDER BY,否则有语义错误。obclient> (SELECT c1,c2 FROM t1 ORDER BY c1) UNION (SELECT c3,c4 FROM t2 ORDER BY c3); <==> obclient> (SELECT c1,c2 FROM t1) UNION (SELECT c3,c4 FROM t2);
LIMIT 下压
LIMIT 下压改写是指将 LIMIT 下降到子查询中,OceanBase 数据库现在支持在不改变语义的情况下,将 LIMIT 下压到视图(示例 1)或 UNION 对应的子查询(示例 2)中。
示例 1:将 LIMIT 下压到视图。
obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1) a LIMIT 1;
<==>
obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1 LIMIT 1) a LIMIT 1;
示例 2:将 LIMIT 下压到 UNION 对应的子查询中。
obclient> (SELECT c1,c2 FROM t1) UNION ALL (SELECT c3,c4 FROM t2) LIMIT 5;
<==>
obclient> (SELECT c1,c2 FROM t1 LIMIT 5) UNION ALL (SELECT c3,c4 FROM t2 limit 5) LIMIT 5;
DISTINCT 消除
如果 Select Item 中只包含常量,则可以消除
DISTINCT,并加上LIMIT 1。obclient> SELECT DISTINCT 1,2 FROM t1 ; <==> obclient> SELECT 1,2 FROM t1 LIMIT 1; obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT); Query OK, 0 rows affected obclient> EXPLAIN EXTENDED_NOADDR SELECT DISTINCT 1,2 FROM t1\G *************************** 1. row *************************** Query Plan: =================================== |ID|OPERATOR |NAME|EST. ROWS|COST| ----------------------------------- |0 |TABLE SCAN|t1 |1 |36 | =================================== Outputs & filters: ------------------------------------- 0 - output([1], [2]), filter(nil), access([t1.c1]), partitions(p0), limit(1), offset(nil), is_index_back=false, range_key([t1.c1]), range(MIN ; MAX)always true如果 Select Item 中包含确保唯一性约束的列,则
DISTINCT能够被消除。如下示例中(c1, c2)为主键,可确保c1、c2和c3唯一性, 从而DISTINCT可消除。obclient> CREATE TABLE t2(c1 INT, c2 INT, c3 INT, PRIMARY KEY(c1, c2)); Query OK, 0 rows affected obclient> SELECT DISTINCT c1, c2, c3 FROM t2; <==> obclient> SELECT c1, c2 c3 FROM t2; obclient> EXPLAIN SELECT DISTINCT c1, c2, c3 FROM t2\G *************************** 1. row *************************** Query Plan: =================================== |ID|OPERATOR |NAME|EST. ROWS|COST| ----------------------------------- |0 |TABLE SCAN|t2 |1000 |455 | =================================== Outputs & filters: ------------------------------------- 0 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil), access([t2.c1], [t2.c2], [t2.c3]), partitions(p0)
MIN/MAX 改写
当
MIN/MAX函数中的参数为索引前缀列,且不含GROUP BY时,可将该scalar aggregate转换为走索引扫描 1 行的情况,如下例所示:obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2_c3(c2,c3)); Query OK, 0 rows affected obclient> SELECT MIN(c2) FROM t1; <==> obclient> SELECT MIN(c2) FROM (SELECT c2 FROM t2 ORDER BY c2 LIMIT 1) AS t; obclient> EXPLAIN SELECT MIN(c2) FROM t1\G *************************** 1. row *************************** Query Plan: ================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| -------------------------------------------------- |0 |SCALAR GROUP BY| |1 |37 | |1 | SUBPLAN SCAN |subquery_table|1 |37 | |2 | TABLE SCAN |t1(idx_c2_c3) |1 |36 | ================================================== Outputs & filters: ------------------------------------- 0 - output([T_FUN_MIN(subquery_table.c2)]), filter(nil), group(nil), agg_func([T_FUN_MIN(subquery_table.c2)]) 1 - output([subquery_table.c2]), filter(nil), access([subquery_table.c2]) 2 - output([t1.c2]), filter([(T_OP_IS_NOT, t1.c2, NULL, 0)]), access([t1.c2]), partitions(p0), limit(1), offset(nil)如果
SELECT MIN/SELECT MAX的参数为常量,而且包含GROUP BY,可以将MIN/MAX改为常量,从而减少MIN/MAX的计算开销。obclient> SELECT MAX(1) FROM t1 GROUP BY c1; <==> obclient> SELECT 1 FROM t1 GROUP BY c1; obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1 GROUP BY c1\G *************************** 1. row *************************** Query Plan: =================================== |ID|OPERATOR |NAME|EST. ROWS|COST| ----------------------------------- |0 |TABLE SCAN|t1 |1000 |411 | =================================== Outputs & filters: ------------------------------------- 0 - output([1]), filter(nil), access([t1.c1]), partitions(p0), is_index_back=false, range_key([t1.c1]), range(MIN ; MAX)always true如果
SELECT MIN/SELECT MAX的参数为常量,而且不含GROUP BY,可以按照如下示例进行改写,从而走索引只需扫描 1 行。obclient> SELECT MAX(1) FROM t1; <==> obclient> SELECT MAX(t.a) FROM (SELECT 1 AS a FROM t1 LIMIT 1) t; obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1\G *************************** 1. row *************************** Query Plan: ================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| -------------------------------------------------- |0 |SCALAR GROUP BY| |1 |37 | |1 | SUBPLAN SCAN |subquery_table|1 |37 | |2 | TABLE SCAN |t1 |1 |36 | ================================================== Outputs & filters: ------------------------------------- 0 - output([T_FUN_MAX(subquery_table.subquery_col_alias)]), filter(nil), group(nil), agg_func([T_FUN_MAX(subquery_table.subquery_col_alias)]) 1 - output([subquery_table.subquery_col_alias]), filter(nil), access([subquery_table.subquery_col_alias]) 2 - output([1]), filter(nil), access([t1.c1]), partitions(p0), limit(1), offset(nil), is_index_back=false, range_key([t1.c1]), range(MIN ; MAX)always true




