暂无图片
暂无图片
4
暂无图片
暂无图片
暂无图片

OceanBase管理数据库基于规则查询改写(三)

2023-07-24
763

非 SPJ 的改写

冗余排序消除

冗余排序消除是指删除 Order Item 中不需要的项,减少排序开销。以下三种情况可进行排序消除:

  • ORDER BY 表达式列表中有重复列,可进行去重后排序。

    obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c1, c2, c3 ;
    <==>
    obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;
    
  • ORDER BY 列中存在 WHERE 中有单值条件的列,该列排序可删除。

    obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c2, c3;
    <==>
    obclient> SELECT * FROM t1 WHERE c2 = 5 ORDER BY c1, c3;
    
  • 如果本层查询有 ORDER BY 但是没有 LIMIT,且本层查询位于父查询的集合操作中,则 ORDER BY 可消除。因为对两个有序的集合做 UNION 操作,其结果是乱序的。但是如果 ORDER BY 中有 LIMIT,则语义是取最大/最小的 N 个,此时不能消除 ORDER BY,否则有语义错误。

    obclient> (SELECT c1,c2 FROM t1 ORDER BY c1) UNION (SELECT c3,c4 FROM t2 ORDER BY c3);
    <==>
    obclient> (SELECT c1,c2 FROM t1) UNION (SELECT c3,c4 FROM t2);
    

LIMIT 下压

LIMIT 下压改写是指将 LIMIT 下降到子查询中,OceanBase 数据库现在支持在不改变语义的情况下,将 LIMIT 下压到视图(示例 1)或 UNION 对应的子查询(示例 2)中。

示例 1:将 LIMIT 下压到视图。

obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1) a LIMIT 1; 
<==>
obclient> SELECT * FROM (SELECT * FROM t1 ORDER BY c1 LIMIT 1) a LIMIT 1;

示例 2:将 LIMIT 下压到 UNION 对应的子查询中。

obclient> (SELECT c1,c2 FROM t1) UNION ALL (SELECT c3,c4 FROM t2) LIMIT 5;
<==>
obclient> (SELECT c1,c2 FROM t1 LIMIT 5) UNION ALL (SELECT c3,c4 FROM t2 limit 5) LIMIT 5;

DISTINCT 消除

  • 如果 Select Item 中只包含常量,则可以消除 DISTINCT,并加上 LIMIT 1

    obclient> SELECT DISTINCT 1,2 FROM t1 ;
    <==> 
    obclient> SELECT 1,2 FROM t1 LIMIT 1;
    
    obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT);
    Query OK, 0 rows affected 
    
    obclient> EXPLAIN EXTENDED_NOADDR SELECT DISTINCT 1,2 FROM t1\G
    *************************** 1. row ***************************
    Query Plan: 
    ===================================
    |ID|OPERATOR  |NAME|EST. ROWS|COST|
    -----------------------------------
    |0 |TABLE SCAN|t1  |1        |36  |
    ===================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([1], [2]), filter(nil),
          access([t1.c1]), partitions(p0),
          limit(1), offset(nil),
          is_index_back=false,
          range_key([t1.c1]), range(MIN ; MAX)always true
    
  • 如果 Select Item 中包含确保唯一性约束的列,则 DISTINCT 能够被消除。如下示例中 (c1, c2) 为主键,可确保 c1c2 和 c3 唯一性, 从而 DISTINCT 可消除。

    obclient> CREATE TABLE t2(c1 INT, c2 INT, c3 INT, PRIMARY KEY(c1, c2));
    Query OK, 0 rows affected 
    
    obclient> SELECT DISTINCT c1, c2, c3 FROM t2;
    <==>
    obclient> SELECT c1, c2 c3 FROM t2;
    
    obclient> EXPLAIN SELECT DISTINCT c1, c2, c3 FROM t2\G
    *************************** 1. row ***************************
    Query Plan: 
    ===================================
    |ID|OPERATOR  |NAME|EST. ROWS|COST|
    -----------------------------------
    |0 |TABLE SCAN|t2  |1000     |455 |
    ===================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([t2.c1], [t2.c2], [t2.c3]), filter(nil),
          access([t2.c1], [t2.c2], [t2.c3]), partitions(p0)
    

MIN/MAX 改写

  • 当 MIN/MAX 函数中的参数为索引前缀列,且不含 GROUP BY 时,可将该 scalar aggregate 转换为走索引扫描 1 行的情况,如下例所示:

    obclient> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT, c3 INT, KEY IDX_c2_c3(c2,c3));
    Query OK, 0 rows affected 
    
    obclient> SELECT MIN(c2) FROM t1;
    <==>
    obclient> SELECT MIN(c2) FROM (SELECT c2 FROM t2 ORDER BY c2 LIMIT 1) AS t;
    
    obclient> EXPLAIN SELECT MIN(c2) FROM t1\G
    *************************** 1. row ***************************
    Query Plan: 
    ==================================================
    |ID|OPERATOR       |NAME          |EST. ROWS|COST|
    --------------------------------------------------
    |0 |SCALAR GROUP BY|              |1        |37  |
    |1 | SUBPLAN SCAN  |subquery_table|1        |37  |
    |2 |  TABLE SCAN   |t1(idx_c2_c3) |1        |36  |
    ==================================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([T_FUN_MIN(subquery_table.c2)]), filter(nil),
          group(nil), agg_func([T_FUN_MIN(subquery_table.c2)])
      1 - output([subquery_table.c2]), filter(nil),
          access([subquery_table.c2])
      2 - output([t1.c2]), filter([(T_OP_IS_NOT, t1.c2, NULL, 0)]),
          access([t1.c2]), partitions(p0),
          limit(1), offset(nil)
    
  • 如果 SELECT MIN/SELECT MAX 的参数为常量,而且包含 GROUP BY,可以将 MIN/MAX 改为常量,从而减少 MIN/MAX 的计算开销。

    obclient> SELECT MAX(1) FROM t1 GROUP BY c1;
    <==>
    obclient> SELECT 1 FROM t1 GROUP BY c1;
    
    obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1 GROUP BY c1\G
    *************************** 1. row ***************************
    Query Plan: 
    ===================================
    |ID|OPERATOR  |NAME|EST. ROWS|COST|
    -----------------------------------
    |0 |TABLE SCAN|t1  |1000     |411 |
    ===================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([1]), filter(nil),
          access([t1.c1]), partitions(p0),
          is_index_back=false,
          range_key([t1.c1]), range(MIN ; MAX)always true
    
  • 如果 SELECT MIN/SELECT MAX 的参数为常量,而且不含 GROUP BY,可以按照如下示例进行改写,从而走索引只需扫描 1 行。

    obclient> SELECT MAX(1) FROM t1;
    <==> 
    obclient> SELECT MAX(t.a) FROM (SELECT 1 AS a FROM t1 LIMIT 1) t;
    
    obclient> EXPLAIN EXTENDED_NOADDR SELECT MAX(1) FROM t1\G
    *************************** 1. row ***************************
    Query Plan: 
    ==================================================
    |ID|OPERATOR       |NAME          |EST. ROWS|COST|
    --------------------------------------------------
    |0 |SCALAR GROUP BY|              |1        |37  |
    |1 | SUBPLAN SCAN  |subquery_table|1        |37  |
    |2 |  TABLE SCAN   |t1            |1        |36  |
    ==================================================
    
    Outputs & filters:
    -------------------------------------
      0 - output([T_FUN_MAX(subquery_table.subquery_col_alias)]), filter(nil),
          group(nil), agg_func([T_FUN_MAX(subquery_table.subquery_col_alias)])
      1 - output([subquery_table.subquery_col_alias]), filter(nil),
          access([subquery_table.subquery_col_alias])
      2 - output([1]), filter(nil),
          access([t1.c1]), partitions(p0),
          limit(1), offset(nil),
          is_index_back=false,
          range_key([t1.c1]), range(MIN ; MAX)always true
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论