Common Table Expression

分布式数据库爱好者 2022-10-12

469

SQL99标准中定义了common table expression，被简称CTE。作为一项重要的功能，主流商业数据库都对其进行了支持。目前，OceanBase已经对CTE的Subquery Factoring和Recursive Subquery Factoring功能进行了支持，对标Oracle。

subquery factoring
贴上例子：

#语句0-1 with ctable as (select * from t1) select c1 from ctable where c1 = 5;
#语句0-2 with ctable1 (a,b,c) as (select * from t1) select a from ctable1 where a = 5;
#语句1 select c1 from (select * from t1) table where c1 = 5;
语句0-1是一个最简单的cte写法，定义了一个叫做ctable的表，其实也就是t1表；语句0-2在语句0-1的基础上，指定了ctable的列名。从效果上来说，语句0-1与语句1是等效。

如下所示一条SQL，我们将语句2-1中CTE定义的表展开到主查询中，得到了语句2-2。

#语句2-1 WITH
dept_costs AS (
SELECT dname, SUM(sal) dept_total
FROM emp e, dept d
WHERE e.deptno = d.deptno
GROUP BY dname),
avg_cost AS (
SELECT SUM(dept_total)/COUNT(*) avg
FROM dept_costs) SELECT * FROM dept_costs WHERE dept_total > (SELECT avg FROM avg_cost) ORDER BY dname;

#语句2－2 SELECT * FROM (
SELECT dname, SUM(sal) dept_total
FROM emp e, dept d
WHERE e.deptno = d.deptno
GROUP BY dname) dept_costs WHERE dept_total > (SELECT avg FROM (
SELECT SUM(dept_total)/COUNT(*) avg
FROM (
SELECT dname, SUM(sal) dept_total
FROM emp e, dept d
WHERE e.deptno = d.deptno
GROUP BY dname) dept_costs
) avg_cost) ORDER BY dname;
对比语句2-2，语句2-1提取了公共子查询，SQL语句看起来非常清晰。如果CTE的表名字定义得当，你甚至可以不去读with定义的部分就可以知道这条SQL语句具体的功能是什么，这是CTE最直观的一个优势。
具体的来说，CTE的引入可以带来的好处：

简化SQL语句的编写，不必多次重写相同的子查询；
子查询被抽取出来之后给优化器提供了新的优化选择；
最重要的是，使得SQL语句的主干部分逻辑清晰；

recursive subquery factoring

除了常规的写法，oracle还支持递归的写法。例子如下：

#语句3-1 with rctable (rcid, rcname, rcleaderid) as
(
select id, name, leaderid from emp where emp.id = 1
union all
select emp.id , emp.name, emp.leaderid from emp, rctable where emp.leaderid = rctable.rcid
) search depth first by rcid asc set num cycle rcid set iscyc to ‘y’ default ‘n’ select * from rctable;
上面的sql语句中，rctable是一张通过递归的CTE表。rctable表示在emp.id为1的行开始查询，获得其下属的行，并最终进行展示。执行结果如下所示。

±-----±-------±-----------±-----±------+
| rcid | rcname | rcleaderid | num | iscyc |
±-----±-------±-----------±-----±------+
| 1 | A | 0 | 1 | n |
| 2 | AA | 1 | 2 | n |
| 5 | AAA | 2 | 3 | n |
| 7 | AAA | 5 | 4 | n |
| 8 | AAA | 7 | 5 | n |
| 9 | AAAA | 5 | 6 | n |
| 10 | AAAB | 5 | 7 | n |
| 11 | AAAC | 5 | 8 | n |
| 12 | AAAA | 5 | 9 | n |
| 3 | AB | 1 | 10 | n |
| 4 | ABA | 3 | 11 | n |
| 6 | ABB | 3 | 12 | n |
±-----±-------±-----------±-----±------+
recursive subquery factoring的本质是深度或者广度优先的搜索，经过仔细的编写，你可以通过它实现很多有趣的功能。简单的，你可以实现阶乘，求最小公倍数；复杂的，你甚至可以用它来求数独的解，感兴趣的同学可以自行搜索一下。下面贴出了一个OceanBase的解数独的SQL语句以及其执行结果（使用mysql 8.0或者以上的同学，运行时请加上recursive关键字）。

with x( s, ind, lev ) as ( select sud, locate( ’ ‘,sud ), 1
from ( select ‘53 7 6 195 98 6 8 6 34 8 3 17 2 6 6 28 419 5 8 79’ sud from dual ) as no1
union all
select concat(substring( s, 1, ind - 1 ) , concat( z.zc , substring( s, ind + 1 ))), locate(’ ', s, ind+1) , lev + 1
from x, z where ind > 0 and not exists ( select null from ( select zc lp from z ) as no2
where z.zc = substring( s, if((floor( ( ind - 1 ) / 9 ) * 9 + lp)=0, 1, (floor( ( ind - 1 ) / 9 ) * 9 + lp)), 1 )
or z.zc = substring( s, if((mod( ind - 1, 9 ) - 8 + lp * 9)=0, 1, (mod( ind - 1, 9 ) - 8 + lp * 9)), 1 )
or z.zc = substring( s, if((mod( floor( ( ind - 1 ) / 3), 3 ) * 3 + floor( ( ind - 1 ) / 27) * 27 + lp + floor( ( lp - 1 ) / 3) * 6)=0, 1, (mod( floor( ( ind - 1 ) / 3), 3 ) * 3 + floor( ( ind - 1 ) / 27) * 27 + lp + floor( ( lp - 1 ) / 3) * 6)) , 1 ) ) ) select s, ind, lev from x where ind = 0; ±----------------------------------------------------------------------------------±----±----+ | s | ind | lev | ±----------------------------------------------------------------------------------±----±----+ | 534678912672195348198342567859761423426853791713924856961537284287419635345286179 | 0 | 52 | ±----------------------------------------------------------------------------------±----±----+

OceanBase的实现
OceanBase以兼容Oracle 11.2版本的CTE为目标，对Subquery Factoring和Recursive Subquery Factoring功能进行了第一个版本的支持。

对Subquery Factoring来说，我们直接将其多次展开到主查询中。这种方式的好处是简单，方便，不需要对优化器做任何的修改；坏处是多次展开使得效率并没有得到提高，没有利用到CTE的特点。鉴于优化器对CTE进行支持有着不小的工作量和复杂性，以功能为核心第一个版本暂且搁置对优化器进行更改是合理的。在后续的迭代版本中，我们将会着重从优化器的视角来完善对CTE的支持。

对Recursive Subquery Factoring来说，对于语句3-1，Oracle的执行计划如下所示。我们关注的重点在于2号算子（姑且简称为R-union）算子，和5号算子（姑且简称为R-pump算子）。4，5，6号算子构成了recursive member的子计划树，3号算子则视为anchor member的子计划树。根据显示的计划，我们推测R-union算子先执行左支，获得第一次中间结果，而后再执行右枝；R-union算子是递归执行的控制者，包括递归的执行方式，暂存中间结果，判断数据是否成环，递归是否结束，伪列的值填充都由它来把控。R-pump算子负责将特定的中间结果传递给recursive member子计划树，使得他能够在新的数据上重新执行。

计划1

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
|------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 159 | 11 (19)| 00:00:01 |
| 1 | VIEW | | 3 | 159 | 11 (19)| 00:00:01 |
| 2 | UNION ALL (RECURSIVE WITH) DEPTH FIRST| | | | | |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 10 | 3 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 2 | 46 | 7 (15)| 00:00:01 |
| 5 | RECURSIVE WITH PUMP | | | | | |
| 6 | TABLE ACCESS FULL | EMP | 12 | 120 | 3 (0)| 00:00:01 |

同样的一条语句，OceanBase的执行计划如下。我们的实现与Oracle相似，但是对于排序的操作我们单独抽出了排序逻辑，为优化器消除排序预留窗口。0号RECURSIVE UNION ALL算子作为R-union的角色，而6号PUMP与R-pump算子作用一致。

计划2

|ID|OPERATOR |NAME |

|0 |RECURSIVE UNION ALL| |
|1 | SORT | |
|2 | TABLE SCAN |emp |
|3 | SORT | |
|4 | HASH INNER JOIN | |
|5 | TABLE SCAN |emp |
|6 | PUMP |rctable|

就功能来说，我们的目标是实现Oracle的超集，例如，Oracle中的cycle clause中，只能是单字节字符串值，OceanBase实现时则将其拓展成了多字节字符串。

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

Common Table Expression

计划1

计划2

|ID|OPERATOR |NAME |

|0 |RECURSIVE UNION ALL| | |1 | SORT | | |2 | TABLE SCAN |emp | |3 | SORT | | |4 | HASH INNER JOIN | | |5 | TABLE SCAN |emp | |6 | PUMP |rctable|

评论

|0 |RECURSIVE UNION ALL| |
|1 | SORT | |
|2 | TABLE SCAN |emp |
|3 | SORT | |
|4 | HASH INNER JOIN | |
|5 | TABLE SCAN |emp |
|6 | PUMP |rctable|