国产集中库SQL能力评测 - 排序和分组聚合

韩锋频道 2024-12-26

141

这里谈谈排序与分组聚合。这两类操作往往存在一定相关性。

【系列文章】

国产集中库SQL能力评测 - 访问路径

国产集中库SQL能力评测 - 表连接

国产集中库SQL能力评测 - 子查询

1. 排序

排序，是数据库内比较消耗资源的一类操作，特别是在结果比较大的情况下。因此在数据库处理上，应尽量规避排序的行为。这里讲的排序，不仅仅是指 ORDER BY 的操作，很多操作都会引发排序行为。常见操作包括：生成索引的操作（因为索引是有序结构）、某些SQL（如带有DISTINCT、ORDER BY、GROUP BY、UNION、MINUS、INTERSET、CONNECT BY和CONNECT BY ROLLUP子句）、排序合并连接（两个结果集排序后关联）、收集统计信息、其他如位图变换、分析函数等。在上面这些操作中，有些是为了进行排序，有些是为了其他目的（如去重等）；因此数据库是可以考虑优化此类排序行为的。例如Oracle数据库，在10g以前的版本是通过SORT GROUP BY完成分组的，但在10g之后默认提供了HASH GROUP BY，这样效率更高，当然其结果集不保证有序了。

1).常见排序操作

下面针对常见的排序类操作，抽象出一组测例，看看Oracle和国产库的行为如何。

❖ SORT UNIQUE

排序去重类，把查询语句的输出结果变为唯一集合的过程。使用场景包括：语句中使用了DISTINCT、子查询向主查询提供执行结果。在Oracle 10g以后的版本中，SORT UNIQUE 变成 HASH UNIQUE，利用新的HASH算法代替了传统的排序。但在使用子查询的场景下，因为优先执行子查询，子查询放在主查询之前。由于主查询的结果必须存在于子查询的结果中。在这里要将作为"M"集合的子查询转换为不允许重复元素存在的"1"集合，所以执行了SORT(UNIQUE)。

❖ SORT AGGREGATE

这是指在没有 GROUP BY 的前提下，使用统计函数对全部数据对象进行计算时所显示出来的执行计划。在使用SUM、COUNT、MIN、MAX、AVG等统计函数时并不执行一般排序动作。实际上是读取每一行数据为对象进行求和、计数、比较大小等操作，可通过一个全局变量+全表/全索引扫描来实现。

❖ SORT GROUP BY

该操作是将数据行向不同分组中聚集的操作，即依据查询语句中所使用的GROUP BY而进行相关操作，为了进行分组就只能排序。需要分组的数据量越大，代价就越大。在10gR2以后的版本中，哈希分组-HASH (GROUPBY)。在处理海量数据时使用哈希处理比使用排序处理更有效。

❖ SORT ORDER BY

当对一个不能满足索引列进行排序时，就需要一个SORT ORDER BY。这里可以有个优化，针对取出排序结果前几条的场景，是可以提前结束排序动作，节省资源。

❖ SORT JOIN

在表关联的场景中，如果行按照连接键排序，在排序合并连接时将会发生SORT JOIN。SORT JOIN 发生在出现MERGE JOIN的情况下，两张关联的表要各自做SORT，然后在MERGE。

2).Oracle 测试示例

-- SORT UNIQUE
SQL> explain plan for select distinct dept_id,emp_name from emp where salary<1100;
SQL> select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   898 | 14368 |    16   (7)| 00:00:01 |
|   1 |  HASH UNIQUE       |      |   898 | 14368 |    16   (7)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| EMP  |   898 | 14368 |    15   (0)| 00:00:01 |
---------------------------------------------------------------------------

SQL> explain plan for  select * from dept where dept_id in (select dept_id from emp where salary<1100);
SQL> select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   100 |  2300 |    18   (6)| 00:00:01 |
|   1 |  MERGE JOIN SEMI             |         |   100 |  2300 |    18   (6)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| DEPT    |   100 |  1500 |     2   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | DEPT_PK |   100 |       |     1   (0)| 00:00:01 |
|*  4 |   SORT UNIQUE                |         |   898 |  7184 |    16   (7)| 00:00:01 |
|*  5 |    TABLE ACCESS FULL         | EMP     |   898 |  7184 |    15   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

-- SORT AGGREGATE
SQL> explain plan for select sum(salary) from emp;
SQL> select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |     1 |     5 |    10   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |                |     1 |     5 |            |          |
|   2 |   INDEX FAST FULL SCAN| IDX_EMP_SALARY | 10000 | 50000 |    10   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

SQL> explain plan for select min(salary),max(salary) from emp;
SQL> select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     5 |    15   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
|   2 |   TABLE ACCESS FULL| EMP  | 10000 | 50000 |    15   (0)| 00:00:01 |
---------------------------------------------------------------------------

-- SORT GROUP BY
SQL> explain plan for select dept_id,count(*) from emp group by dept_id;
SQL> select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100 |   300 |    16   (7)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   100 |   300 |    16   (7)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP  | 10000 | 30000 |    15   (0)| 00:00:01 |
---------------------------------------------------------------------------

-- SORT(ORDER BY)
SQL>  explain plan for select * from emp order by dept_id;
SQL>  select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   273K|    16   (7)| 00:00:01 |
|   1 |  SORT ORDER BY     |      | 10000 |   273K|    16   (7)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP  | 10000 |   273K|    15   (0)| 00:00:01 |
---------------------------------------------------------------------------

SQL> explain plan for select * from (select * from emp order by salary desc) where rownum<=10;
SQL> select * from table(dbms_xplan.display);
--------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |    10 |   650 |    16   (7)| 00:00:01 |
|*  1 |  COUNT STOPKEY          |      |       |       |            |          |
|   2 |   VIEW                  |      | 10000 |   634K|    16   (7)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY|      | 10000 |   273K|    16   (7)| 00:00:01 |
|   4 |     TABLE ACCESS FULL   | EMP  | 10000 |   273K|    15   (0)| 00:00:01 |
--------------------------------------------------------------------------------

-- SORT JOIN
SQL> explain plan for select *+ use_merge(e d)*/ e.emp_name,d.dept_name from emp e,dept d where e.dept_id=d.dept_id;
SQL> select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  | 10000 |   205K|    20  (15)|
|   1 |  MERGE JOIN              |                  | 10000 |   205K|    20  (15)|
|   2 |   SORT JOIN              |                  |   100 |  1000 |     4  (50)|
|   3 |    VIEW                  | index$_join$_002 |   100 |  1000 |     3  (34)|
|*  4 |     HASH JOIN            |                  |       |       |            |
|   5 |      INDEX FAST FULL SCAN| DEPT_PK          |   100 |  1000 |     1   (0)|
|   6 |      INDEX FAST FULL SCAN| IDX_DEPT_NAME    |   100 |  1000 |     1   (0)|
|*  7 |   SORT JOIN              |                  | 10000 |   107K|    16   (7)|
|   8 |    TABLE ACCESS FULL     | EMP              | 10000 |   107K|    15   (0)|
----------------------------------------------------------------------------------

3).国产库测试示例

下面是针对上述测例，国产库的行为如何？先来看看整体结果

❖ MySQL

-- SORT UNIQUE
mysql> explain select distinct dept_id,emp_name from emp where salary<1100;
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+----------------------------------------+
| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra                                  |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+----------------------------------------+
|  1 | SIMPLE      | emp   | NULL       | range | idx_emp_salary | idx_emp_salary | 5       | NULL |  459 |   100.00 | Using index condition; Using temporary |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+----------------------------------------+

mysql> explain select * from dept where dept_id in (select dept_id from emp where salary<1100);
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------+------+----------+-----------------------+
| id | select_type  | table       | partitions | type   | possible_keys       | key                 | key_len | ref                 | rows | filtered | Extra                 |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------+------+----------+-----------------------+
|  1 | SIMPLE       | dept        | NULL       | ALL    | PRIMARY             | NULL                | NULL    | NULL                |  100 |   100.00 | NULL                  |
|  1 | SIMPLE       | <subquery2> | NULL       | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 5       | testdb.dept.dept_id |    1 |   100.00 | NULL                  |
|  2 | MATERIALIZED | emp         | NULL       | range  | idx_emp_salary      | idx_emp_salary      | 5       | NULL                |  459 |   100.00 | Using index condition |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------+------+----------+-----------------------+

-- SORT AGGREGATE
mysql> explain select sum(salary) from emp;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key            | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | emp   | NULL       | index | NULL          | idx_emp_salary | 5       | NULL | 10117 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+-------+----------+-------------+

mysql> explain select min(salary),max(salary) from emp;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+

-- SORT GROUP BY
mysql> explain select dept_id,count(*) from emp group by dept_id;
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra           |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10117 |   100.00 | Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-----------------+

-- SORT(ORDER BY)
mysql> explain select * from emp order by dept_id;
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10117 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+

mysql> explain select * from (select * from emp order by salary desc) a limit 10;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+
| id | select_type | table | partitions | type  | possible_keys | key            | key_len | ref  | rows | filtered | Extra               |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+
|  1 | SIMPLE      | emp   | NULL       | index | NULL          | idx_emp_salary | 5       | NULL |   10 |   100.00 | Backward index scan |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+

-- SORT JOIN
mysql> explain select e.emp_name,d.dept_name from emp e,dept d where e.dept_id=d.dept_id;
+----+-------------+-------+------------+--------+---------------+---------+---------+------------------+-------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref              | rows  | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------------+-------+----------+-------------+
|  1 | SIMPLE      | e     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL             | 10117 |   100.00 | Using where |
|  1 | SIMPLE      | d     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | testdb.e.dept_id |     1 |   100.00 | NULL        |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------------+-------+----------+-------------+

❖ DM

-- SORT UNIQUE
SQL> explain select distinct dept_id,emp_name from emp where salary<1100;
1   #NSET2: [2, 885, 108]
2     #PRJT2: [2, 885, 108]; exp_num(2), is_atom(FALSE)
3       #DISTINCT: [2, 885, 108]
4         #BLKUP2: [1, 885, 108]; IDX_EMP_SALARY(EMP)
5           #SSEK2: [1, 885, 108]; scan_type(ASC), IDX_EMP_SALARY(EMP), scan_range(null2,exp_cast(1100)), is_global(0)

SQL> explain select * from dept where dept_id in (select dept_id from emp where salary<1100);
1   #NSET2: [1, 100, 138]
2     #PRJT2: [1, 100, 138]; exp_num(4), is_atom(FALSE)
3       #HASH LEFT SEMI JOIN2: [1, 100, 138]; KEY_NUM(1);  KEY(DEPT.DEPT_ID=EMP.DEPT_ID) KEY_NULL_EQU(0)
4         #CSCN2: [1, 100, 138]; INDEX33555480(DEPT); btr_scan(1)
5         #BLKUP2: [1, 885, 60]; IDX_EMP_SALARY(EMP)
6           #SSEK2: [1, 885, 60]; scan_type(ASC), IDX_EMP_SALARY(EMP), scan_range(null2,exp_cast(1100)), is_global(0)

-- SORT AGGREGATE
SQL> explain select sum(salary) from emp;
1   #NSET2: [1, 1, 30]
2     #PRJT2: [1, 1, 30]; exp_num(1), is_atom(FALSE)
3       #AAGR2: [1, 1, 30]; grp_num(0), sfun_num(1), distinct_flag[0]; slave_empty(0)
4         #SSCN: [1, 10000, 30]; IDX_EMP_SALARY(EMP); btr_scan(1); is_global(0)

SQL> explain select min(salary),max(salary) from emp;
1   #NSET2: [1, 1, 30]
2     #PRJT2: [1, 1, 30]; exp_num(2), is_atom(FALSE)
3       #FAGR2: [1, 1, 30]; sfun_num(2), IDX_EMP_SALARY

-- SORT GROUP BY
SQL> explain select dept_id,count(*) from emp group by dept_id;
1   #NSET2: [2, 100, 30]
2     #PRJT2: [2, 100, 30]; exp_num(2), is_atom(FALSE)
3       #HAGR2: [2, 100, 30]; grp_num(1), sfun_num(1), distinct_flag[0]; slave_empty(0) keys(EMP.DEPT_ID)
4         #CSCN2: [1, 10000, 30]; INDEX33555484(EMP); btr_scan(1)

-- SORT(ORDER BY)
SQL> explain select * from emp order by dept_id;
1   #NSET2: [2, 10000, 163]
2     #PRJT2: [2, 10000, 163]; exp_num(6), is_atom(FALSE)
3       #SORT3: [2, 10000, 163]; key_num(1), partition_key_num(0), is_distinct(FALSE), top_flag(0), is_adaptive(0)
4         #CSCN2: [1, 10000, 163]; INDEX33555484(EMP); btr_scan(1)

SQL> explain select * from (select * from emp order by salary desc) a limit 10;
1   #NSET2: [4, 10, 205]
2     #PRJT2: [4, 10, 205]; exp_num(6), is_atom(FALSE)
3       #SORT3: [4, 10, 205]; key_num(1), partition_key_num(0), is_distinct(FALSE), top_flag(0), is_adaptive(0)
4         #HASH2 INNER JOIN: [3, 10, 205]; LKEY_UNIQUE KEY_NUM(1); KEY(DMTEMPVIEW_889193478.colname=EMP.ROWID) KEY_NULL_EQU(0)
5           #NEST LOOP INDEX JOIN2: [3, 10, 205]
6             #ACTRL: [3, 10, 205]
7               #DISTINCT: [3, 10, 42]
8                 #PRJT2: [2, 10, 42]; exp_num(1), is_atom(FALSE)
9                   #PRJT2: [2, 10, 42]; exp_num(1), is_atom(FALSE)
10                    #SORT3: [2, 10, 42]; key_num(1), partition_key_num(0), is_distinct(FALSE), top_flag(1), is_adaptive(0)
11                      #SSCN: [1, 10000, 42]; IDX_EMP_SALARY(EMP); btr_scan(1); is_global(0)
12            #CSEK2: [1, 1, 0]; scan_type(ASC), INDEX33555484(EMP), scan_range[DMTEMPVIEW_889193478.colname,DMTEMPVIEW_889193478.colname]
13          #CSCN2: [1, 10000, 163]; INDEX33555484(EMP); btr_scan(1)

-- SORT JOIN
SQL> explain select e.emp_name,d.dept_name from emp e,dept d where e.dept_id=d.dept_id;
1   #NSET2: [2, 10000, 156]
2     #PRJT2: [2, 10000, 156]; exp_num(2), is_atom(FALSE)
3       #HASH2 INNER JOIN: [2, 10000, 156]; LKEY_UNIQUE KEY_NUM(1); KEY(D.DEPT_ID=E.DEPT_ID) KEY_NULL_EQU(0)
4         #CSCN2: [1, 100, 78]; INDEX33555480(DEPT as D); btr_scan(1)
5         #CSCN2: [1, 10000, 78]; INDEX33555484(EMP as E); btr_scan(1)

SQL> explain select *+ use_merge(e d)*/ e.emp_name,d.dept_name from emp e,dept d where e.dept_id=d.dept_id;
1   #NSET2: [638801, 10000, 156]
2     #PRJT2: [638801, 10000, 156]; exp_num(2), is_atom(FALSE)
3       #SLCT2: [638801, 10000, 156]; E.DEPT_ID = D.DEPT_ID
4         #NEST LOOP INNER JOIN2: [638801, 10000, 156]
5           #CSCN2: [1, 10000, 78]; INDEX33555484(EMP as E); btr_scan(1)
6           #CSCN2: [1, 100, 78]; INDEX33555480(DEPT as D); btr_scan(1)

❖ KingBase

-- SORT UNIQUE
TEST=# explain select distinct dept_id,emp_name from emp where salary<1100;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Unique  (cost=22893.03..23577.57 rows=91272 width=20)
   ->  Sort  (cost=22893.03..23121.21 rows=91272 width=20)
         Sort Key: dept_id, emp_name
         ->  Bitmap Heap Scan on emp  (cost=2183.78..13500.68 rows=91272 width=20)
               Recheck Cond: (salary < '1100'::double precision)
               ->  Bitmap Index Scan on idx_emp_salary  (cost=0.00..2160.97 rows=91272 width=0)
                     Index Cond: (salary < '1100'::double precision)

TEST=# explain select * from dept where dept_id in (select dept_id from emp where salary<1100);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Hash Join  (cost=13731.14..13734.51 rows=100 width=29)
   Hash Cond: (dept.dept_id = emp.dept_id)
   ->  Seq Scan on dept  (cost=0.00..2.00 rows=100 width=29)
   ->  Hash  (cost=13729.87..13729.87 rows=101 width=5)
         ->  HashAggregate  (cost=13728.86..13729.87 rows=101 width=5)
               Group Key: emp.dept_id
               ->  Bitmap Heap Scan on emp  (cost=2183.78..13500.68 rows=91272 width=5)
                     Recheck Cond: (salary < '1100'::double precision)
                     ->  Bitmap Index Scan on idx_emp_salary  (cost=0.00..2160.97 rows=91272 width=0)
                           Index Cond: (salary < '1100'::double precision)

-- SORT AGGREGATE
TEST=#  explain select sum(salary) from emp;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=16384.55..16384.56 rows=1 width=8)
   ->  Gather  (cost=16384.33..16384.54 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=15384.33..15384.34 rows=1 width=8)
               ->  Parallel Seq Scan on emp  (cost=0.00..14342.67 rows=416667 width=8)

TEST=# explain select min(salary),max(salary) from emp;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Result  (cost=1.01..1.02 rows=1 width=16)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8)
           ->  Index Only Scan using idx_emp_salary on emp  (cost=0.42..71128.49 rows=904233 width=8)
                 Index Cond: (salary IS NOT NULL)
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8)
           ->  Index Only Scan Backward using idx_emp_salary on emp emp_1  (cost=0.42..71128.49 rows=904233 width=8)
                 Index Cond: (salary IS NOT NULL)

-- SORT GROUP BY
TEST=# explain select dept_id,count(*) from emp group by dept_id;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=17430.40..17455.99 rows=101 width=13)
   Group Key: dept_id
   ->  Gather Merge  (cost=17430.40..17453.97 rows=202 width=13)
         Workers Planned: 2
         ->  Sort  (cost=16430.37..16430.63 rows=101 width=13)
               Sort Key: dept_id
               ->  Partial HashAggregate  (cost=16426.00..16427.01 rows=101 width=13)
                     Group Key: dept_id
                     ->  Parallel Seq Scan on emp  (cost=0.00..14342.67 rows=416667 width=5)

-- SORT(ORDER BY)
TEST=# explain select * from emp order by dept_id;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Gather Merge  (cost=67056.00..164285.09 rows=833334 width=42)
   Workers Planned: 2
   ->  Sort  (cost=66055.98..67097.65 rows=416667 width=42)
         Sort Key: dept_id
         ->  Parallel Seq Scan on emp  (cost=0.00..14342.67 rows=416667 width=42)

TEST=# explain select * from (select * from emp order by salary desc) a limit 10;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..1.24 rows=10 width=42)
   ->  Index Scan Backward using idx_emp_salary on emp  (cost=0.42..71848.41 rows=1000000 width=42)

-- SORT JOIN
TEST=# explain select e.emp_name,d.dept_name from emp e,dept d where e.dept_id=d.dept_id;
                              QUERY PLAN
----------------------------------------------------------------------
 Hash Join  (cost=3.25..22914.40 rows=990099 width=28)
   Hash Cond: (e.dept_id = d.dept_id)
   ->  Seq Scan on emp e  (cost=0.00..20176.00 rows=1000000 width=20)
   ->  Hash  (cost=2.00..2.00 rows=100 width=18)
         ->  Seq Scan on dept d  (cost=0.00..2.00 rows=100 width=18)

❖ YashanDB

-- SORT UNIQUE
SQL> explain select distinct dept_id,emp_name from emp where salary<1100;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  HASH DISTINCT                 |                      |            |       920|        9( 0)|
|  2 |   TABLE ACCESS BY INDEX ROWID  | EMP                  | TESTUSER   |       920|        7( 0)|
|* 3 |    INDEX RANGE SCAN            | IDX_EMP_SALARY       | TESTUSER   |       920|        3( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

SQL> explain select * from dept where dept_id in (select dept_id from emp where salary<1100);
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  NESTED INDEX LOOPS INNER      |                      |            |        50|        8( 0)|
|  2 |   SORT DISTINCT                |                      |            |        50|        8( 0)|
|  3 |    TABLE ACCESS BY INDEX ROWID | EMP                  | TESTUSER   |       920|        7( 0)|
|* 4 |     INDEX RANGE SCAN           | IDX_EMP_SALARY       | TESTUSER   |       920|        3( 0)|
|  5 |   TABLE ACCESS BY INDEX ROWID  | DEPT                 | TESTUSER   |         1|        1( 0)|
|* 6 |    INDEX UNIQUE SCAN           | DEPT_PK              | TESTUSER   |         1|        1( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

-- SORT AGGREGATE
SQL>  explain select sum(salary) from emp;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  AGGREGATE                     |                      |            |         1|       27( 0)|
|  2 |   INDEX FAST FULL SCAN         | IDX_EMP_SALARY       | TESTUSER   |     10000|       26( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

SQL> explain select min(salary),max(salary) from emp;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  AGGREGATE                     |                      |            |         1|       27( 0)|
|  2 |   INDEX FAST FULL SCAN         | IDX_EMP_SALARY       | TESTUSER   |     10000|       26( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

-- SORT GROUP BY
SQL> explain select dept_id,count(*) from emp group by dept_id;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  HASH GROUP                    |                      |            |       100|       48( 0)|
|  2 |   TABLE ACCESS FULL            | EMP                  | TESTUSER   |     10000|       46( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

-- SORT(ORDER BY)
SQL> explain select * from emp order by dept_id;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  SORT                          |                      |            |     10000|      224( 0)|
|  2 |   TABLE ACCESS FULL            | EMP                  | TESTUSER   |     10000|       46( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

SQL> explain select * from (select * from emp order by salary desc) a limit 10;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  WINDOW                        |                      |            |        10|       47( 0)|
|  2 |   VIEW                         |                      |            |     10000|       47( 0)|
|  3 |    ORDER BY STOPKEY            |                      |            |     10000|       47( 0)|
|  4 |     TABLE ACCESS FULL          | EMP                  | TESTUSER   |     10000|       46( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

-- SORT JOIN
SQL> explain select e.emp_name,d.dept_name from emp e,dept d where e.dept_id=d.dept_id;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  NESTED INDEX LOOPS INNER      |                      |            |     10000|       48( 0)|
|  2 |   TABLE ACCESS FULL            | EMP                  | TESTUSER   |     10000|       46( 0)|
|  3 |   TABLE ACCESS BY INDEX ROWID  | DEPT                 | TESTUSER   |         1|        1( 0)|
|* 4 |    INDEX UNIQUE SCAN           | DEPT_PK              | TESTUSER   |         1|        1( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

SQL> explain select *+ use_merge(e d)*/ e.emp_name,d.dept_name from emp e,dept d where e.dept_id=d.dept_id;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|* 1 |  MERGE JOIN INNER              |                      |            |     10000|     2295( 0)|
|  2 |   MERGE SORT                   |                      |            |          |             |
|  3 |    TABLE ACCESS FULL           | EMP                  | TESTUSER   |     10000|       46( 0)|
|  4 |   MERGE SORT                   |                      |            |          |             |
|  5 |    TABLE ACCESS FULL           | DEPT                 | TESTUSER   |       100|        1( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

❖ Vastbase

-- SORT UNIQUE
vastbase=> explain select distinct dept_id,emp_name from emp where salary<1100;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 HashAggregate  (cost=156.97..165.97 rows=900 width=20)
   Group By Key: dept_id, emp_name
   ->  Bitmap Heap Scan on emp  (cost=23.23..152.47 rows=900 width=20)
         Recheck Cond: (salary < 1100::double precision)
         ->  Bitmap Index Scan on idx_emp_salary  (cost=0.00..23.00 rows=900 width=0)
               Index Cond: (salary < 1100::double precision)

vastbase=> explain select * from dept where dept_id in (select dept_id from emp where salary<1100);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Hash Join  (cost=154.95..157.30 rows=100 width=30)
   Hash Cond: (dept.dept_id = emp.dept_id)
   ->  Seq Scan on dept  (cost=0.00..2.00 rows=100 width=30)
   ->  Hash  (cost=154.82..154.82 rows=10 width=8)
         ->  HashAggregate  (cost=154.72..154.82 rows=10 width=8)
               Group By Key: emp.dept_id
               ->  Bitmap Heap Scan on emp  (cost=23.23..152.47 rows=900 width=8)
                     Recheck Cond: (salary < 1100::double precision)
                     ->  Bitmap Index Scan on idx_emp_salary  (cost=0.00..23.00 rows=900 width=0)
                           Index Cond: (salary < 1100::double precision)

-- SORT AGGREGATE
vastbase=> explain select sum(salary) from emp;
                          QUERY PLAN
---------------------------------------------------------------
 Aggregate  (cost=243.00..243.01 rows=1 width=16)
   ->  Seq Scan on emp  (cost=0.00..218.00 rows=10000 width=8)

vastbase=> explain select min(salary),max(salary) from emp;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Result  (cost=0.17..0.18 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..0.09 rows=1 width=8)
           ->  Index Only Scan using idx_emp_salary on emp  (cost=0.00..785.75 rows=9000 width=8)
                 Index Cond: (salary IS NOT NULL)
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.00..0.09 rows=1 width=8)
           ->  Index Only Scan Backward using idx_emp_salary on emp  (cost=0.00..785.75 rows=9000 width=8)
                 Index Cond: (salary IS NOT NULL)

-- SORT GROUP BY
vastbase=> explain select dept_id,count(*) from emp group by dept_id;
                          QUERY PLAN
---------------------------------------------------------------
 HashAggregate  (cost=268.00..269.01 rows=101 width=16)
   Group By Key: dept_id
   ->  Seq Scan on emp  (cost=0.00..218.00 rows=10000 width=8)

-- SORT(ORDER BY)
vastbase=> explain select * from emp order by dept_id;
                           QUERY PLAN
----------------------------------------------------------------
 Sort  (cost=882.39..907.39 rows=10000 width=44)
   Sort Key: dept_id
   ->  Seq Scan on emp  (cost=0.00..218.00 rows=10000 width=44)

vastbase=> explain select * from (select * from emp order by salary desc) a limit 10;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.89 rows=10 width=44)
   ->  Index Scan Backward using idx_emp_salary on emp  (cost=0.00..794.25 rows=10000 width=44)

-- SORT JOIN
vastbase=> explain select e.emp_name,d.dept_name from emp e,dept d where e.dept_id=d.dept_id;
                             QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=3.25..351.51 rows=9901 width=24)
   Hash Cond: (e.dept_id = d.dept_id)
   ->  Seq Scan on emp e  (cost=0.00..218.00 rows=10000 width=20)
   ->  Hash  (cost=2.00..2.00 rows=100 width=20)
         ->  Seq Scan on dept d  (cost=0.00..2.00 rows=100 width=20)

vastbase=> explain select *+ use_merge(e d)*/  e.emp_name,d.dept_name from emp e,dept d where e.dept_id=d.dept_id;
                               QUERY PLAN
------------------------------------------------------------------------
 Merge Join  (cost=887.71..1036.99 rows=9901 width=24)
   Merge Cond: (d.dept_id = e.dept_id)
   ->  Sort  (cost=5.32..5.57 rows=100 width=20)
         Sort Key: d.dept_id
         ->  Seq Scan on dept d  (cost=0.00..2.00 rows=100 width=20)
   ->  Sort  (cost=882.39..907.39 rows=10000 width=20)
         Sort Key: e.dept_id
         ->  Seq Scan on emp e  (cost=0.00..218.00 rows=10000 width=20)

2. 分组聚合

数据库中的分组聚合是两类操作：分组操作是指用SQL语句将一个结果集分为若干组，并对这样每一组进行聚合计算；聚合操作则是基于多行记录返回数据数据:平均、最大、最小值等，聚合操作必须处理输入数据的每一行记录，因此通常和全表扫描联系在一起。

1).常见分组聚合操作

针对常见的分组聚合类操作，抽象出一组测例，看看Oracle和国产库的行为如何。

❖ 聚合：一般聚合

对整个结果集进行计算，一般都是走的全表扫描，如果有索引则会走索引快速全扫描。

❖ 聚合：极值

针对结果集的最大、最小值等计算，如果是索引列，可采用一些更优的做法，因为后者是有序的。

❖ 聚合：计数

对整个结果集进行计数，一般走全表扫描，如果有不可为空的列索引，优化器也是可以采用的。

❖ 分组：一般分组

一般分组下，可采用排序分组的方式，也可采用更为推荐的哈希分组，这样代价更小。

❖ 分组：分组+排序

如果针对分组后的结果还需要排序操作，上面说的哈希分组就不太合适。如果通过 SORT GROUP BY能解决分组问题的同时，还能提供有序的结果集输出，无疑效率是要比 HASH GROUP BY 更高的。

❖ 分组：分组过滤

针对分组数据进行过滤，可以有两种方式 WHERE 或 HAVING。如果记录可以通过WHERE来排除，应该在聚合发生之前就已经被排除。相比之下，HAVING在聚合完成之后对记录进行排除。参与聚合的记录越少，效果就越好，所以一般情况下WHERE子句在这方面比HAVING子句更可取。

2).Oracle 测试示例

-- 聚合：一般聚合
SQL> explain plan for select sum(salary) from emp;
SQL> select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |     1 |     5 |    10   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |                |     1 |     5 |            |          |
|   2 |   INDEX FAST FULL SCAN| IDX_EMP_SALARY | 10000 | 50000 |    10   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

-- 聚合：极值
SQL> explain plan for select max(salary),min(salary) from emp;
SQL> select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     5 |    15   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
|   2 |   TABLE ACCESS FULL| EMP  | 10000 | 50000 |    15   (0)| 00:00:01 |
---------------------------------------------------------------------------

-- 聚合：计数
SQL> explain plan for select count(*) from emp;
SQL> select * from table(dbms_xplan.display);
------------------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |     1 |     7   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |        |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| EMP_PK | 10000 |     7   (0)| 00:00:01 |
------------------------------------------------------------------------

-- 分组：一般分组
SQL> explain plan for select dept_id,avg(salary) from emp group by dept_id;
SQL> select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100 |   800 |    16   (7)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   100 |   800 |    16   (7)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP  | 10000 | 80000 |    15   (0)| 00:00:01 |
---------------------------------------------------------------------------

-- 分组：分组+排序
SQL> explain plan for select dept_id,avg(salary) from emp group by dept_id order by dept_id;
SQL> select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100 |   800 |    16   (7)| 00:00:01 |
|   1 |  SORT GROUP BY     |      |   100 |   800 |    16   (7)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP  | 10000 | 80000 |    15   (0)| 00:00:01 |
---------------------------------------------------------------------------

-- 分组：分组过滤（WHERE/HAVING）
SQL> explain plan for select dept_id,count(*) from emp where salary<1500 group by dept_id;
SQL>  select * from table(dbms_xplan.display);
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100 |   800 |    16   (7)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   100 |   800 |    16   (7)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| EMP  |  4499 | 35992 |    15   (0)| 00:00:01 |
---------------------------------------------------------------------------

SQL> explain plan for select dept_id,count(*) cnt from emp group by dept_id having count(*) >2;
SQL> select * from table(dbms_xplan.display);
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     5 |    15 |    16   (7)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |
|   2 |   HASH GROUP BY     |      |     5 |    15 |    16   (7)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| EMP  | 10000 | 30000 |    15   (0)| 00:00:01 |
----------------------------------------------------------------------------

3).国产库测试示例

下面是针对上述测例，国产库的行为如何？先来看看整体结果

❖ MySQL

-- 聚合：一般聚合
mysql> explain select sum(salary) from emp;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key            | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | emp   | NULL       | index | NULL          | idx_emp_salary | 5       | NULL | 10117 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+-------+----------+-------------+

-- 聚合：极值
mysql> explain select max(salary),min(salary) from emp;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+

-- 聚合：计数
mysql> explain select count(*) from emp;
+----+-------------+-------+------------+-------+---------------+------------------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key              | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+------------------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | emp   | NULL       | index | NULL          | idx_emp_birthday | 4       | NULL | 10117 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+------------------+---------+------+-------+----------+-------------+

-- 分组：一般分组
mysql>  explain select dept_id,avg(salary) from emp group by dept_id;
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra           |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10117 |   100.00 | Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-----------------+

-- 分组：分组+排序
mysql> explain select dept_id,avg(salary) from emp group by dept_id order by dept_id;
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                           |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+---------------------------------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10117 |   100.00 | Using temporary; Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+---------------------------------+

-- 分组：分组过滤（WHERE/HAVING）
mysql> explain select dept_id,count(*) from emp where salary<1500 group by dept_id;
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys  | key  | key_len | ref  | rows  | filtered | Extra                        |
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+------------------------------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | idx_emp_salary | NULL | NULL    | NULL | 10117 |    39.28 | Using where; Using temporary |
+----+-------------+-------+------------+------+----------------+------+---------+------+-------+----------+------------------------------+

mysql> explain select dept_id,count(*) cnt from emp group by dept_id having count(*) >2;
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra           |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10117 |   100.00 | Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-----------------+

❖ DM

-- 聚合：一般聚合
SQL> explain select sum(salary) from emp;
1   #NSET2: [1, 1, 30]
2     #PRJT2: [1, 1, 30]; exp_num(1), is_atom(FALSE)
3       #AAGR2: [1, 1, 30]; grp_num(0), sfun_num(1), distinct_flag[0]; slave_empty(0)
4         #SSCN: [1, 10000, 30]; IDX_EMP_SALARY(EMP); btr_scan(1); is_global(0)

-- 聚合：极值
SQL> explain select max(salary),min(salary) from emp;
1   #NSET2: [1, 1, 30]
2     #PRJT2: [1, 1, 30]; exp_num(2), is_atom(FALSE)
3       #FAGR2: [1, 1, 30]; sfun_num(2), IDX_EMP_SALARY

-- 聚合：计数
SQL> explain select count(*) from emp;
1   #NSET2: [1, 1, 0]
2     #PRJT2: [1, 1, 0]; exp_num(1), is_atom(FALSE)
3       #FAGR2: [1, 1, 0]; sfun_num(1)

-- 分组：一般分组
SQL> explain select dept_id,avg(salary) from emp group by dept_id;
1   #NSET2: [2, 100, 60]
2     #PRJT2: [2, 100, 60]; exp_num(2), is_atom(FALSE)
3       #HAGR2: [2, 100, 60]; grp_num(1), sfun_num(1), distinct_flag[0]; slave_empty(0) keys(EMP.DEPT_ID)
4         #CSCN2: [1, 10000, 60]; INDEX33555484(EMP); btr_scan(1)

-- 分组：分组+排序
SQL> explain select dept_id,avg(salary) from emp group by dept_id order by dept_id;
1   #NSET2: [3, 100, 60]
2     #PRJT2: [3, 100, 60]; exp_num(2), is_atom(FALSE)
3       #SORT3: [3, 100, 60]; key_num(1), partition_key_num(0), is_distinct(FALSE), top_flag(0), is_adaptive(0)
4         #HAGR2: [2, 100, 60]; grp_num(1), sfun_num(1), distinct_flag[0]; slave_empty(0) keys(EMP.DEPT_ID)
5           #CSCN2: [1, 10000, 60]; INDEX33555484(EMP); btr_scan(1)

-- 分组：分组过滤（WHERE/HAVING）
SQL> explain select dept_id,count(*) from emp where salary<1500 group by dept_id;
1   #NSET2: [2, 45, 60]
2     #PRJT2: [2, 45, 60]; exp_num(2), is_atom(FALSE)
3       #HAGR2: [2, 45, 60]; grp_num(1), sfun_num(1), distinct_flag[0]; slave_empty(0) keys(EMP.DEPT_ID)
4         #SLCT2: [1, 4543, 60]; EMP.SALARY < var1
5           #CSCN2: [1, 10000, 60]; INDEX33555484(EMP); btr_scan(1)

SQL> explain select dept_id,count(*) cnt from emp group by dept_id having count(*) >2;
1   #NSET2: [2, 5, 30]
2     #PRJT2: [2, 5, 30]; exp_num(2), is_atom(FALSE)
3       #SLCT2: [2, 5, 30]; exp_sfun9 > var1
4         #HAGR2: [2, 100, 30]; grp_num(1), sfun_num(1), distinct_flag[0]; slave_empty(0) keys(EMP.DEPT_ID)
5           #CSCN2: [1, 10000, 30]; INDEX33555484(EMP); btr_scan(1)

❖ KingBase

-- 聚合：一般聚合
TEST=# explain select sum(salary) from emp;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=16384.55..16384.56 rows=1 width=8)
   ->  Gather  (cost=16384.33..16384.54 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=15384.33..15384.34 rows=1 width=8)
               ->  Parallel Seq Scan on emp  (cost=0.00..14342.67 rows=416667 width=8)

-- 聚合：极值
TEST=# explain select max(salary),min(salary) from emp;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Result  (cost=1.01..1.02 rows=1 width=16)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8)
           ->  Index Only Scan Backward using idx_emp_salary on emp  (cost=0.42..71128.49 rows=904233 width=8)
                 Index Cond: (salary IS NOT NULL)
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8)
           ->  Index Only Scan using idx_emp_salary on emp emp_1  (cost=0.42..71128.49 rows=904233 width=8)
                 Index Cond: (salary IS NOT NULL)

-- 聚合：计数
TEST=# explain select count(*) from emp;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=16384.55..16384.56 rows=1 width=8)
   ->  Gather  (cost=16384.33..16384.54 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=15384.33..15384.34 rows=1 width=8)
               ->  Parallel Seq Scan on emp  (cost=0.00..14342.67 rows=416667 width=0)

-- 分组：一般分组
TEST=# explain select dept_id,avg(salary) from emp group by dept_id;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=17430.40..17456.24 rows=101 width=13)
   Group Key: dept_id
   ->  Gather Merge  (cost=17430.40..17453.97 rows=202 width=37)
         Workers Planned: 2
         ->  Sort  (cost=16430.37..16430.63 rows=101 width=37)
               Sort Key: dept_id
               ->  Partial HashAggregate  (cost=16426.00..16427.01 rows=101 width=37)
                     Group Key: dept_id
                     ->  Parallel Seq Scan on emp  (cost=0.00..14342.67 rows=416667 width=13)

-- 分组：分组+排序
TEST=# explain select dept_id,avg(salary) from emp group by dept_id order by dept_id;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=17430.40..17456.24 rows=101 width=13)
   Group Key: dept_id
   ->  Gather Merge  (cost=17430.40..17453.97 rows=202 width=37)
         Workers Planned: 2
         ->  Sort  (cost=16430.37..16430.63 rows=101 width=37)
               Sort Key: dept_id
               ->  Partial HashAggregate  (cost=16426.00..16427.01 rows=101 width=37)
                     Group Key: dept_id
                     ->  Parallel Seq Scan on emp  (cost=0.00..14342.67 rows=416667 width=13)

-- 分组：分组过滤（WHERE/HAVING）
TEST=# explain select dept_id,count(*) from emp where salary<1500 group by dept_id;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=17332.58..17358.17 rows=101 width=13)
   Group Key: dept_id
   ->  Gather Merge  (cost=17332.58..17356.15 rows=202 width=13)
         Workers Planned: 2
         ->  Sort  (cost=16332.56..16332.81 rows=101 width=13)
               Sort Key: dept_id
               ->  Partial HashAggregate  (cost=16328.18..16329.19 rows=101 width=13)
                     Group Key: dept_id
                     ->  Parallel Seq Scan on emp  (cost=0.00..15384.33 rows=188770 width=5)
                           Filter: (salary < '1500'::double precision)

TEST=# explain select dept_id,count(*) cnt from emp group by dept_id having count(*) >2;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=17430.40..17456.74 rows=34 width=13)
   Group Key: dept_id
   Filter: (count(*) > 2)
   ->  Gather Merge  (cost=17430.40..17453.97 rows=202 width=13)
         Workers Planned: 2
         ->  Sort  (cost=16430.37..16430.63 rows=101 width=13)
               Sort Key: dept_id
               ->  Partial HashAggregate  (cost=16426.00..16427.01 rows=101 width=13)
                     Group Key: dept_id
                     ->  Parallel Seq Scan on emp  (cost=0.00..14342.67 rows=416667 width=5)

❖ YashanDB

-- 聚合：一般聚合
SQL> explain select sum(salary) from emp;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  AGGREGATE                     |                      |            |         1|       27( 0)|
|  2 |   INDEX FAST FULL SCAN         | IDX_EMP_SALARY       | TESTUSER   |     10000|       26( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

-- 聚合：极值
SQL> explain select max(salary),min(salary) from emp;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  AGGREGATE                     |                      |            |         1|       27( 0)|
|  2 |   INDEX FAST FULL SCAN         | IDX_EMP_SALARY       | TESTUSER   |     10000|       26( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

-- 聚合：计数
SQL> explain select count(*) from emp;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  AGGREGATE                     |                      |            |         1|       14( 0)|
|  2 |   INDEX FAST FULL SCAN         | EMP_PK               | TESTUSER   |     10000|       14( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

-- 分组：一般分组
SQL> explain select dept_id,avg(salary) from emp group by dept_id;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  HASH GROUP                    |                      |            |       100|       48( 0)|
|  2 |   TABLE ACCESS FULL            | EMP                  | TESTUSER   |     10000|       46( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

-- 分组：分组+排序
SQL> explain select dept_id,avg(salary) from emp group by dept_id order by dept_id;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  SORT                          |                      |            |       100|       49( 0)|
|  2 |   HASH GROUP                   |                      |            |       100|       48( 0)|
|  3 |    TABLE ACCESS FULL           | EMP                  | TESTUSER   |     10000|       46( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

-- 分组：分组过滤（WHERE/HAVING）
SQL> explain select dept_id,count(*) from emp where salary<1500 group by dept_id;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|  1 |  HASH GROUP                    |                      |            |        50|       37( 0)|
|  2 |   TABLE ACCESS BY INDEX ROWID  | EMP                  | TESTUSER   |      4509|       36( 0)|
|* 3 |    INDEX RANGE SCAN            | IDX_EMP_SALARY       | TESTUSER   |      4509|       12( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

SQL> explain select dept_id,count(*) cnt from emp group by dept_id having count(*) >2;
+----+--------------------------------+----------------------+------------+----------+-------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  |
+----+--------------------------------+----------------------+------------+----------+-------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |
|* 1 |  HASH GROUP                    |                      |            |        33|       48( 0)|
|  2 |   TABLE ACCESS FULL            | EMP                  | TESTUSER   |     10000|       46( 0)|
+----+--------------------------------+----------------------+------------+----------+-------------+

❖ Vastbase

-- 聚合：一般聚合
vastbase=> explain select sum(salary) from emp;
                          QUERY PLAN
---------------------------------------------------------------
 Aggregate  (cost=243.00..243.01 rows=1 width=16)
   ->  Seq Scan on emp  (cost=0.00..218.00 rows=10000 width=8)

-- 聚合：极值
vastbase=> explain select max(salary),min(salary) from emp;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Result  (cost=0.17..0.18 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..0.09 rows=1 width=8)
           ->  Index Only Scan Backward using idx_emp_salary on emp  (cost=0.00..785.75 rows=9000 width=8)
                 Index Cond: (salary IS NOT NULL)
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.00..0.09 rows=1 width=8)
           ->  Index Only Scan using idx_emp_salary on emp  (cost=0.00..785.75 rows=9000 width=8)
                 Index Cond: (salary IS NOT NULL)

-- 聚合：计数
vastbase=> explain select count(*) from emp;
                          QUERY PLAN
---------------------------------------------------------------
 Aggregate  (cost=243.00..243.01 rows=1 width=8)
   ->  Seq Scan on emp  (cost=0.00..218.00 rows=10000 width=0)

-- 分组：一般分组
vastbase=> explain select dept_id,avg(salary) from emp group by dept_id;
                           QUERY PLAN
----------------------------------------------------------------
 HashAggregate  (cost=268.00..269.26 rows=101 width=48)
   Group By Key: dept_id
   ->  Seq Scan on emp  (cost=0.00..218.00 rows=10000 width=16)

-- 分组：分组+排序
vastbase=> explain select dept_id,avg(salary) from emp group by dept_id order by dept_id;
                              QUERY PLAN
----------------------------------------------------------------------
 Sort  (cost=272.62..272.88 rows=101 width=48)
   Sort Key: dept_id
   ->  HashAggregate  (cost=268.00..269.26 rows=101 width=48)
         Group By Key: dept_id
         ->  Seq Scan on emp  (cost=0.00..218.00 rows=10000 width=16)


-- 分组：分组过滤（WHERE/HAVING）
vastbase=> explain select dept_id,count(*) from emp where salary<1500 group by dept_id;
                          QUERY PLAN
--------------------------------------------------------------
 HashAggregate  (cost=265.67..266.12 rows=46 width=16)
   Group By Key: dept_id
   ->  Seq Scan on emp  (cost=0.00..243.00 rows=4533 width=8)
         Filter: (salary < 1500::double precision)

vastbase=> explain select dept_id,count(*) cnt from emp group by dept_id having count(*) >2;
                          QUERY PLAN
---------------------------------------------------------------
 HashAggregate  (cost=293.00..294.26 rows=101 width=24)
   Group By Key: dept_id
   Filter: (count(*) > 2)
   ->  Seq Scan on emp  (cost=0.00..218.00 rows=10000 width=8)

文章转载自韩锋频道，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

国产集中库SQL能力评测 - 排序和分组聚合

1).常见排序操作

2).Oracle 测试示例

3).国产库测试示例

下面是针对上述测例，国产库的行为如何？先来看看整体结果

❖ MySQL

❖ DM

❖ KingBase

❖ YashanDB

❖ Vastbase

评论