MogDB Sort Operator Optimization

Original article, MogDB, 2024-08-05


Availability

This feature is available since MogDB 3.1.0.

Introduction

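This feature improves the performance of the sort operator in three ways. First, sorting on a single column is optimized so that it needs less computation and less sort memory. Second, dedicated quicksort functions are introduced to lower the per-comparison cost of quicksort. Third, incremental sort is supported: when the input is already ordered on the leading sort keys, that existing order is reused and only the remaining keys are sorted.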

Benefits

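Compared with the traditional approach of calling a comparison function, the dedicated quicksort functions reduce overhead. They typically yield a performance improvement of 2% to 5%.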

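Incremental sort exploits the ordering already provided by an index and sorts only the remaining columns, which reduces the number of sort keys. Depending on the workload, this can improve performance by 10 to 100 times.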

Both optimizations apply to queries that involve sorting. The actual gain depends on factors such as data volume, index design, and query conditions.

Description

For single-column sorts, only a single Datum structure is kept rather than copying the whole tuple into sort memory, which reduces overhead.

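When MogDB sorts with quicksort, each data type has its own comparison function. To avoid the cost of calling that function repeatedly, this feature introduces a set of new quicksort functions in which the comparison is inlined. This eliminates the overhead of a large number of comparison function calls and improves performance.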
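The difference between the two call patterns can be sketched as follows. This is only an illustration in Python, not MogDB's implementation: the names `compare_int`, `quicksort_generic`, and `quicksort_int` are hypothetical, and in the real engine the saving comes from the compiler inlining the comparison into a type-specific sort routine.

```python
from typing import Callable, List


def compare_int(a: int, b: int) -> int:
    """Three-way comparator, standing in for a per-type comparison function."""
    return (a > b) - (a < b)


def quicksort_generic(values: List[int], cmp: Callable[[int, int], int]) -> List[int]:
    """Quicksort that goes through a comparator callback for every comparison."""
    if len(values) <= 1:
        return list(values)
    pivot = values[len(values) // 2]
    less, equal, greater = [], [], []
    for v in values:
        c = cmp(v, pivot)  # one function call per comparison
        if c < 0:
            less.append(v)
        elif c > 0:
            greater.append(v)
        else:
            equal.append(v)
    return quicksort_generic(less, cmp) + equal + quicksort_generic(greater, cmp)


def quicksort_int(values: List[int]) -> List[int]:
    """Type-specific quicksort: the integer comparison is written inline."""
    if len(values) <= 1:
        return list(values)
    pivot = values[len(values) // 2]
    less, equal, greater = [], [], []
    for v in values:
        if v < pivot:  # comparison done in place, no callback
            less.append(v)
        elif v > pivot:
            greater.append(v)
        else:
            equal.append(v)
    return quicksort_int(less) + equal + quicksort_int(greater)
```

Both functions return the same ordering; the specialized version simply avoids one call per comparison, which is the overhead the dedicated quicksort functions are described as removing.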

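In addition, this feature introduces incremental sort, which builds on the ordering provided by an index and sorts only the remaining columns. This reduces the number of sort keys and improves performance.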
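The idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not MogDB's executor code: the function name `incremental_sort` and the `(id, pname)` row shape are hypothetical, and the input is assumed to arrive already ordered by `id`, as it would from an index scan on that column.

```python
from itertools import groupby, islice
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[int, str]  # (id, pname); hypothetical row shape for illustration


def incremental_sort(rows: Iterable[Row], limit: Optional[int] = None) -> Iterator[Row]:
    """Sort by (id, pname), assuming rows are already ordered by id."""

    def sorted_groups() -> Iterator[Row]:
        # Rows with the same presorted key (id) form one group.
        for _, group in groupby(rows, key=lambda r: r[0]):
            # Within a group only the remaining key (pname) needs sorting.
            yield from sorted(group, key=lambda r: r[1])

    # With a limit, groups beyond the requested rows are never sorted.
    return islice(sorted_groups(), limit)


rows = [(1, "b"), (1, "a"), (2, "c"), (2, "a"), (3, "z")]
first_three = list(incremental_sort(rows, limit=3))
```

Because each group is small and groups are produced lazily, a query with a LIMIT can stop after the first few groups instead of sorting the whole input.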

Together, these optimizations reduce the overhead of sort operations and make sorting faster.

Constraints

Only sorts on the following types are supported, and the query must not use the ORDER BY LIMIT form: integer, date, timestamp, uuid, text, varchar, char.

Example

Create a table, insert data, and enable incremental sort.

 -- Create the table and insert test data
 drop table if exists MogDB_incresort_1;
 create table MogDB_incresort_1 (id int, pname name, match text);
 
 create index on MogDB_incresort_1(id);
 
 insert into MogDB_incresort_1
 values (
         generate_series(1, 20000),
         'player# ' || generate_series(1, 20000),
         'match# ' || generate_series(1, 11)
     );
 
 vacuum analyze MogDB_incresort_1;
 
 -- Enable incremental sort
 set enable_incremental_sort = on;

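The incremental sort operator is generally chosen when the input is partially ordered and a LIMIT is present, for example in an Index Scan + LIMIT scenario.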

 -- Incremental sort on top of an index scan
 MogDB=# explain (costs off) select id, pname from MogDB_incresort_1 where id < 20 order by id, pname limit 20;
                                  QUERY PLAN
 ----------------------------------------------------------------------------
  Limit
    ->  Incremental Sort
          Sort Key: id, pname
          Presorted Key: id
          ->  Index Scan using mogdb_incresort_1_id_idx on mogdb_incresort_1
                Index Cond: (id < 20)
 (6 rows)

Incremental sort can also be used when the index scan from the previous example is replaced with an ordered subquery.

 -- Incremental sort on top of an ordered subquery
 MogDB=# explain (costs off)
 select players.pname,
     random() as lottery_number
 from (
         select distinct pname
         from MogDB_incresort_1
         group by pname
         order by pname
     ) as players
 order by players.pname,
     lottery_number
 limit 20;
                               QUERY PLAN
 -----------------------------------------------------------------------
  Limit
    ->  Incremental Sort
          Sort Key: players.pname, (random())
          Presorted Key: players.pname
          ->  Subquery Scan on players
                ->  Unique
                      ->  Sort
                            Sort Key: mogdb_incresort_1.pname
                            ->  HashAggregate
                                  Group By Key: mogdb_incresort_1.pname
                                  ->  Seq Scan on mogdb_incresort_1
 (11 rows)