GaussDB T Distributed SQL Execution Strategies

Original post by 我的世界, 2020-03-04

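In a distributed deployment, the routing rules determine which of two main strategies the execution plan of a distributed SQL statement follows:

1. Push the entire SQL statement down to the DN: the CN (coordinator node) sends the whole statement to the DN (data node) for execution, and the DN returns the result to the CN.
2. Push part of the SQL statement down to the DN: when the entire SQL statement cannot be pushed down, the CN splits it into several parts. The parts that can be pushed down are assembled into query statements (mostly base-table scans) and sent to the DN for execution. The intermediate results are fetched back to the CN, which then executes the remaining parts itself.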

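Comparing the two, strategy 1 pushes the whole statement directly to the DN, which is the friendlier and more efficient way to execute. Strategy 2 has to ship a large volume of intermediate results from the DN to the CN and run the non-pushable parts of the statement on the CN, so the CN becomes the performance bottleneck (bandwidth, storage, compute). When designing the database and tuning performance, avoid statements that end up on strategy 2 wherever possible.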
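The difference in data shipped to the CN can be sketched with a small simulation. This is only an illustration under stated assumptions, not GaussDB T itself: each DN is stood in for by an in-memory SQLite database, the routing rule `s_id % NUM_DN` is a placeholder for the real hash function, and the helper names (`make_dns`, `full_pushdown`, `partial_pushdown_avg_by_c_id`) are hypothetical.

```python
import sqlite3

NUM_DN = 2  # assumed number of data nodes


def make_dns(rows):
    """Create one in-memory SQLite database per DN and distribute t2 by s_id."""
    dns = [sqlite3.connect(":memory:") for _ in range(NUM_DN)]
    for dn in dns:
        dn.execute(
            "create table t2(c_id int not null, s_id int not null, c_score double not null)"
        )
    for c_id, s_id, c_score in rows:
        # Placeholder routing rule; the real hash function is internal to the database.
        dns[s_id % NUM_DN].execute("insert into t2 values (?, ?, ?)", (c_id, s_id, c_score))
    return dns


def full_pushdown(dns, sql):
    """Strategy 1: every DN runs the whole statement; the CN only collects the rows."""
    shipped = [row for dn in dns for row in dn.execute(sql)]
    return sorted(shipped), len(shipped)


def partial_pushdown_avg_by_c_id(dns):
    """Strategy 2: DNs run only a base-table scan; the CN does the grouping itself."""
    shipped = [row for dn in dns for row in dn.execute("select c_id, c_score from t2")]
    sums = {}
    for c_id, c_score in shipped:
        total, count = sums.get(c_id, (0.0, 0))
        sums[c_id] = (total + c_score, count + 1)
    result = sorted((c_id, total / count) for c_id, (total, count) in sums.items())
    return result, len(shipped)


# 4 students x 3 courses = 12 rows in t2.
rows = [(c_id, s_id, float(60 + 10 * c_id + s_id)) for s_id in range(1, 5) for c_id in range(1, 4)]
dns = make_dns(rows)

# Grouping by the shard key: each group lives on exactly one DN.
by_s_id, shipped_1 = full_pushdown(dns, "select s_id, avg(c_score) from t2 group by s_id")
# Grouping by a non-shard column: groups span DNs, so the CN aggregates raw rows.
by_c_id, shipped_2 = partial_pushdown_avg_by_c_id(dns)
print(shipped_1, shipped_2)
```

In this toy setup the fully pushed-down query ships 4 rows (one per group) to the CN, while the partially pushed-down one ships all 12 base rows and leaves the aggregation work on the CN.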
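A statement usually fails to push down because it does not fit the routing rules in use. The following scenarios allow the entire SQL statement to be pushed down directly: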

WHERE: when the WHERE condition includes the shard key, the statement can be pushed down directly to one specific DN, for example:

create table t1(s_id int not null, s_name varchar(100) not null) distribute by hash(s_id);
select * from t1 where s_id = 1;
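A minimal sketch of why a shard-key predicate lets the CN target a single DN. The routing function `dn_for` and the three SQLite-backed "DNs" are assumptions made for illustration; the actual hash function and node layout of GaussDB T are not described in the source.

```python
import sqlite3

NUM_DN = 3  # assumed number of data nodes


def dn_for(s_id):
    """Stand-in routing rule; the real hash function is internal to the database."""
    return s_id % NUM_DN


# One in-memory SQLite database plays the role of each DN.
dns = [sqlite3.connect(":memory:") for _ in range(NUM_DN)]
for dn in dns:
    dn.execute("create table t1(s_id int not null, s_name varchar(100) not null)")
for s_id in range(1, 10):
    dns[dn_for(s_id)].execute("insert into t1 values (?, ?)", (s_id, f"student_{s_id}"))


def select_by_shard_key(s_id):
    """The CN derives the target DN from the WHERE value and forwards the statement there."""
    target = dn_for(s_id)
    rows = dns[target].execute("select * from t1 where s_id = ?", (s_id,)).fetchall()
    return target, rows


target, rows = select_by_shard_key(1)
print(target, rows)
```

Because every row with `s_id = 1` was placed by the same rule, no other DN needs to be contacted for this query.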

JOIN: when joining tables, the join condition must include the shard key, for example:

create table t1(s_id int not null, s_name varchar(100) not null) distribute by hash(s_id);
create table t2(c_id int not null, s_id int not null, c_score double not null) distribute by hash(s_id);
select a.s_name,b.c_score from t1 a join t2 b on a.s_id=b.s_id;

GROUP BY: in a grouped query, the grouping columns must include the shard key, for example:

create table t2(c_id int not null, s_id int not null, c_score double not null) distribute by hash(s_id);
select s_id,avg(c_score) from t2 group by s_id;
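The JOIN case relies on co-location: when t1 and t2 are both distributed on s_id, rows that match on s_id sit on the same DN, so each DN can join its own shards and the CN only has to concatenate the results. The sketch below checks this on toy data. The SQLite-backed nodes, the `s_id % NUM_DN` placement rule, and the helper `new_node` are illustrative assumptions, not GaussDB T internals.

```python
import sqlite3

NUM_DN = 2  # assumed number of data nodes

DDL = (
    "create table t1(s_id int not null, s_name varchar(100) not null)",
    "create table t2(c_id int not null, s_id int not null, c_score double not null)",
)
JOIN_SQL = "select a.s_name, b.c_score from t1 a join t2 b on a.s_id = b.s_id"


def new_node():
    """Create an in-memory SQLite database holding empty t1 and t2 tables."""
    node = sqlite3.connect(":memory:")
    for ddl in DDL:
        node.execute(ddl)
    return node


t1_rows = [(s_id, f"student_{s_id}") for s_id in range(1, 5)]
t2_rows = [(c_id, s_id, float(50 + 10 * c_id + s_id)) for s_id in range(1, 5) for c_id in (1, 2)]

# Reference result: a single node that holds all rows.
single = new_node()
single.executemany("insert into t1 values (?, ?)", t1_rows)
single.executemany("insert into t2 values (?, ?, ?)", t2_rows)
expected = sorted(single.execute(JOIN_SQL))

# Distributed layout: both tables are placed by the same rule on s_id.
dns = [new_node() for _ in range(NUM_DN)]
for row in t1_rows:
    dns[row[0] % NUM_DN].execute("insert into t1 values (?, ?)", row)
for row in t2_rows:
    dns[row[1] % NUM_DN].execute("insert into t2 values (?, ?, ?)", row)

# Each DN runs the whole join locally; the CN just concatenates the rows.
pushed_down = sorted(row for dn in dns for row in dn.execute(JOIN_SQL))
print(pushed_down == expected)
```

If the join condition used a column other than the shard key, matching rows could sit on different DNs and the local joins would miss them, which is why such a statement cannot be pushed down as a whole.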

ORDER BY: in a sorted query, the sort columns must include the shard key, for example:

create table t2(c_id int not null, s_id int not null, c_score double not null) distribute by hash(s_id);
select * from t2 order by s_id;
