对于分页查询,随着访问的记录越来越多,查询会越来越慢,典型的如:单表的根据索引order by + limit 场景、两表采用nestloop + limit 场景。因此,分页查询变慢可能是正常的事。但是,有一类的分页查询,从逻辑上是可以优化的。具体例子如下:
PS:考虑到用户的隐私,且为了能让读者易理解,这里构造了简单的例子。
一、构造测试数据
create table t1(t1id1 integer, t1name text);
create table t2(t2id1 integer primary key, t2name text);
insert into t1 select generate_series(1,1000000),md5(random());
insert into t2 select generate_series(1,1000000),md5(random());
create index idx_t2 on t2(t2id1);二、原始SQL及执行计划
原始SQL:
explain analyze select t1id1,(select t2name from t2 where t2id1=t1id1) from t1 order by t1id1 limit 500000,200;执行计划如下:
test=# explain analyze select t1id1,(select t2name from t2 where t2id1=t1id1) from t1 order by t1id1 limit 500000,200;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4359236.34..4360927.34 rows=200 width=36) (actual time=1208.772..1209.091 rows=200 loops=1)
-> Result (cost=131736.34..8586736.34 rows=1000000 width=36) (actual time=215.330..1189.499 rows=500200 loops=1)
-> Sort (cost=131736.34..134236.34 rows=1000000 width=4) (actual time=215.283..271.665 rows=500200 loops=1)
Sort Key: t1.t1id1
Sort Method: external merge Disk: 13800kB
-> Seq Scan on t1 (cost=0.00..18404.00 rows=1000000 width=4) (actual time=0.007..95.238 rows=1000000 loops=1)
SubPlan 1
-> Index Scan using idx_t2 on t2 (cost=0.42..8.44 rows=1 width=33) (actual time=0.001..0.001 rows=1 loops=500200)
Index Cond: (t2id1 = t1.t1id1)
Planning Time: 0.199 ms
Execution Time: 1210.359 ms问题分析:从执行计划可以看出,SubPlan loop 次数达到了500200次,而不是预想中的固定200次。这就解释了为什么随着翻页的进行,返回结果的速度会越来越慢。
三、修改后的SQL及执行计划
修改后的SQL:减少 t2 表的调用。
explain analyze select t1id1, (select t2name from t2 where t2id1=t1id1) from (select t1id1 from t1 order by t1id1 limit 500000,200) t1 order by t1id1执行计划如下:
test=# explain analyze select t1id1, (select t2name from t2 where t2id1=t1id1) from (select t1id1 from t1 order by t1id1 limit 500000,200) t1 order by t1id1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on t1 (cost=132986.34..134677.34 rows=200 width=36) (actual time=379.957..380.925 rows=200 loops=1)
-> Limit (cost=132986.34..132986.84 rows=200 width=4) (actual time=379.891..379.956 rows=200 loops=1)
-> Sort (cost=131736.34..134236.34 rows=1000000 width=4) (actual time=319.038..365.162 rows=500200 loops=1)
Sort Key: t1_1.t1id1
Sort Method: external merge Disk: 13800kB
-> Seq Scan on t1 t1_1 (cost=0.00..18404.00 rows=1000000 width=4) (actual time=0.012..143.405 rows=1000000 loops=1)
SubPlan 1
-> Index Scan using idx_t2 on t2 (cost=0.42..8.44 rows=1 width=33) (actual time=0.003..0.003 rows=1 loops=200)
Index Cond: (t2id1 = t1.t1id1)
Planning Time: 0.102 ms
Execution Time: 383.417 ms修改后的SQL如预期的计划执行。
四、Oracle的执行情况
为了验证Oracle 的执行效率,在oracle环境也构造了同样的数据及SQL,发现Oracle 也存在同样的问题。
create table t1(t1id1 integer, t1name varchar(999));
create table t2(t2id1 integer primary key, t2name varchar(999));
insert into t1 select object_id,object_name from dba_objects;
insert into t2 select object_id,object_name from dba_objects where object_id is not null;
select/*+gather_plan_statistics*/ t1id1,(select count(*) as t2cnt from t2 where t2id1=t1id1)
from t1 order by t1id1 offset 50000 rows fetch next 200 rows only;
可以看到,同样子查询执行次数超过offset 次。
五、优化器为什么无法自动进行逻辑优化改写?
我们来比较两条语句:语句的逻辑并不完全等价。
语句1:如果在t1表前面的50万条,比如第1000条,取回的t1.t1id1 对应 t2.t2id1 在t2 表里有两条记录,那以下的语句就会报错。
explain analyze select t1id1,(select t2name from t2 where t2id1=t1id1) from t1 order by t1id1 limit 500000,200;语句2:因为在访问t2 表之前,就提前跳过50万行,因此,同样的数据,该语句不会报错。
explain analyze select t1id1, (select t2name from t2 where t2id1=t1id1) from (select t1id1 from t1 order by t1id1 limit 500000,200) t1 order by t1id1最后修改时间:2024-08-16 17:01:37
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




