暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL GiST Order by 距离 + 距离范围判定 + limit 骤变优化与背景原因

digoal 2019-12-18
1131

作者

digoal

日期

2019-12-18

标签

PostgreSQL , GiST , 距离排序 , 范围限定 , limit , 骤变 , 全扫描


背景

PostgreSQL GiST索引支持距离排序查询,有些查询会有三种要求:

1、 Order by 距离
2、 限定距离范围
3、 limit 返回记录数

```
create extension postgis;

create table t_pos(
id int primary key,
pos geometry
);

insert into t_pos
select * from (
select id,
ST_SetSRID(
ST_Point( round((random()(135.085831-73.406586)+73.406586)::numeric,6),
round((random()
(53.880950-3.408477)+3.408477)::numeric,6)
),
4326
) as pos
from generate_series(1,1000000000) t(id)
) t
order by st_geohash(pos,15);

create index idx_t_pos_1 on t_pos using gist(pos);

select *,
st_distancespheroid(pos, st_setsrid(st_makepoint(120,50),4326), 'SPHEROID["WGS84",6378137,298.257223563]') as dist
from t_pos
where
st_distancespheroid(pos, st_setsrid(st_makepoint(120,50),4326), 'SPHEROID["WGS84",6378137,298.257223563]') < 5000
order by pos <-> st_setsrid(st_makepoint(120,50),4326)
limit 100;
```

```
create table t_age(id int, age int);
insert into t_age select generate_series(1,10000000), random()*120;
create index idx_t_age_1 on t_age using gist (age);

select * from t_age
where
(age <-> 25) <1
order by age <-> 25
limit 100000;
```

以上两条sql,如果使用gist索引,当距离小于5000的记录不足100条,或者年龄差距1以内的记录小于10万条时。会怎么样?

答:

会把整个索引全扫一遍。

为什么呢?

因为数据库并不知道后面还有没有满足where条件的记录。

虽然我们人类都能理解,你既然时按距离由近到远顺序返回的,那么当某条记录已经无法满足where 条件(距离已经大于等于5000,年龄差已经大于等于1)的情况下,后面就不需要扫描了。

所以这是数据库内核可以优化的点。

优化方法

1、子查询优化,存在浪费,例如limit 1000,但是满足条件的只有1条,那么999条就是浪费的扫描。

select * from (
select * from t_age
order by age <-> 25
limit 1000
) t
where
(age <-> 25) <1 ;

对比如下

```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_age where (age <-> 25) <1 order by age <-> 25 limit 100000;
QUERY PLAN


Limit (cost=0.36..3572.19 rows=100000 width=12) (actual time=0.169..10757.930 rows=83553 loops=1)
Output: id, age, ((age <-> 25))
Buffers: shared hit=9525191
-> Index Scan using idx_t_age_1 on public.t_age (cost=0.36..119061.20 rows=3333333 width=12) (actual time=0.167..10750.016 rows=83553 loops=1)
Output: id, age, (age <-> 25)
Order By: (t_age.age <-> 25)
Filter: ((t_age.age <-> 25) < 1)
Rows Removed by Filter: 9916447
Buffers: shared hit=9525191
Planning Time: 0.153 ms
Execution Time: 10762.824 ms
(11 rows)
```

```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from (
select * from t_age
order by age <-> 25
limit 100000
) t
where
(age <-> 25) <1 ;
QUERY PLAN


Subquery Scan on t (cost=0.36..1354.35 rows=33333 width=8) (actual time=0.146..144.873 rows=83553 loops=1)
Output: t.id, t.age
Filter: ((t.age <-> 25) < 1)
Rows Removed by Filter: 16447
Buffers: shared hit=95549
-> Limit (cost=0.36..901.97 rows=100000 width=12) (actual time=0.144..133.330 rows=100000 loops=1)
Output: t_age.id, t_age.age, ((t_age.age <-> 25))
Buffers: shared hit=95549
-> Index Scan using idx_t_age_1 on public.t_age (cost=0.36..90161.39 rows=10000000 width=12) (actual time=0.143..124.430 rows=100000 loops=1)
Output: t_age.id, t_age.age, (t_age.age <-> 25)
Order By: (t_age.age <-> 25)
Buffers: shared hit=95549
Planning Time: 0.108 ms
Execution Time: 148.951 ms
(14 rows)
```

2、绝不浪费,自定义pl函数优化

create or replace function ff(int, int, int) returns setof t_age as $$ declare v t_age; i int := 0; begin set enable_seqscan=off; set enable_indexscan=on; set enable_bitmapscan=off; for v in select * from t_age order by age <-> $1 loop if (v.age <-> $1) >= $2 or (i) >= $3 then return; end if; i := i+1; return next v; end loop; return; end; $$ language plpgsql strict;

```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from ff(25,1,1000000);
QUERY PLAN


Function Scan on public.ff (cost=0.22..0.41 rows=1000 width=8) (actual time=159.203..164.049 rows=83553 loops=1)
Output: id, age
Function Call: ff(25, 1, 1000000)
Buffers: shared hit=79848
Planning Time: 0.042 ms
Execution Time: 168.601 ms
(6 rows)
```

3、内核优化

排序扫描支持filter识别,避免未满足limit数量时需要扫描所有记录

参考

《HTAP数据库 PostgreSQL 场景与性能测试之 6 - (OLTP) 空间应用 - KNN查询(搜索附近对象,由近到远排序输出)》

《GIS附近查找性能优化 - PostGIS long lat geometry distance search tuning using gist knn function》

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论