作者
digoal
日期
2019-12-18
标签
PostgreSQL , GiST , 距离排序 , 范围限定 , limit , 骤变 , 全扫描
背景
PostgreSQL GiST索引支持距离排序查询,有些查询会有三种要求:
1、 Order by 距离
2、 限定距离范围
3、 limit 返回记录数
```
create extension postgis;
create table t_pos(
id int primary key,
pos geometry
);
insert into t_pos
select * from (
select id,
ST_SetSRID(
ST_Point( round((random()(135.085831-73.406586)+73.406586)::numeric,6),
round((random()(53.880950-3.408477)+3.408477)::numeric,6)
),
4326
) as pos
from generate_series(1,1000000000) t(id)
) t
order by st_geohash(pos,15);
create index idx_t_pos_1 on t_pos using gist(pos);
select *,
st_distancespheroid(pos, st_setsrid(st_makepoint(120,50),4326), 'SPHEROID["WGS84",6378137,298.257223563]') as dist
from t_pos
where
st_distancespheroid(pos, st_setsrid(st_makepoint(120,50),4326), 'SPHEROID["WGS84",6378137,298.257223563]') < 5000
order by pos <-> st_setsrid(st_makepoint(120,50),4326)
limit 100;
```
或
```
create table t_age(id int, age int);
insert into t_age select generate_series(1,10000000), random()*120;
create index idx_t_age_1 on t_age using gist (age);
select * from t_age
where
(age <-> 25) <1
order by age <-> 25
limit 100000;
```
以上两条sql,如果使用gist索引,当距离小于5000的记录不足100条,或者年龄差距1以内的记录小于10万条时。会怎么样?
答:
会把整个索引全扫一遍。
为什么呢?
因为数据库并不知道后面还有没有满足where条件的记录。
虽然我们人类都能理解,你既然时按距离由近到远顺序返回的,那么当某条记录已经无法满足where 条件(距离已经大于等于5000,年龄差已经大于等于1)的情况下,后面就不需要扫描了。
所以这是数据库内核可以优化的点。
优化方法
1、子查询优化,存在浪费,例如limit 1000,但是满足条件的只有1条,那么999条就是浪费的扫描。
select * from (
select * from t_age
order by age <-> 25
limit 1000
) t
where
(age <-> 25) <1 ;
对比如下
```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_age where (age <-> 25) <1 order by age <-> 25 limit 100000;
QUERY PLAN
Limit (cost=0.36..3572.19 rows=100000 width=12) (actual time=0.169..10757.930 rows=83553 loops=1)
Output: id, age, ((age <-> 25))
Buffers: shared hit=9525191
-> Index Scan using idx_t_age_1 on public.t_age (cost=0.36..119061.20 rows=3333333 width=12) (actual time=0.167..10750.016 rows=83553 loops=1)
Output: id, age, (age <-> 25)
Order By: (t_age.age <-> 25)
Filter: ((t_age.age <-> 25) < 1)
Rows Removed by Filter: 9916447
Buffers: shared hit=9525191
Planning Time: 0.153 ms
Execution Time: 10762.824 ms
(11 rows)
```
```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from (
select * from t_age
order by age <-> 25
limit 100000
) t
where
(age <-> 25) <1 ;
QUERY PLAN
Subquery Scan on t (cost=0.36..1354.35 rows=33333 width=8) (actual time=0.146..144.873 rows=83553 loops=1)
Output: t.id, t.age
Filter: ((t.age <-> 25) < 1)
Rows Removed by Filter: 16447
Buffers: shared hit=95549
-> Limit (cost=0.36..901.97 rows=100000 width=12) (actual time=0.144..133.330 rows=100000 loops=1)
Output: t_age.id, t_age.age, ((t_age.age <-> 25))
Buffers: shared hit=95549
-> Index Scan using idx_t_age_1 on public.t_age (cost=0.36..90161.39 rows=10000000 width=12) (actual time=0.143..124.430 rows=100000 loops=1)
Output: t_age.id, t_age.age, (t_age.age <-> 25)
Order By: (t_age.age <-> 25)
Buffers: shared hit=95549
Planning Time: 0.108 ms
Execution Time: 148.951 ms
(14 rows)
```
2、绝不浪费,自定义pl函数优化
create or replace function ff(int, int, int) returns setof t_age as $$
declare
v t_age;
i int := 0;
begin
set enable_seqscan=off;
set enable_indexscan=on;
set enable_bitmapscan=off;
for v in
select * from t_age order by age <-> $1
loop
if (v.age <-> $1) >= $2 or (i) >= $3 then return; end if;
i := i+1;
return next v;
end loop;
return;
end;
$$ language plpgsql strict;
```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from ff(25,1,1000000);
QUERY PLAN
Function Scan on public.ff (cost=0.22..0.41 rows=1000 width=8) (actual time=159.203..164.049 rows=83553 loops=1)
Output: id, age
Function Call: ff(25, 1, 1000000)
Buffers: shared hit=79848
Planning Time: 0.042 ms
Execution Time: 168.601 ms
(6 rows)
```
3、内核优化
排序扫描支持filter识别,避免未满足limit数量时需要扫描所有记录
参考
《HTAP数据库 PostgreSQL 场景与性能测试之 6 - (OLTP) 空间应用 - KNN查询(搜索附近对象,由近到远排序输出)》
《GIS附近查找性能优化 - PostGIS long lat geometry distance search tuning using gist knn function》
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.
9.9元购买3个月阿里云RDS PostgreSQL实例
PostgreSQL 解决方案集合
德哥 / digoal's github - 公益是一辈子的事.





