作者
digoal
日期
2021-05-21
标签
PostgreSQL , = any , 范围 , 分段
背景
当有多个范围条件时, 数据库只能用到1个范围条件, 其他范围条件无法基于联合索引进行精确收敛.
如下ts和col1其实都会转成范围条件
```
create unlogged table tbl (id int, col1 text, ts timestamp);
insert into tbl select random()1000000, random()::text, now()+(random()1000||' hour')::interval from generate_series(1,10000000);
select * from tbl
where ts >= $1 and ts <= $2
and col1 like '0.1%';
select * from tbl
where ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00'
and col1 like '0.1%';
```
正常情况下我们会建立如下索引, 让数据库选1个索引:
如果lc_collate=C, 可以使用:
```
postgres=# create index idx_tbl_2 on tbl(col1);
CREATE INDEX
Time: 5713.425 ms (00:05.713)
postgres=# create index idx_tbl_3 on tbl(ts);
CREATE INDEX
Time: 4155.860 ms (00:04.156)
```
就算你建立ts和col1的联合索引也没用, 前面已经解释过了, 第二个范围字段没法通过索引过滤.
查询性能如下:
```
postgres=# explain select count(*) from tbl where col1 like '0.1%' and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
QUERY PLAN
Aggregate (cost=114785.25..114785.26 rows=1 width=8)
-> Index Scan using idx_tbl_2 on tbl (cost=0.56..114353.77 rows=172590 width=0)
Index Cond: ((col1 >= '0.1'::text) AND (col1 < '0.2'::text))
Filter: ((col1 ~~ '0.1%'::text) AND (ts >= '2021-05-21 20:00:00'::timestamp without time zone) AND (ts <= '2021-05-28 21:00:00'::timestamp without time zone))
(4 rows)
Time: 0.674 ms
```
```
postgres=# select count(*) from tbl where col1 like '0.1%' and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
count
169344
(1 row)
Time: 4087.236 ms (00:04.087)
```
优化方法
时间是阶梯状的, 可以按天来进行分段, 建立表达式索引, 这样做的目的是把一个范围消掉, 作为前导查询, 这样就能使用index skip scan来加速, 第二个范围变成精确索引收敛.
```
create or replace function im_to_char (timestamp,text) returns text as $$
select to_char($1,$2);
$$ language sql strict immutable parallel safe;
create or replace function im_to_char (timestamptz,text) returns text as $$
select to_char($1,$2);
$$ language sql strict immutable parallel safe;
create or replace function im_to_char (date,text) returns text as $$
select to_char($1,$2);
$$ language sql strict immutable parallel safe;
```
```
create index idx_t_1 on tbl (im_to_char(ts,'yyyymmdd'), col1 text_pattern_ops);
如果lc_collate=C, 可以使用:
create index idx_t_1 on tbl (im_to_char(ts,'yyyymmdd'), col1);
```
SQL改写如下
select * from tbl
where im_to_char(ts,'yyyymmdd') = any(array(
select im_to_char(ts::date+i,'yyyymmdd') from generate_series(0,$2::date-$1::date+1) i
))
and col1 like '0.1%'
and ts >= $1 and ts <= $2;
得到这样的SQL:
select count(*) from tbl
where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528'])
and col1 like '0.1%'
and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
执行计划可以看到, 已经使用了这个联合索引, 而且index cond为我们希望的like和分段条件都用上了.
在filter中过滤精确时间和like.
```
postgres=# explain select count(*) from tbl where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528']) and col1 like '0.1%' and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
QUERY PLAN
Aggregate (cost=36081.60..36081.61 rows=1 width=8)
-> Index Scan using idx_t_1 on tbl (cost=0.56..36064.35 rows=6900 width=0)
Index Cond: ((im_to_char(ts, 'yyyymmdd'::text) = ANY ('{20210521,20210522,20210523,20210524,20210525,20210526,20210527,20210528}'::text[])) AND (col1 >= '0.1'::text) AND (col1 < '0.2'::text))
Filter: ((col1 ~~ '0.1%'::text) AND (ts >= '2021-05-21 20:00:00'::timestamp without time zone) AND (ts <= '2021-05-28 21:00:00'::timestamp without time zone))
(4 rows)
Time: 0.883 ms
```
耗时明显降低.
```
postgres=# select count(*) from tbl
postgres-# where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528'])
postgres-# and col1 like '0.1%'
postgres-# and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
count
169344
(1 row)
Time: 680.602 ms
```
PostgreSQL 许愿链接
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.
9.9元购买3个月阿里云RDS PostgreSQL实例
PostgreSQL 解决方案集合
德哥 / digoal's github - 公益是一辈子的事.





