PostgreSQL 多个范围条件的分段SQL 收敛过滤性优化 - 变换1个范围 TO x=any(array)

digoal 2021-01-05

647

作者

digoal

日期

2021-05-21

背景

当有多个范围条件时, 数据库只能用到1个范围条件, 其他范围条件无法基于联合索引进行精确收敛.

如下ts和col1其实都会转成范围条件

```
create unlogged table tbl (id int, col1 text, ts timestamp);
insert into tbl select random()1000000, random()::text, now()+(random()1000||' hour')::interval from generate_series(1,10000000);

select * from tbl
where ts >= $1 and ts <= $2
and col1 like '0.1%';

select * from tbl
where ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00'
and col1 like '0.1%';
```

正常情况下我们会建立如下索引, 让数据库选1个索引:

如果lc_collate=C, 可以使用:

```
postgres=# create index idx_tbl_2 on tbl(col1);
CREATE INDEX
Time: 5713.425 ms (00:05.713)

postgres=# create index idx_tbl_3 on tbl(ts);
CREATE INDEX
Time: 4155.860 ms (00:04.156)
```

就算你建立ts和col1的联合索引也没用, 前面已经解释过了, 第二个范围字段没法通过索引过滤.

查询性能如下:

```
postgres=# explain select count(*) from tbl where col1 like '0.1%' and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
QUERY PLAN

Aggregate (cost=114785.25..114785.26 rows=1 width=8)
-> Index Scan using idx_tbl_2 on tbl (cost=0.56..114353.77 rows=172590 width=0)
Index Cond: ((col1 >= '0.1'::text) AND (col1 < '0.2'::text))
Filter: ((col1 ~~ '0.1%'::text) AND (ts >= '2021-05-21 20:00:00'::timestamp without time zone) AND (ts <= '2021-05-28 21:00:00'::timestamp without time zone))
(4 rows)

Time: 0.674 ms
```

```
postgres=# select count(*) from tbl where col1 like '0.1%' and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
count

169344
(1 row)

Time: 4087.236 ms (00:04.087)
```

优化方法

时间是阶梯状的, 可以按天来进行分段, 建立表达式索引, 这样做的目的是把一个范围消掉, 作为前导查询, 这样就能使用index skip scan来加速, 第二个范围变成精确索引收敛.

```
create or replace function im_to_char (timestamp,text) returns text as $$
select to_char($1,$2);
$$ language sql strict immutable parallel safe;

create or replace function im_to_char (timestamptz,text) returns text as $$
select to_char($1,$2);
$$ language sql strict immutable parallel safe;

create or replace function im_to_char (date,text) returns text as $$
select to_char($1,$2);
$$ language sql strict immutable parallel safe;
```

```
create index idx_t_1 on tbl (im_to_char(ts,'yyyymmdd'), col1 text_pattern_ops);

如果lc_collate=C, 可以使用:

create index idx_t_1 on tbl (im_to_char(ts,'yyyymmdd'), col1);
```

SQL改写如下

select * from tbl where im_to_char(ts,'yyyymmdd') = any(array( select im_to_char(ts::date+i,'yyyymmdd') from generate_series(0,$2::date-$1::date+1) i )) and col1 like '0.1%' and ts >= $1 and ts <= $2;

得到这样的SQL:

select count(*) from tbl where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528']) and col1 like '0.1%' and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';

执行计划可以看到, 已经使用了这个联合索引, 而且index cond为我们希望的like和分段条件都用上了.
在filter中过滤精确时间和like.

```
postgres=# explain select count(*) from tbl where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528']) and col1 like '0.1%' and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
QUERY PLAN

Aggregate (cost=36081.60..36081.61 rows=1 width=8)
-> Index Scan using idx_t_1 on tbl (cost=0.56..36064.35 rows=6900 width=0)
Index Cond: ((im_to_char(ts, 'yyyymmdd'::text) = ANY ('{20210521,20210522,20210523,20210524,20210525,20210526,20210527,20210528}'::text[])) AND (col1 >= '0.1'::text) AND (col1 < '0.2'::text))
Filter: ((col1 ~~ '0.1%'::text) AND (ts >= '2021-05-21 20:00:00'::timestamp without time zone) AND (ts <= '2021-05-28 21:00:00'::timestamp without time zone))
(4 rows)

Time: 0.883 ms
```

耗时明显降低.

```
postgres=# select count(*) from tbl
postgres-# where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528'])
postgres-# and col1 like '0.1%'
postgres-# and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
count

169344
(1 row)

Time: 680.602 ms
```

文章转载自digoal，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

PostgreSQL 多个范围条件的分段SQL 收敛过滤性优化 - 变换1个范围 TO x=any(array)

作者

日期

标签

背景

优化方法

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

评论