PostgreSQL - 时序、IoT类场景 - first_value , last_value , agg , cte , window , recursive

digoal 2021-01-04

286

作者

digoal

日期

2021-04-29

背景

知识点:
- cte
- window
- recursive
- agg
- 插件

TSDB的两个分析函数, 例如求分组的first_value和last_value

https://help.aliyun.com/document_detail/116041.html#h2-time_bucket-

示例：

first

根据device_id分组获得第一个温度值 SELECT device_id, first(temp, time) FROM metrics GROUP BY device_id; 警告⚠️: first 函数通过线性扫描计算结果，不使用索引。

last

根据device_id分组获得最后一个温度值 SELECT device_id, last(temp, time) FROM metrics GROUP BY device_id; 警告⚠️:last 函数通过线性扫描计算结果，不使用索引。

PostgreSQL 内置没有包含只取第一或最后一条的聚合函数.

https://www.postgresql.org/docs/current/functions-aggregate.html

《PostgreSQL aggregate function 1 : General-Purpose Aggregate Functions》
《PostgreSQL aggregate function 2 : Aggregate Functions for Statistics》
《PostgreSQL aggregate function 3 : Aggregate Functions for Ordered-Set》
《PostgreSQL aggregate function 4 : Hypothetical-Set Aggregate Functions》

但是有PostgreSQL插件可以支持, 也可以自定义, 如下:

https://wiki.postgresql.org/wiki/First/last_(aggregate)
http://pgxn.org/dist/first_last_agg/

还有更多统计和科学计算的函数可以在pgxn中找到

接下来展示2种内置方法:

创建测试表, 写入测试数据100万条

create table test (devid int, val float4, crt_time timestamp); insert into test select random()*20, random()*100, now()+((random()*2000::numeric)||' hour')::interval from generate_series(1,1000000); create index idx_test on test (devid, crt_time);

方法1、窗口实现

```
select devid,
min(val) filter (where rn1=1) as first_value,
min(val) filter (where rn2=1) as last_value
from (
select
*,
row_number() over w1 as rn1,
row_number() over w2 as rn2
from
test
window
w1 as (partition by devid order by crt_time),
w2 as (partition by devid order by crt_time desc)
) t
group by devid;

                                     QUERY PLAN

GroupAggregate (cost=272322.19..314822.40 rows=21 width=12)
Group Key: test.devid
-> WindowAgg (cost=272322.19..292322.19 rows=1000000 width=32)
-> Sort (cost=272322.19..274822.19 rows=1000000 width=24)
Sort Key: test.devid, test.crt_time
-> WindowAgg (cost=132154.34..152154.34 rows=1000000 width=24)
-> Sort (cost=132154.34..134654.34 rows=1000000 width=16)
Sort Key: test.devid, test.crt_time DESC
-> Seq Scan on test (cost=0.00..15406.00 rows=1000000 width=16)
JIT:
Functions: 13
Options: Inlining false, Optimization false, Expressions true, Deforming true
(12 rows)

devid | first_value | last_value
-------+-------------+-------------
0 | 69.20287 | 92.5132
1 | 40.00488 | 79.331635
2 | 81.52082 | 24.342728
3 | 56.46295 | 0.083137356
4 | 53.477367 | 36.13544
5 | 80.3527 | 72.615295
6 | 42.685253 | 24.84936
7 | 44.414867 | 92.358284
8 | 46.09131 | 93.73747
9 | 69.82049 | 70.78088
10 | 15.770509 | 7.430693
11 | 23.28746 | 17.332998
12 | 53.020107 | 7.4527717
13 | 95.81737 | 87.950645
14 | 16.533426 | 41.366306
15 | 20.856157 | 51.702602
16 | 35.354103 | 91.17102
17 | 14.365077 | 75.35907
18 | 81.229645 | 88.41152
19 | 3.7511733 | 75.90414
20 | 6.8691735 | 69.208405
(21 rows)
Time: 1456.356 ms (00:01.456)

postgres=# select * from test where devid=1 order by crt_time limit 1;
devid | val | crt_time
-------+----------+----------------------------
1 | 40.00488 | 2021-04-29 17:14:29.754692
(1 row)

postgres=# select * from test where devid=1 order by crt_time desc limit 1;
devid | val | crt_time
-------+-----------+----------------------------
1 | 79.331635 | 2021-07-22 01:12:35.705869
(1 row)
```

方法2、递归极速实现, 1.5秒优化到0.x毫秒, 千倍以上提升.

《时序数据合并场景加速分析和实现 - 复合索引，窗口分组查询加速，变态递归加速》

1、first_value

```
postgres=# with recursive skip as (
(
select test tv from test order by devid,crt_time limit 1
)
union all
(
select (
select test tv from test where devid > (s.tv).devid order by devid,crt_time limit 1
) from skip s where (s.tv).devid is not null limit 1
)
)
select * from skip t where t.* is not null;

                                            QUERY PLAN

CTE Scan on skip t (cost=6.18..6.40 rows=11 width=32)
Filter: (t. IS NOT NULL)
CTE skip
-> Recursive Union (cost=0.42..6.18 rows=11 width=40)
-> Subquery Scan on "SELECT* 1" (cost=0.42..0.49 rows=1 width=40)
-> Limit (cost=0.42..0.48 rows=1 width=52)
-> Index Scan using idx_test on test (cost=0.42..57043.47 rows=1000000 width=52)
-> Limit (cost=0.00..0.55 rows=1 width=32)
-> WorkTable Scan on skip s (cost=0.00..5.48 rows=10 width=32)
Filter: ((tv).devid IS NOT NULL)
(10 rows)

tv

(0,69.20287,"2021-04-29 17:29:03.513984")
(1,40.00488,"2021-04-29 17:14:29.754692")
(2,81.52082,"2021-04-29 17:13:38.138922")
(3,56.46295,"2021-04-29 17:13:41.576724")
(4,53.477367,"2021-04-29 17:13:38.799418")
(5,80.3527,"2021-04-29 17:20:07.61652")
(6,42.685253,"2021-04-29 17:12:43.131935")
(7,44.414867,"2021-04-29 17:14:24.41725")
(8,46.09131,"2021-04-29 17:13:21.088454")
(9,69.82049,"2021-04-29 17:16:56.557335")
(10,15.770509,"2021-04-29 17:12:43.72394")
(11,23.28746,"2021-04-29 17:14:51.66777")
(12,53.020107,"2021-04-29 17:13:26.826184")
(13,95.81737,"2021-04-29 17:17:23.924546")
(14,16.533426,"2021-04-29 17:16:12.762293")
(15,20.856157,"2021-04-29 17:15:59.386621")
(16,35.354103,"2021-04-29 17:13:39.879304")
(17,14.365077,"2021-04-29 17:12:40.277861")
(18,81.229645,"2021-04-29 17:15:47.412289")
(19,3.7511733,"2021-04-29 17:13:15.481478")
(20,6.8691735,"2021-04-29 17:16:08.605133")
(21 rows)

Time: 0.591 ms
```

2、last_value

```
postgres=# with recursive skip as (
(
select test tv from test order by devid,crt_time desc limit 1
)
union all
(
select (
select test tv from test where devid > (s.tv).devid order by devid,crt_time desc limit 1
) from skip s where (s.tv).devid is not null limit 1
)
)
select * from skip t where t.* is not null;

                                               QUERY PLAN

CTE Scan on skip t (cost=20578.75..20578.97 rows=11 width=32)
Filter: (t. IS NOT NULL)
CTE skip
-> Recursive Union (cost=3073.90..20578.75 rows=11 width=40)
-> Subquery Scan on "SELECT* 1" (cost=3073.90..3073.99 rows=1 width=40)
-> Limit (cost=3073.90..3073.98 rows=1 width=52)
-> Incremental Sort (cost=3073.90..78293.89 rows=1000000 width=52)
Sort Key: test.devid, test.crt_time DESC
Presorted Key: test.devid
-> Index Scan using idx_test on test (cost=0.42..57043.47 rows=1000000 width=52)
-> Limit (cost=0.00..1750.46 rows=1 width=32)
-> WorkTable Scan on skip s (cost=0.00..17504.55 rows=10 width=32)
Filter: ((tv).devid IS NOT NULL)
(13 rows)

tv

(0,92.5132,"2021-07-22 00:50:09.418747")
(1,79.331635,"2021-07-22 01:12:35.705869")
(2,24.342728,"2021-07-22 01:11:43.214143")
(3,0.083137356,"2021-07-22 01:10:48.842551")
(4,36.13544,"2021-07-22 01:03:19.314456")
(5,72.615295,"2021-07-22 01:09:47.155842")
(6,24.84936,"2021-07-22 01:11:14.875089")
(7,92.358284,"2021-07-22 01:07:30.652387")
(8,93.73747,"2021-07-22 01:09:03.046665")
(9,70.78088,"2021-07-22 01:11:31.421899")
(10,7.430693,"2021-07-22 01:10:48.575366")
(11,17.332998,"2021-07-22 01:10:28.107436")
(12,7.4527717,"2021-07-22 01:10:52.440714")
(13,87.950645,"2021-07-22 01:08:22.256052")
(14,41.366306,"2021-07-22 01:04:35.893593")
(15,51.702602,"2021-07-22 01:10:41.037534")
(16,91.17102,"2021-07-22 01:08:18.668903")
(17,75.35907,"2021-07-22 01:07:49.376161")
(18,88.41152,"2021-07-22 01:10:30.431861")
(19,75.90414,"2021-07-22 01:11:18.230903")
(20,69.208405,"2021-07-22 01:12:20.86822")
(21 rows)

Time: 574.792 ms

last_value 加个desc索引会更快

postgres=# create index idx_test1 on test (devid, crt_time desc);
CREATE INDEX
Time: 963.675 ms
postgres=# with recursive skip as (
(
select test tv from test order by devid,crt_time desc limit 1
)
union all
(
select (
select test tv from test where devid > (s.tv).devid order by devid,crt_time desc limit 1
) from skip s where (s.tv).devid is not null limit 1
)
)
select * from skip t where t.* is not null;
tv