PostgreSQL IoT，车联网 - 实时轨迹、行程实践 2 - (含index only scan类聚簇表效果)

digoal 2018-12-09

169

作者

digoal

日期

2018-12-09

背景

车联网场景，一个行程由若干个点组成，业务需要按行程查询，按。

但是问题来了：

一个行程包含多个点，那么一次查询意味着需要查询并返回多个点（多条记录）。

由于有多个设备（例如汽车），大概率可能同时活跃，同时上传点，因此以堆表存储的话，一个行程中的多条记录，实际上在数据库的表里面，存储在多个BLOCK里面。

如上分析，性能问题：IO放大。

如何优化？

1、行程记录，按行聚集存储。

类似cluster操作。

2、行程记录，合并到单条，聚集存储。

类似将一个行程多条记录聚合。

例子

下面分别测试几种优化方法带来的性能优化效果。

1、cluster

2、array 聚合带压缩

3、array 聚合不带压缩

4、jsonb 聚合带压缩

5、jsonb 聚合不带压缩

6、text 聚合带压缩

7、text 聚合不带压缩

8、类聚簇表

《PostgreSQL index include - 类聚簇表与应用(append only, IoT时空轨迹, 离散多行扫描与返回)》

1 原始状态

create unlogged table t_sensor(id serial8 primary key, sid int8, att text, crt_time timestamp); create index idx_t_sensor_1 on t_sensor (sid, crt_time);

```
vi test.sql

\set sid random(1,10000)
insert into t_sensor(sid,att,crt_time) values (:sid, md5(random()::text), now());

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 50 -j 50 -t 2000000
```

查看一个行程，需要访问11227个数据块。

```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_sensor where sid=1 order by crt_time;
QUERY PLAN

Index Scan using idx_t_sensor_1 on public.t_sensor (cost=0.49..253.36 rows=11121 width=57) (actual time=0.020..47.591 rows=11189 loops=1)
Output: id, sid, att, crt_time
Index Cond: (t_sensor.sid = 1)
Buffers: shared hit=3406 read=7821
Planning Time: 0.092 ms
Execution Time: 48.303 ms
(6 rows)

postgres=# select 119112278/1024.0;
?column?

10437.601562500000
(1 row)
```

压测性能

```
vi test.sql

\set sid random(1,10000)
select * from t_sensor where sid=:sid order by crt_time ;

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120
```

transaction type: ./test.sql scaling factor: 1 query mode: prepared number of clients: 56 number of threads: 56 duration: 120 s number of transactions actually processed: 14307 latency average = 470.175 ms latency stddev = 43.500 ms tps = 119.037842 (including connections establishing) tps = 119.098221 (excluding connections establishing) statement latencies in milliseconds: 0.003 \set sid random(1,10000) 470.178 select * from t_sensor where sid=:sid order by crt_time ;

2 cluster

按行程，时间索引聚集。一次查询返回一条聚合后的记录。

postgres=# cluster t_sensor USING idx_t_sensor_1 ;

查询一个行程，扫描174个数据块。IO 骤降。

```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_sensor where sid=1 order by crt_time;
QUERY PLAN

Index Scan using idx_t_sensor_1 on public.t_sensor (cost=0.49..253.39 rows=11149 width=57) (actual time=0.018..2.212 rows=11189 loops=1)
Output: id, sid, att, crt_time
Index Cond: (t_sensor.sid = 1)
Buffers: shared hit=174
Planning Time: 0.094 ms
Execution Time: 2.816 ms
(6 rows)
```

压测性能，见末尾。

3 array 带压缩

create unlogged table t_sensor_agg(sid int8 primary key, agg t_sensor[]); insert into t_sensor_agg select sid,array_agg(t_sensor order by crt_time) from t_sensor group by sid;

瓶颈：array类型的INPUT OUTPUT接口，以及压缩解压缩接口

Samples: Overhead Shared Object 24.59% postgres 20.70% postgres 6.74% postgres 3.78% libc-2.17.so 2.72% postgres 2.34% [kernel] 2.23% postgres 2.13% postgres 1.76% postgres 1.49% libc-2.17.so 1.36% postgres 1.36% [kernel] 1.29% postgres 1.28% postgres 1.17% postgres 0.92% postgres 0.86% [kernel] 0.84% postgres 0.82% postgres 0.76% postgres 0.76% postgres 0.67% postgres 0.66% postgres 0.64% [kernel] 0.64% [kernel] 3M of event 'cpu-clock', Event count (approx.): 559749813667 Symbol [.] array_out [.] record_out [.] pglz_decompress [.] __memcpy_ssse3_back [.] pg_ltostr_zeropad [k] run_timer_softirq [.] pg_lltoa [.] 0x000000000035c350 [.] heap_deform_tuple [.] __strlen_sse2_pminub [.] appendStringInfoChar [k] copy_user_enhanced_fast_string [.] 0x000000000035c36c [.] 0x000000000035c362 [.] FunctionCall1Coll [.] hash_search_with_hash_value [k] _raw_spin_unlock_irqrestore [.] j2date [.] 0x000000000035c357 [.] palloc [.] lookup_type_cache [.] 0x000000000035c360 [.] timestamp2tm [k] rcu_process_callbacks [k] __do_softirq

```
vi test.sql

\set sid random(1,10000)
select * from t_sensor_agg4 where sid=:sid ;
```

4 array 不带压缩

create unlogged table t_sensor_agg1(sid int8 primary key, agg t_sensor[]); alter table t_sensor_agg1 alter column agg set storage external; insert into t_sensor_agg1 select sid,array_agg(t_sensor order by crt_time) from t_sensor group by sid;

瓶颈：array类型的INPUT OUTPUT接口

Samples: 1M of event 'cpu-clock', Event count (approx.): 310127790569 Overhead Shared Object Symbol 26.06% postgres [.] array_out 21.44% postgres [.] record_out 4.20% libc-2.17.so [.] __memcpy_ssse3_back 2.86% [kernel] [k] run_timer_softirq 2.75% postgres [.] pg_ltostr_zeropad 2.65% postgres [.] heap_deform_tuple 2.28% postgres [.] pg_lltoa 2.14% postgres [.] 0x000000000035c350 1.87% [kernel] [k] copy_user_enhanced_fast_string 1.52% libc-2.17.so [.] __strlen_sse2_pminub 1.47% postgres [.] appendStringInfoChar 1.32% postgres [.] 0x000000000035c36c 1.30% postgres [.] 0x000000000035c362 1.20% postgres [.] FunctionCall1Coll 1.11% postgres [.] hash_search_with_hash_value 0.87% postgres [.] j2date 0.85% postgres [.] 0x000000000035c357 0.81% [kernel] [k] _raw_spin_unlock_irqrestore 0.76% postgres [.] lookup_type_cache 0.75% postgres [.] 0x000000000046d33b 0.74% postgres [.] palloc 0.72% [kernel] [k] rcu_process_callbacks 0.68% postgres [.] timestamp2tm 0.68% postgres [.] pfree

5 jsonb 带压缩

create unlogged table t_sensor_agg2(sid int8 primary key, agg jsonb); insert into t_sensor_agg2 select sid,jsonb_agg(t_sensor order by crt_time) from t_sensor group by sid;

6 jsonb 不带压缩

create unlogged table t_sensor_agg3(sid int8 primary key, agg jsonb); alter table t_sensor_agg3 alter column agg set storage external; insert into t_sensor_agg3 select sid,jsonb_agg(t_sensor order by crt_time) from t_sensor group by sid;

7 text 带压缩

create unlogged table t_sensor_agg4(sid int8 primary key, agg text); insert into t_sensor_agg4 select sid,string_agg(t_sensor::text, '|' order by crt_time) from t_sensor group by sid;

8 text 不带压缩

create unlogged table t_sensor_agg5(sid int8 primary key, agg text); alter table t_sensor_agg5 alter column agg set storage external; insert into t_sensor_agg5 select sid,string_agg(t_sensor::text, '|' order by crt_time) from t_sensor group by sid;

9 index only scan 类似聚集表效果

注意：除了本文提到的full index达到类似聚簇表效果，还可以使用index include，达到同样效果，而且索引更加有效。

《PostgreSQL index include - 类聚簇表与应用(append only, IoT时空轨迹, 离散多行扫描与返回)》

所有内容作为INDEX的KEY，类似聚集表的效果(相邻内容在同一个INDEX PAGE里面)。查询时走INDEX ONLY SCAN扫描方法，扫描的BLOCK最少。

注意：btree 索引内容不能超过1/3 PAGE (因为BTREE是双向链表，一个PAGE至少要有一条有效记录，所以有这个限制。)。

写入数据

```
create table t_sensor (id serial8 primary key, sid int8, att text, crt_time timestamp);

create index idx_t_sensor_1 on t_sensor (sid, crt_time, att, id);

vi test.sql
\set sid random(1,10000)
insert into t_sensor(sid,att,crt_time) values (:sid, md5(random()::text), now());

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 50 -j 50 -t 2000000

transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 50
number of threads: 50
number of transactions per client: 2000000
number of transactions actually processed: 100000000/100000000
latency average = 0.193 ms
latency stddev = 0.461 ms
tps = 257995.418591 (including connections establishing)
tps = 258024.212148 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set sid random(1,10000)
0.192 insert into t_sensor(sid,att,crt_time) values (:sid, md5(random()::text), now());
```

生成VM文件（autovacuum触发时会自动生成，但是为了立马看效果，手工执行一下。）

vacuum analyze t_sensor;

INDEX ONLY SCAN, IO减少效果如下：

```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_sensor where sid=2 order by crt_time;
QUERY PLAN

Index Only Scan using idx_t_sensor_1 on public.t_sensor (cost=0.60..70.41 rows=9960 width=57) (actual time=0.019..2.109 rows=9978 loops=1)
Output: id, sid, att, crt_time
Index Cond: (t_sensor.sid = 2)
Heap Fetches: 0
Buffers: shared hit=235
Planning Time: 0.090 ms
Execution Time: 2.652 ms
(7 rows)
```

查询性能：

```
vi test.sql

\set sid random(1,10000)
select * from t_sensor where sid=:sid order by crt_time;

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 28 -j 28 -T 120

transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 283638
latency average = 11.844 ms
latency stddev = 1.931 ms
tps = 2363.410561 (including connections establishing)
tps = 2363.913145 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set sid random(1,10000)
11.842 select * from t_sensor where sid=:sid order by crt_time;
```

小结

目标数据分散在多个BLOCK中，引起IO放大的问题，通过聚集存储，或者通过聚合存储，可以很好的解决这个问题。

聚合后，另一个瓶颈则是聚合后的类型（array,jsonb,text）的IN OUT接口。

/ | 表存储 | 行程查询 qps
---|---|---
原始(IO 放大) | 8880 MB | 119
顺序(无IO 放大) | 8880 MB | 2057
index only scan(类似聚集表)(无IO 放大) | 8880 MB | 2363
聚合array(压缩) | 4523 MB | 2362
聚合array(不压缩) | 8714 MB | 2515
聚合json(压缩) | 5052 MB | 3102
聚合json(不压缩) | 13 GB | 3184
聚合text(压缩) | 4969 MB | 6057
聚合text(不压缩) | 7692 MB | 5997

从上面的测试，可以看到IN OUT函数接口的开销，text<jsonb<array(composite array)。

实际的优化例子，可参考末尾的几篇文章。例如：

1、按时间分区，旧的分区使用cluster，按行程整理数据，使用AB表切换，解决IO放大的问题。

2、异步聚合，将点数据准实时按行程ID，聚合到聚合后的行程表。

3、使用INDEX ONLY SCAN, 达到聚集表效果。对业务无任何侵入性。(例如按天分区，加全量(业务需要查询的字段)索引。)，相当于两倍存储空间(一份在堆表，一份在索引中)。

参考

《PostgreSQL IoT，车联网 - 实时轨迹、行程实践》

《PostgreSQL AB表切换最佳实践 - 提高切换成功率，杜绝雪崩 - 珍藏级》

《PostgreSQL 时序最佳实践 - 证券交易系统数据库设计 - 阿里云RDS PostgreSQL最佳实践》

《PostgreSQL index include - 类聚簇表与应用(append only, IoT时空轨迹, 离散多行扫描与返回)》

文章转载自digoal，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

PostgreSQL IoT，车联网 - 实时轨迹、行程实践 2 - (含index only scan类聚簇表效果)

作者

日期

标签

背景

例子

1 原始状态

2 cluster

3 array 带压缩

4 array 不带压缩

5 jsonb 带压缩

6 jsonb 不带压缩

7 text 带压缩

8 text 不带压缩

9 index only scan 类似聚集表效果

小结

参考

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

评论