机票业务性能优化案例 - 阿里云RDS PostgreSQL最佳实践

digoal 2017-07-27

325

作者

digoal

日期

2017-07-27

背景

机票业务的某个模块，数据量10亿+，写、更新、删除量较低。根据KEY查询一些数据，每次查询返回1万条左右的记录。

就是这样简单的需求，业务方发现读成为了巨大的瓶颈，每次返回1万条，100个并发请求，每秒就是100万条（500MB左右），主要的瓶颈：

1、网络是个较大的开销。

2、不同KEY的数据可能是分散存放的，存在查询时的IO放大，可能有一定的性能影响。

3、每次请求的返回记录数较多，数据库search buffer调用可能开销会上升。

就这几个问题，我们来看看如何优化或解决业务方的问题。

建模

1、建表

create table test( id int, info text, -- 一些属性，我这里用一个字段代表它 typeid int, -- 类别，也是用户的查询过滤条件，约10万个类别，每个类别1万条记录，总共10亿记录。 crt_time timestamp, -- 创建时间 mod_time timestamp -- 修改时间 );

2、灌入测试数据

insert into test select generate_series(1,1000000000), 'test', random()*99999, now();

3、创建索引

create index idx_test_typeid on test (typeid);

4、原始SQL请求

```
select * from test where typeid=?;

约返回1万记录。
```

了解数据分布

通过pg_stats可以看到typeid和物理存储的线性相关性才0.012，非常分散。

按TYPEID访问时，IO放大很严重，也就是说1万条记录可能分散在1万个数据块中。

```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where typeid =1;
QUERY PLAN

Index Scan using idx_test_typeid on postgres.test (cost=0.57..13343.21 rows=10109 width=29) (actual time=0.029..14.283 rows=9935 loops=1)
Output: id, info, typeid, crt_time, mod_time
Index Cond: (test.typeid = 1)
Buffers: shared hit=9959 -- typeid=1的记录分散在9959个数据块中
Planning time: 0.085 ms
Execution time: 14.798 ms
(6 rows)
```

原始SQL性能评估、瓶颈分析

1、压测

```
vi test.sql

\set typeid random(0,99999)
select * from test where typeid=:typeid;
```

压测结果，TPS 1653。

```
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 120

transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 64
duration: 120 s
number of transactions actually processed: 198445
latency average = 38.699 ms
latency stddev = 7.898 ms
tps = 1653.239177 (including connections establishing)
tps = 1653.525600 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.002 \set typeid random(0,99999)
38.697 select * from test where typeid=:typeid;
```

2、perf 观察瓶颈

```
perf top -ag

Children Self Shared Object Symbol
+ 15.31% 15.19% postgres [.] hash_search_with_hash_value
+ 14.48% 8.78% postgres [.] heap_hot_search_buffer
+ 9.95% 2.26% [kernel] [k] page_fault
+ 9.44% 8.24% postgres [.] heap_page_prune_opt
+ 7.67% 0.02% [kernel] [k] do_page_fault
+ 7.62% 0.21% [kernel] [k] __do_page_fault
+ 6.89% 0.41% [kernel] [k] handle_mm_fault
+ 6.87% 6.80% postgres [.] PinBuffer
+ 4.32% 0.18% [kernel] [k] __do_fault
+ 4.03% 4.00% postgres [.] LWLockAcquire
+ 3.83% 0.00% [kernel] [k] system_call_fastpath
+ 3.17% 3.15% libc-2.17.so [.] __memcpy_ssse3_back
+ 3.01% 0.16% [kernel] [k] shmem_fault
+ 2.85% 0.13% [kernel] [k] shmem_getpage_gfp
```

优化手段1，cluster化

1、PostgreSQL提供了一个cluster的功能，可以将表按索引进行CLUSTER，即重排。

效果是这个索引对应列（或多列）与物理顺序的线性相关性变成1或-1，也就是线性完全一致，那么在按这个字段或这些字段进行条件过滤时，扫描的堆表数据块大幅度降低。

```
postgres=# cluster test using idx_test_typeid;

2、测试cluster后，按typeid过滤数据，只需要扫描96个数据块了。SQL的响应时间也从14.8毫秒降到了1.9毫秒。

```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where typeid =1;
QUERY PLAN

Index Scan using idx_test_typeid on postgres.test (cost=0.57..13343.21 rows=10109 width=29) (actual time=0.011..1.413 rows=9935 loops=1)
Output: id, info, typeid, crt_time, mod_time
Index Cond: (test.typeid = 1)
Buffers: shared hit=96
Planning time: 0.039 ms
Execution time: 1.887 ms
(6 rows)
```

3、压测，TPS 2715。相比原始性能提升了 64%。

```
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 120

transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 64
duration: 120 s
number of transactions actually processed: 326188
latency average = 23.546 ms
latency stddev = 7.793 ms
tps = 2715.409760 (including connections establishing)
tps = 2715.677062 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.002 \set typeid random(0,99999)
23.543 select * from test where typeid=:typeid;
```

4、perf 观察瓶颈

```
用户态的调用不在TOP里面。

14.30% 0.00% [kernel] [k] system_call_fastpath
9.62% 1.26% [kernel] [k] page_fault
8.35% 0.01% [kernel] [k] do_page_fault
8.27% 0.14% [kernel] [k] __do_page_fault
6.81% 0.37% libc-2.17.so [.] sysmalloc
6.48% 0.10% [kernel] [k] __alloc_pages_nodemask
5.84% 0.40% [kernel] [k] handle_mm_fault
5.84% 0.05% libpthread-2.17.so [.] __libc_send
5.83% 5.79% libc-2.17.so [.] __memcpy_ssse3_back
5.74% 0.03% libpthread-2.17.so [.] __libc_recv
```

优化1小结

1、优化手段1没有涉及到降低网络开销的优化。

2、使用cluster后，完全规避了IO放大的问题。

3、但是每次请求返回的记录数与原来一样，对数据库search buffer没有起到效果。

4、聚集操作是静态操作，数据库并不会一直维持这个状态。

不过PG可以设置fillfactor，使得更新后的版本尽量在当前数据块。这种方法对于更新很有效，只要对应的搜索KEY不变更，那么线性相关性可以一直被维持。对于新增数据无效。所以cluster特别适合相对静态的数据，或者时间维度上，旧的数据基本不变更的场景，可以使用时间分区表，对旧数据实施CLUSTER，保证就数据的线性相关性。

alter table test set (fillfactor=80);

优化手段2，聚集化

优化2的目标和1类似，但是将数据聚集为单条，同时提升数据的压缩比，不过是数据库端压缩，所以对网络需求的降低并没有效果。

1、聚集，因为更新少，所以我们可以将多条记录聚集为一条记录。

```
create table test_agg (typeid int, content jsonb);

insert into test_agg select typeid, jsonb_agg(jsonb_build_object('id',id,'info',info,'crt_time',crt_time,'mod_time',mod_time)) from test group by typeid;

create index idx_test_agg_1 on test_agg(typeid);
```

2、查询请求

select * from test_agg where typeid=?

3、增、删、改

JSON类型的操作函数如下：

https://www.postgresql.org/docs/10/static/functions-json.html

4、优化后的性能指标

压测，性能并没有提升

```
vi test1.sql

\set typeid random(0,99999)
select * from test_agg where typeid=:typeid;

pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 64 -j 64 -T 120

transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 64
duration: 120 s
number of transactions actually processed: 151156
latency average = 50.803 ms
latency stddev = 2.913 ms
tps = 1258.934362 (including connections establishing)
tps = 1259.301582 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.002 \set typeid random(0,99999)
50.801 select * from test_agg where typeid=:typeid;
```

优化2小结

性能并没有提升，转换为JSONB类型后，每个ELEMETE都增加了头部信息，所以网络传输的空间实际变大了。

...... {"id": 172264479, "info": "test", "crt_time": "2017-07-27T20:41:32.365209", "mod_time": null}, {"id": 172304687, "info": "test", "crt_time": "2017-07-27T20:41:32.365209", "mod_time": null}, ......

这个优化方法并没有赚到。