作者简介
张晋 来自沈阳美行科技
背景
模拟
数据库表中约有500w条车牌号记录,对表中的车牌号进行模糊查询。即支持 car_id like ‘%XXXX%XXX%’ 查询
---创建表create table t_car (id int , car_id text);--插入500万车牌数据insert into t_car select generate_series(1,5000000), (array['辽A','辽B','吉A','吉B','黑A','黑B'])[floor(random()*6+1)] || substring(md5(random()::text),0,6);--查看数据select * from t_car limit 5;id | car_id----+----------1 | 吉A43bb92 | 吉B19b643 | 辽Afb04e4 | 吉Bcf90c5 | 辽Be67df(5 行记录)
索引
· 顺序扫描
explain analyze verbose select * from t_car where car_id = '辽Be67df';QUERY PLAN-----------------------------------------------------------------Gather (cost=1000.00..54069.87 rows=2 width=14) (actual time=0.458..268.782 rows=4 loops=1)Output: id, car_idWorkers Planned: 2Workers Launched: 2-> Parallel Seq Scan on public.t_car (cost=0.00..53069.67 rows=1 width=14) (actual time=140.151..253.061 rows=1 loops=3)Output: id, car_idFilter: (t_car.car_id = '辽Be67df'::text)Rows Removed by Filter: 1666665Worker 0: actual time=246.618..246.619 rows=0 loops=1Worker 1: actual time=173.812..251.916 rows=1 loops=1Planning time: 0.174 msExecution time: 268.820 ms(12 行记录)时间:269.684 ms
· btree
创建btree类型索引create index btree_index_01 on t_car using btree (car_id);等值查询,结果还是相当给力postgres=# explain analyze verbose select * from t_car where car_id = '辽Be67df';QUERY PLAN------------------------------------------------------------------------------------------------------------------------------Index Scan using btree_index_01 on public.t_car (cost=0.43..3.77 rows=2 width=14) (actual time=0.047..0.057 rows=4 loops=1)Output: id, car_idIndex Cond: (t_car.car_id = '辽Be67df'::text)Planning time: 0.218 msExecution time: 0.091 ms(5 行记录)时间:0.971 ms后模糊查询,不尽人意。不符合预期。postgres=# explain analyze verbose select * from t_car where car_id like '辽Be67df%';QUERY PLAN--------------------------------------------------------------------------------------------------------------------------------Gather (cost=1000.00..54119.47 rows=498 width=14) (actual time=0.426..281.444 rows=4 loops=1)Output: id, car_idWorkers Planned: 2Workers Launched: 2-> Parallel Seq Scan on public.t_car (cost=0.00..53069.67 rows=208 width=14) (actual time=138.857..270.492 rows=1 loops=3)Output: id, car_idFilter: (t_car.car_id ~~ '辽Be67df%'::text)Rows Removed by Filter: 1666665Worker 0: actual time=148.488..267.226 rows=2 loops=1Worker 1: actual time=268.058..268.058 rows=0 loops=1Planning time: 0.153 msExecution time: 281.481 ms(12 行记录)时间:282.275 ms
以上btree索引没有起到作用的原因,是因为在创建索引时,默认的opclass为等值查询。
详情:https://www.postgresql.org/docs/10/indexes-opclass.html
重新创建btree索引drop index btree_index_01;create index btree_index_01 on t_car using btree (car_id text_pattern_ops);这次查询结果符合预期explain analyze verbose select * from t_car where car_id like '辽Be67df%';QUERY PLAN-----------------------------------------------------------------Index Scan using btree_index_01 on public.t_car (cost=0.43..2.66 rows=498 width=14) (actual time=0.050..0.073 rows=4 loops=1)Output: id, car_idIndex Cond: ((t_car.car_id ~>=~ '辽Be67df'::text) AND (t_car.car_id ~<~ '辽Be67dg'::text))Filter: (t_car.car_id ~~ '辽Be67df%'::text)Planning time: 0.473 msExecution time: 0.106 ms(6 行记录)时间:1.407 ms前模糊查询还是不行explain analyze verbose select * from t_car where car_id like '%辽Be67df';QUERY PLAN----------------------------------------------------------------- Gather (cost=1000.00..54119.47 rows=498 width=14) (actual time=0.476..309.504 rows=4 loops=1)Output: id, car_idWorkers Planned: 2Workers Launched: 2-> Parallel Seq Scan on public.t_car (cost=0.00..53069.67 rows=208 width=14) (actual time=121.528..297.927 rows=1 loops=3)Output: id, car_idFilter: (t_car.car_id ~~ '%辽Be67df'::text)Rows Removed by Filter: 1666665Worker 0: actual time=200.075..294.789 rows=1 loops=1Worker 1: actual time=164.486..295.015 rows=1 loops=1Planning time: 0.139 msExecution time: 309.536 ms(12 行记录)时间:310.278 ms解决前模糊查询的方法,反转查询字段。在逻辑上变成后模糊查询。create index btree_index_02 on t_car using btree (reverse(car_id) text_pattern_ops);
· gin
真正的支持模糊查询
创建扩展create extension pg_trgm;创建gin索引create index gin_index_01 on t_car using gin(car_id gin_trgm_ops);模糊查询速度也能飞explain analyze verbose select * from t_car where car_id like '%辽Be67df%';QUERY PLAN-----------------------------------------------------------------Bitmap Heap Scan on public.t_car (cost=36.86..579.80 rows=498 width=14) (actual time=3.936..4.018 rows=4 loops=1)Output: id, car_idRecheck Cond: (t_car.car_id ~~ '%辽Be67df%'::text)Heap Blocks: exact=4-> Bitmap Index Scan on gin_index_01 (cost=0.00..36.73 rows=498 width=0) (actual time=3.900..3.900 rows=4 loops=1)Index Cond: (t_car.car_id ~~ '%辽Be67df%'::text)Planning time: 2.918 msExecution time: 4.359 msexplain analyze verbose select * from t_car where car_id like '%辽Be6%df%';QUERY PLAN-----------------------------------------------------------------Bitmap Heap Scan on public.t_car (cost=17.06..560.00 rows=498 width=14) (actual time=5.958..15.353 rows=35 loops=1)Output: id, car_idRecheck Cond: (t_car.car_id ~~ '%辽Be6%df%'::text)Rows Removed by Index Recheck: 3273Heap Blocks: exact=3098-> Bitmap Index Scan on gin_index_01 (cost=0.00..16.93 rows=498 width=0) (actual time=5.172..5.172 rows=3308 loops=1)Index Cond: (t_car.car_id ~~ '%辽Be6%df%'::text)Planning time: 0.224 msExecution time: 15.462 ms(9 行记录)
扩展阅读
rum(https://github.com/postgrespro/rum?spm=a2c4e.11153940.blogcont111793.51.50575bf0HjdIsl) 是一个索引插件,由Postgrespro开源,适合全文检索,属于GIN的增强版本。
增强包括:
1、在RUM索引中,存储了lexem的位置信息,所以在计算ranking时,不需要回表查询(而GIN需要回表查询)。
2、RUM支持phrase搜索,而GIN无法支持。
3、在一个RUM索引中,允许用户在posting tree中存储除ctid(行号)以外的字段VALUE,例如时间戳。
如果这种需求多了还是考虑elasticsearch吧
zombodb是PostgreSQL与ElasticSearch结合的一个索引接口,可以直接读写ES。
请点击文章底部“阅读原文”查看原文内容。
PostgreSQL中文社区欢迎广大技术人员投稿
投稿邮箱:press@postgres.cn
最后修改时间:2021-03-23 09:33:04
文章转载自PostgreSQL中文社区,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




