PostgreSQL模糊查询提速

叶同学专栏 2022-09-02

561

模糊查询场景

常见的模糊查询场景有：

前模糊(like 'str%')
后模糊((like '%str'))
前后模糊((like '%str%'))

另外还有相似查询场景：

例如搜索p0stgre也能够搜索到postgre

前模糊的查询

使用b-tree可以支持前模糊的查询 ,但仅适合于collate="C"的查询，当数据库默认的lc_collate<>C时，索引和查询都需要明确指定collate "C"。

操作系统的的locale

[postgres@yejf ~]$ locale
LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=

创建默认数据库时，不指定lc_collate的话，默认使用操作系统的locale

temp=# create database test1;
CREATE DATABASE
temp=# \l test1
                              List of databases
 Name  |  Owner   | Encoding |   Collate   |    Ctype    | Access privileges 
-------+----------+----------+-------------+-------------+-------------------
 test1 | postgres | UTF8     | zh_CN.UTF-8 | zh_CN.UTF-8 | 
(1 row)

使用如下SQL可以查询系统表pg_collation得到字符集支持的lc_collate和lc_ctype

select pg_encoding_to_char(collencoding) as encoding,collname,collcollate,collctype from pg_collation ;

创建lc_collate='C'的数据库

temp=# create database test2 with encoding 'UTF-8' template template0 lc_collate='C' lc_ctype='C'; 
CREATE DATABASE
temp=# \l test2
                         List of databases
 Name  |  Owner   | Encoding | Collate | Ctype | Access privileges 
-------+----------+----------+---------+-------+-------------------
 test2 | postgres | UTF8     | C       | C     | 
(1 row)

在test1库上(collate='zh_CN.UTF-8')测试b-tree索引对前模糊查询的支持

索引和查询没有指定collate "C"，没有使用到索引

test1=# create table test(id int, info text);
CREATE TABLE
test1=# insert into test select generate_series(1,1000000),md5(random()::text);
INSERT 0 1000000
test1=# 
test1=# 
test1=# explain analyze select * from test where info like 'abcd%';
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..15375.79 rows=5292 width=36) (actual time=7.374..197.378 rows=16 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test  (cost=0.00..13846.59 rows=2205 width=36) (actual time=60.877..184.245 rows=5 loops=3)
         Filter: (info ~~ 'abcd%'::text)
         Rows Removed by Filter: 333328
 Planning Time: 0.645 ms
 Execution Time: 197.455 ms
(8 rows)

test1=# create index idx on test(info);
CREATE INDEX
test1=# explain analyze select * from test where info like 'abcd%';
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..15042.33 rows=5000 width=36) (actual time=5.247..107.030 rows=16 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test  (cost=0.00..13542.33 rows=2083 width=36) (actual time=11.692..98.219 rows=5 loops=3)
         Filter: (info ~~ 'abcd%'::text)
         Rows Removed by Filter: 333328
 Planning Time: 0.679 ms
 Execution Time: 107.105 ms
(8 rows)

索引和查询都需要明确指定collate "C"时，能够使用到索引

test1=# drop index idx;
DROP INDEX
test1=# create index idx on test(info collate "C");
CREATE INDEX

test1=# explain (analyze,verbose,timing,costs,buffers) select * from test where info like 'abcd%' collate "C"; 
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Index Scan using idx on public.test  (cost=0.42..8.45 rows=100 width=37) (actual time=0.037..0.087 rows=16 loops=1)
   Output: id, info
   Index Cond: ((test.info >= 'abcd'::text) AND (test.info < 'abce'::text))
   Filter: (test.info ~~ 'abcd%'::text COLLATE "C")
   Buffers: shared hit=19
 Planning Time: 0.578 ms
 Execution Time: 0.146 ms
(7 rows)

在test2库上(collate='C')，索引不用指定就能支持前模糊查询

test1=# \c test2;
You are now connected to database "test2" as user "postgres".
test2=# create table test(id int, info text);
CREATE TABLE
test2=# insert into test select generate_series(1,1000000),md5(random()::text);
INSERT 0 1000000
test2=# create index idx on test(info);
CREATE INDEX
test2=# explain (analyze,verbose,timing,costs,buffers) select * from test where info like 'abcd%';
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test  (cost=199.68..7809.06 rows=5000 width=36) (actual time=0.095..0.147 rows=18 loops=1)
   Output: id, info
   Filter: (test.info ~~ 'abcd%'::text)
   Heap Blocks: exact=18
   Buffers: shared hit=18 read=3
   ->  Bitmap Index Scan on idx  (cost=0.00..198.43 rows=5000 width=0) (actual time=0.063..0.064 rows=18 loops=1)
         Index Cond: ((test.info >= 'abcd'::text) AND (test.info < 'abce'::text))
         Buffers: shared read=3
 Planning Time: 1.011 ms
 Execution Time: 0.204 ms
(10 rows)

后模糊查询

使用反转函数(reverse)索引(称表达式索引)，可以支持后模糊的查询，同样也仅适合于collate="C"的查询。

create index idx1 on test(reverse(info));

测试如下，性能提升明显

test2=# select * from test limit 1;
 id |               info               
----+----------------------------------
  1 | 70415ef007c76c7a95dedd972fbdbaf1
(1 row)

test2=# explain (analyze,verbose,timing,costs,buffers) select * from test where reverse(info) like 'baf1%';
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..16084.00 rows=5000 width=37) (actual time=108.614..415.603 rows=14 loops=1)
   Output: id, info
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=8334
   ->  Parallel Seq Scan on public.test  (cost=0.00..14584.00 rows=2083 width=37) (actual time=73.336..404.445 rows=5 loops=3)
         Output: id, info
         Filter: (reverse(test.info) ~~ 'baf1%'::text)
         Rows Removed by Filter: 333329
         Buffers: shared hit=8334
         Worker 0: actual time=27.651..402.151 rows=4 loops=1
           Buffers: shared hit=2683
         Worker 1: actual time=84.476..400.209 rows=6 loops=1
           Buffers: shared hit=2550
 Planning Time: 0.146 ms
 Execution Time: 415.841 ms
(16 rows)

test2=# create index idx1 on test(reverse(info));
CREATE INDEX
test2=# explain (analyze,verbose,timing,costs,buffers) select * from test where reverse(info) like 'baf1%';
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test  (cost=199.68..7821.56 rows=5000 width=37) (actual time=0.090..0.143 rows=14 loops=1)
   Output: id, info
   Filter: (reverse(test.info) ~~ 'baf1%'::text)
   Heap Blocks: exact=14
   Buffers: shared hit=14 read=3
   ->  Bitmap Index Scan on idx1  (cost=0.00..198.43 rows=5000 width=0) (actual time=0.067..0.067 rows=14 loops=1)
         Index Cond: ((reverse(test.info) >= 'baf1'::text) AND (reverse(test.info) < 'baf2'::text))
         Buffers: shared read=3
 Planning Time: 0.581 ms
 Execution Time: 0.182 ms
(10 rows)

当数据库默认的lc_collate<>C时，还有一种方法让b-tree索引支持模糊查询。使用对应类型的pattern ops，使用pattern ops将使用字符的查询方式而非binary的搜索方式。

使用类型对应的pattern ops，索引搜索不仅支持LIKE的写法，还支持规则表达式的写法。

test2=# \c test1
You are now connected to database "test1" as user "postgres".
test1=# create index idx1 on test(reverse(info) text_pattern_ops);  
CREATE INDEX
test1=#  explain (analyze,verbose,timing,costs,buffers) select * from test where reverse(info) like 'baf1%';
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test  (cost=199.68..7821.56 rows=5000 width=37) (actual time=0.138..0.210 rows=6 loops=1)
   Output: id, info
   Filter: (reverse(test.info) ~~ 'baf1%'::text)
   Heap Blocks: exact=6
   Buffers: shared hit=5 read=4
   ->  Bitmap Index Scan on idx1  (cost=0.00..198.43 rows=5000 width=0) (actual time=0.108..0.109 rows=6 loops=1)
         Index Cond: ((reverse(test.info) ~>=~ 'baf1'::text) AND (reverse(test.info) ~<~ 'baf2'::text))
         Buffers: shared read=3
 Planning Time: 24.781 ms
 Execution Time: 0.302 ms
(10 rows

前后模糊查询

pg_trgm

使用PostgreSQL内置的pg_trgm索引，可以支持前模糊、后模糊以及前后模糊的查询。但使用时有一些注意事项：

要让pg_trgm高效支持多字节字符(例如中文)，数据库lc_ctype不能为"C"
```
test2=# select show_trgm('数据库');
 show_trgm 
-----------
 {}
(1 row)
```
建议输入3个或3个以上字符，否则效果不佳
当匹配结果非常多时，即使有limit限制返回的行数，一样会有较大成本

pg_trgm使用

test1=# select show_trgm('123'); 
        show_trgm        
-------------------------
 {"  1"," 12",123,"23 "}
(1 row)

test1=# select show_trgm('数据库'); 
               show_trgm               
---------------------------------------
 {0x8a25e7,0x077ed5,0x2d07b4,0x727b48}
(1 row)

pg_trgm将字符串的前端添加2个空格，末尾添加1个空格。然后每连续的3个字符为一个TOKEN，最后对TOKEN建立GIN倒排索引。

使用pg_trgm时，如果要获得最好的效果，最好满足这些条件。

有前缀的模糊查询，例如a%，至少需要提供1个字符。( 搜索的是token=' a' )
有后缀的模糊查询，例如%ab，至少需要提供2个字符。( 搜索的是token='ab ' )
前后模糊查询，例如%abcd%，至少需要提供3个字符。

测试过程

生成随机中文字符串的函数

create or replace function gen_hanzi(int) returns text as $$                  
declare        
  res text;        
begin        
  if $1 >=1 then        
    select string_agg(chr(19968+(random()*20901)::int), '') into res from generate_series(1,$1);        
    return res;        
  end if;        
  return null;        
end;        
$$ language plpgsql strict;        
CREATE FUNCTION

验证pg_trgm索引

test1=# select * from test3 limit 5;
                    c1                    
------------------------------------------
 殞诉硼系鼲怂葻鴯魊兡緳襛奕犨璫訃褨啜椴镱
 粯繾譋觯嚯疏毙寂檃鎪耏俶可鶠胗镰蚲謴權犧
 鐈詬愇潠固洿梟槰隬饶巟渟桸吁戍衾獖稸黑横
 榟欘摬蝪桶犧鋫羰筰軩跜籔狤賶杣矚檲嬌樜鳴
 冇鍺撴杴犟哞鵞拃毕磉擋浧巟爒魐碘帇擭弋档
(5 rows)

test1=# explain (analyze,verbose,timing,costs,buffers) select * from test3 where c1 like '%鎪耏俶%';
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Seq Scan on public.test3  (cost=0.00..2387.00 rows=10 width=61) (actual time=0.024..71.642 rows=1 loops=1)
   Output: c1
   Filter: (test3.c1 ~~ '%鎪耏俶%'::text)
   Rows Removed by Filter: 99999
   Buffers: shared hit=1137
 Planning Time: 0.131 ms
 Execution Time: 71.675 ms
(7 rows)

test1=# create index idx_test001_1 on test3 using gin (c1 gin_trgm_ops);
CREATE INDEX
test1=# explain (analyze,verbose,timing,costs,buffers) select * from test3 where c1 like '%鎪耏俶%';
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test3  (cost=20.08..57.39 rows=10 width=61) (actual time=0.046..0.047 rows=1 loops=1)
   Output: c1
   Recheck Cond: (test3.c1 ~~ '%鎪耏俶%'::text)
   Heap Blocks: exact=1
   Buffers: shared hit=5
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..20.07 rows=10 width=0) (actual time=0.032..0.033 rows=1 loops=1)
         Index Cond: (test3.c1 ~~ '%鎪耏俶%'::text)
         Buffers: shared hit=4
 Planning Time: 0.442 ms
 Execution Time: 0.101 ms
(10 rows)

test1=# \di+ idx_test001_1
                            List of relations
 Schema |     Name      | Type  |  Owner   | Table | Size  | Description 
--------+---------------+-------+----------+-------+-------+-------------
 public | idx_test001_1 | index | postgres | test3 | 66 MB | 
(1 row)

pg_trgm索引不支持使用2字前后模糊查询

test1=# explain (analyze,verbose,timing,costs,buffers) select * from test3 where c1 like '%鎪耏%';
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Seq Scan on public.test3  (cost=0.00..2387.00 rows=10 width=61) (actual time=0.040..75.146 rows=1 loops=1)
   Output: c1
   Filter: (test3.c1 ~~ '%鎪耏%'::text)
   Rows Removed by Filter: 99999
   Buffers: shared hit=1137
 Planning Time: 0.363 ms
 Execution Time: 75.207 ms
(7 rows)

对于上面这种场景，可以使用表达式，将字符串拆成1个单字，两个连续的字符的数组。

 create or replace function split001(text) returns text[] as $$      
declare      
  res text[];      
begin      
  select regexp_split_to_array($1,'') into res;      
  for i in 1..length($1)-1 loop      
    res := array_append(res, substring($1,i,2));      
  end loop;      
  return res;      
end;      
$$ language plpgsql strict immutable;

test1=# select split001('数据库');
       split001       
----------------------
 {数,据,库,数据,据库}
(1 row)

创建表达式索引并验证性能，查询时同样需要表达式去匹配

test1=# create index idx_test3_2 on test3 using gin (split001(c1)); 
CREATE INDEX

test1=# \di+ idx_test3_2
                           List of relations
 Schema |    Name     | Type  |  Owner   | Table | Size  | Description 
--------+-------------+-------+----------+-------+-------+-------------
 public | idx_test3_2 | index | postgres | test3 | 94 MB |

test1=# explain (analyze,verbose,timing,costs,buffers) select * from test3 where split001(c1) @> array['鎪耏'];
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test3  (cost=23.87..1056.51 rows=500 width=61) (actual time=0.105..0.107 rows=1 loops=1)
   Output: c1
   Recheck Cond: (split001(test3.c1) @> '{鎪耏}'::text[])
   Heap Blocks: exact=1
   Buffers: shared hit=5
   ->  Bitmap Index Scan on idx_test3_2  (cost=0.00..23.75 rows=500 width=0) (actual time=0.098..0.098 rows=1 loops=1)
         Index Cond: (split001(test3.c1) @> '{鎪耏}'::text[])
         Buffers: shared hit=4
 Planning Time: 0.316 ms
 Execution Time: 0.145 ms
(10 rows)

pg_bigm

pg_bigm同开源社区的一款插件，并没有集成到PostgreSQL中，它同样支模糊查询和相似查询，对比pg_trgm，它有以下优势：

支持高性能1或2个字的模糊和相似搜索

任何lc_ctype的数据库都能支持wchar切词

test2=# \l test2
                         List of databases
 Name  |  Owner   | Encoding | Collate | Ctype | Access privileges 
-------+----------+----------+---------+-------+-------------------
 test2 | postgres | UTF8     | C       | C     | 
(1 row)

test2=# select show_bigm('数据库');
        show_bigm        
-------------------------
 {"库 ",据库,数据," 数"}
(1 row)

test2=# select show_trgm('数据库');
 show_trgm 
-----------
 {}
(1 row)

安装插件

下载地址：

https://pgbigm.osdn.jp/index_en.html

安装

[postgres@yejf ~]$ tar -zxvf pg_bigm-1.2-20200228.tar.gz 
[postgres@yejf ~]$ cd pg_bigm-1.2-20200228
[postgres@yejf pg_bigm-1.2-20200228]$ make USE_PGXS=1
[postgres@yejf pg_bigm-1.2-20200228]$ make USE_PGXS=1 install

不需要重启数据库，登录到要使用的数据库，创建插件

[postgres@yejf ~]$ psql temp
psql (12.4)
Type "help" for help.

temp=# create extension pg_bigm ;
CREATE EXTENSION

pg_bigm使用

跟pg_trgm一样

test1=# explain (analyze,verbose,timing,costs,buffers) select * from test3 where c1 like '%鎪耏%';
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Seq Scan on public.test3  (cost=0.00..2387.06 rows=10 width=61) (actual time=0.028..31.377 rows=1 loops=1)
   Output: c1
   Filter: (test3.c1 ~~ '%鎪耏%'::text)
   Rows Removed by Filter: 100004
   Buffers: shared hit=1137
 Planning Time: 0.109 ms
 Execution Time: 31.399 ms
(7 rows)


test1=# CREATE INDEX idx_test3 ON test3 USING gin(c1 gin_bigm_ops);
CREATE INDEX
test1=# explain (analyze,verbose,timing,costs,buffers) select * from test3 where c1 like '%鎪耏%';
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test3  (cost=20.08..57.39 rows=10 width=61) (actual time=0.057..0.059 rows=1 loops=1)
   Output: c1
   Recheck Cond: (test3.c1 ~~ '%鎪耏%'::text)
   Heap Blocks: exact=1
   Buffers: shared hit=5
   ->  Bitmap Index Scan on idx_test3  (cost=0.00..20.08 rows=10 width=0) (actual time=0.043..0.043 rows=1 loops=1)
         Index Cond: (test3.c1 ~~ '%鎪耏%'::text)
         Buffers: shared hit=4
 Planning Time: 1.568 ms
 Execution Time: 0.181 ms
(10 rows)

pg_bigm官方使用文档

http://pgbigm.osdn.jp/pg_bigm_en-1-2.html

pg_trgm和pg_bigm区别

pg_trgm连续三个字切分

temp=# create extension pg_trgm ;
temp=# select show_trgm('yejianfeng'); 
                      show_trgm                      
-----------------------------------------------------
 {"  y"," ye",anf,eji,eng,fen,ian,jia,nfe,"ng ",yej}

pg_bigm使用连续两个字切分

temp=# select show_bigm('yejianfeng'); 
               show_bigm                
----------------------------------------
 {" y",an,ej,en,fe,"g ",ia,ji,nf,ng,ye}

具体区别如下：

功能与特性	pg_trgm	pg_bigm
切词粒度	前加2后加1个空格, 每3个连续字	前加2后加1个空格, 每2个连续字
索引接口	gin, gist	gin
支持的语法	`LIKE (~~), ILIKE (~~), ~, ~`	LIKE only
wchar支持情况	lc_ctype <> C 或者编译时注释 contrib/pg_trgm/trgm.h KEEPONLYALNUM macro定义, 否则不支持切分wchar的token	任何时候都支持wchar
1或2个字的模糊查询	慢(因为切词粒度为3个字, 所以无法匹配), 必须全表扫描, 或者full index scan+RECHECK	快
相似查询	支持	支持
like封装函数	不支持	支持 (likequery 函数自动将%_\进行转意，不需要人为转意)
最大索引字段长度	228MB	102MB
区分大小写	不区分	区分, like时匹配的记录区分大小写。
高性能开关(关闭recheck, 不严谨查询)	不支持	支持 pg_bigm.enable_recheck=off
高性能开关(粗查, 不严谨查询)	不支持	支持 pg_bigm.gin_key_limit 只查少量token (2-grams)

相似查询

官方文档：https://www.postgresql.org/docs/current/pgtrgm.html

pg_trgm插件的gist索引能够支持相似查询，对于中文，同样的数据库lc_ctype不能为"C" 。

根据相似度查询，c1 <-> '数据库' AS dist
的值越靠近0表示相似度越高，并且主要用于排序的性能提升，如果用于条件中筛选相似度的话，会查询所有索引的信息，即索引无用了。

相似查询语法<->
的语法需要激活插件才能查询生效

test1=# select t,c1 <-> '数据库' AS dist  from test3 t order by  (c1 <-> '数据库') limit 7;
ERROR:  operator does not exist: character varying <-> unknown
LINE 1: select t,c1 <-> '数据库' AS dist  from test3 t order by  (c1...
                    ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

test1=# create extension pg_trgm;
CREATE EXTENSION


test1=# select t,c1 <-> '数据库' AS dist  from test3 t order by  (c1 <-> '数据库') limit 7;
                     t                      |    dist    
--------------------------------------------+------------
 (数据库)                                   |          0
 (数据库2)                                  |        0.5
 (数据库1)                                  |        0.5
 (数1据库)                                  | 0.71428573
 (数据2库)                                  | 0.71428573
 (数蜋賵鏉慫癥饚暲哪軘罫抒幽焒种塤塮丱兓偈) |  0.9583333
 (数鳡韢謭蠨饩賋韸籓霵墿埬顾拯悍訊逄妞狶娃) |  0.9583333

没索引时的性能

test1=# explain (analyze,verbose,timing,costs,buffers) select t,c1 <-> '数据库' AS dist  from test3 t order by  (c1 <-> '数据库') limit 7;
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3992.20..3993.00 rows=7 width=89) (actual time=721.071..723.242 rows=7 loops=1)
   Output: t.*, ((c1 <-> '数据库'::text))
   Buffers: shared hit=1184
   ->  Gather Merge  (cost=3992.20..10757.19 rows=58826 width=89) (actual time=721.068..723.236 rows=7 loops=1)
         Output: t.*, ((c1 <-> '数据库'::text))
         Workers Planned: 1
         Workers Launched: 1
         Buffers: shared hit=1184
         ->  Sort  (cost=2992.19..3139.25 rows=58826 width=89) (actual time=714.801..714.804 rows=7 loops=2)
               Output: t.*, ((c1 <-> '数据库'::text))
               Sort Key: ((t.c1 <-> '数据库'::text))
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               Buffers: shared hit=1184
               Worker 0: actual time=709.502..709.506 rows=7 loops=1
                 Buffers: shared hit=564
               ->  Parallel Seq Scan on public.test3 t  (cost=0.00..1872.33 rows=58826 width=89) (actual time=0.175..685.135 rows=50002 loops=2)
                     Output: t.*, (c1 <-> '数据库'::text)
                     Buffers: shared hit=1137
                     Worker 0: actual time=0.267..679.198 rows=45496 loops=1
                       Buffers: shared hit=517
 Planning Time: 0.146 ms
 Execution Time: 723.303 ms
(23 rows)

创建gist索引，性能提升明显

test1=# CREATE INDEX idx_test3_1 ON test3 USING gist(c1 gist_trgm_ops);
CREATE INDEX
test1=# explain (analyze,verbose,timing,costs,buffers) select t,c1 <-> '数据库' AS dist  from test3 t order by  (c1 <-> '数据库') limit 7;
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.28..1.20 rows=7 width=89) (actual time=44.262..67.599 rows=7 loops=1)
   Output: t.*, ((c1 <-> '数据库'::text))
   Buffers: shared hit=1640
   ->  Index Scan using idx_test3_1 on public.test3 t  (cost=0.28..13096.38 rows=100005 width=89) (actual time=44.259..67.589 rows=7 loops=1)
         Output: t.*, (c1 <-> '数据库'::text)
         Order By: (t.c1 <-> '数据库'::text)
         Buffers: shared hit=1640
 Planning Time: 0.442 ms
 Execution Time: 67.704 ms
(9 rows)

创建索引语句

b-tree索引

create index idx on test(info);

gin索引

create index idx_test001_1 on test3 using gin (c1 gin_trgm_ops);

gist索引

CREATE INDEX idx_test3_1 ON test3 USING gist(c1 gist_trgm_ops);

表达式索引

create index idx1 on test(reverse(info));
create index idx_test3_2 on test3 using gin (split001(c1));

另外还有一款插件pgroonga
, 支持更多的操作符, 整合了多种类型的模糊查询, 包括json, 数组, 数值, 文本等。

https://pgroonga.github.io/tutorial/

文章转载自叶同学专栏，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。