学习笔记
tsvector、tsquery
tsvector 拥有位置的词汇甚至可以用一个权来标记,反映文档结构,这个权可以是A,B,C或D。默认的是D,因此输出中不会出现
tsvector
----------------------------
'a':1A 'cat':5 'fat':2B,4C
(1 row)
omm=# SELECT 'fat & rat'::tsquery;
tsquery
---------------
'fat' & 'rat'
(1 row)
omm=# SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector @@ 'cat & rat'::tsquery AS RESULT;
result
--------
t
(1 row)
omm=# SELECT 'fat & cow'::tsquery @@ 'a fat cat sat on a mat and ate a fat rat'::tsvector AS RESULT;
result
--------
f
(1 row)
分词器
omm=# \dF
List of text search configurations
Schema | Name | Description
------------+------------+---------------------------------------
pg_catalog | danish | configuration for danish language
pg_catalog | dutch | configuration for dutch language
pg_catalog | english | configuration for english language
pg_catalog | finnish | configuration for finnish language
pg_catalog | french | configuration for french language
pg_catalog | german | configuration for german language
pg_catalog | hungarian | configuration for hungarian language
pg_catalog | italian | configuration for italian language
pg_catalog | ngram | ngram configuration
pg_catalog | norwegian | configuration for norwegian language
pg_catalog | romanian | configuration for romanian language
pg_catalog | russian | configuration for russian language
pg_catalog | simple | simple configuration
pg_catalog | spanish | configuration for spanish language
pg_catalog | portuguese | configuration for portuguese language
pg_catalog | pound | pound configuration
pg_catalog | zhparser | zhparser configuration
pg_catalog | swedish | configuration for swedish language
pg_catalog | turkish | configuration for turkish language
(19 rows)
omm=#
omm=# show default_text_search_config;
default_text_search_config
----------------------------
pg_catalog.english
(1 row)
GIN索引
连接列的索引
omm=# CREATE INDEX pgweb_idx_1 ON tsearch.pgweb USING gin(to_tsvector(‘english’, body));
CREATE INDEX
omm=# CREATE INDEX pgweb_idx_3 ON tsearch.pgweb USING gin(to_tsvector(‘english’, title || ’ ’ || body));
CREATE INDEX
omm=# \d+ tsearch.pgweb
Table “tsearch.pgweb”
Column | Type | Modifiers | Storage | Stats target | Description
---------------±--------±----------±---------±-------------±------------
id | integer | | plain | |
body | text | | extended | |
title | text | | extended | |
last_mod_date | date | | plain | |
Indexes:
“pgweb_idx_1” gin (to_tsvector(‘english’::regconfig, body)) TABLESPACE pg_default
“pgweb_idx_3” gin (to_tsvector(‘english’::regconfig, (title || ’ '::text) || body)) TABLESPACE pg_default
Has OIDs: no
Options: orientation=row, compression=no
课后作业
1.用tsvector @@ tsquery和tsquery @@ tsvector完成两个基本文本匹配
omm=# SELECT 'good good study day day up'::tsvector;
tsvector
---------------------------
'day' 'good' 'study' 'up'
(1 row)
omm=# SELECT 'you & live'::tsquery;
tsquery
----------------
'you' & 'live'
(1 row)
omm=# SELECT 'good good study day day up'::tsvector @@ 'study & day'::tsquery AS RESULT;
result
--------
t
(1 row)
omm=# SELECT 'you & live'::tsquery @@ 'Cease to struggle and you cease to live'::tsvector AS RESULT;
result
--------
f
(1 row)
2.创建表且至少有两个字段的类型为 text类型,在创建索引前进行全文检索
omm=# create table c
omm-# (
omm(# id integer,
omm(# content1 text,
omm(# content2 text
omm(# );
CREATE TABLE
omm=# insert into c values
omm-# (1,‘good good study day day up’,‘hello’),
omm-# (2,‘Cease to struggle and you cease to live’,‘world’);
INSERT 0 2
omm=# select * from c;
id | content1 | content2
----±----------------------------------------±---------
1 | good good study day day up | hello
2 | Cease to struggle and you cease to live | world
(2 rows)
omm=# select * from c where to_tsvector(content1) @@ to_tsquery(‘good’);
id | content1 | content2
----±---------------------------±---------
1 | good good study day day up | hello
(1 row)
omm=# select * from c where to_tsvector(content1 || ’ ’ || content2) @@ to_tsquery(‘cease & world’);
id | content1 | content2
----±----------------------------------------±---------
2 | Cease to struggle and you cease to live | world
(1 row)
3.创建GIN索引
omm=# create index content1_idx on c using gin(to_tsvector(‘english’, content1));
CREATE INDEX
omm=# \d c
Table “public.c”
Column | Type | Modifiers
----------±--------±----------
id | integer |
content1 | text |
content2 | text |
Indexes:
“content1_idx” gin (to_tsvector(‘english’::regconfig, content1)) TABLESPACE pg_default
4.清理数据
omm=# drop table c;
DROP TABLE




