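openGauss Full-Text Search: Features
Two data types support full-text search. The tsvector type represents a document in a form optimized for text search, and the tsquery type represents a text query.
Full-text search in openGauss is based on the match operator @@, which returns true when a tsvector (document) matches a tsquery (query).
The two operands, tsvector (document) and tsquery (query), can be written in either order.
Official documentation: https://opengauss.org/zh/docs/2.1.0/docs/Developerguide/%E5%9F%BA%E6%9C%AC%E6%96%87%E6%9C%AC%E5%8C%B9%E9%85%8D.html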
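Text Search Configurations
Full-text search can do more than basic matching: skip indexing certain words (stop words), process synonyms, and use sophisticated parsing, for example parsing based on more than just white space. These capabilities are controlled by text search configurations. openGauss ships with predefined configurations for many languages, and you can create your own (the gsql \dF command lists all available configurations).
During installation an appropriate configuration is selected, and default_text_search_config is set accordingly in postgresql.conf. If the whole openGauss instance uses the same text search configuration, the value in postgresql.conf is sufficient. If different databases need different configurations, set one per database with ALTER DATABASE … SET. You can also set default_text_search_config in each session.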
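The scopes described above can be sketched as follows. The database name mydb is hypothetical, and english is assumed to be among the configurations listed by \dF. This is an illustration, not a recommended setup.

```sql
-- Show the configuration currently in effect for this session.
SHOW default_text_search_config;

-- Per-database default; mydb is a hypothetical database name.
-- The new value applies to sessions started after the change.
ALTER DATABASE mydb SET default_text_search_config TO 'pg_catalog.english';

-- Per-session override; lasts until the session ends or the value is reset.
SET default_text_search_config TO 'pg_catalog.english';
```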
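Every text search function that depends on a configuration takes an optional configuration parameter, so the configuration to use can be named explicitly. default_text_search_config is used only when this parameter is omitted.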
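A minimal sketch of the two calling forms, assuming the english configuration is installed. The sample sentence is made up for illustration.

```sql
-- One-argument form: the configuration comes from default_text_search_config.
SELECT to_tsvector('The cats are running');

-- Two-argument form: the configuration is named explicitly, so the result
-- does not depend on the session setting.
SELECT to_tsvector('english', 'The cats are running');
SELECT to_tsquery('english', 'cats & running');
```

Naming the configuration explicitly keeps results stable when the same statement runs in sessions with different defaults.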
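To make custom text search configurations easier to build, a configuration is assembled from simpler database objects. The openGauss text search facility provides four types of configuration-related database objects:
Text search parsers break documents into tokens and classify each token (for example, as a word or a number).
Text search dictionaries convert tokens to a normalized form and discard stop words.
Text search templates provide the functions underlying dictionaries: a dictionary specifies a template and a set of parameters for that template.
Text search configurations select a parser and a set of dictionaries used to normalize the tokens the parser produces.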
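One way to watch these objects work together is ts_debug. Assuming it behaves in openGauss as it does in PostgreSQL, it reports for each token the type assigned by the parser, the dictionaries consulted, and the resulting lexemes. The sample sentence is made up for illustration.

```sql
-- For each token: its type (from the parser), the dictionaries tried,
-- the dictionary that recognized it, and the lexemes it produced.
-- Stop words such as "the" are expected to show an empty lexeme list.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('english', 'The 2 fat cats');
```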
Official documentation: https://opengauss.org/zh/docs/2.1.0/docs/Developerguide/%E5%88%86%E8%AF%8D%E5%99%A8.html
openGauss Full-Text Search: Exercises
Perform two basic text matches, one with tsvector @@ tsquery and one with tsquery @@ tsvector
omm=# SELECT 'a fat cat TOM '::tsvector @@ 'cat & TOM'::tsquery AS RESULT;
result
--------
t
(1 row)
omm=# SELECT 'fat & JON'::tsquery @@ 'a fat cat TOM'::tsvector AS RESULT;
result
--------
f
(1 row)
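The casts used above store the words exactly as written, so matching is case-sensitive and no stemming is applied. As a hedged contrast, the sketch below uses to_tsvector and to_tsquery, which normalize their input through a configuration (english is assumed to be available). With standard English stemming, the first query is expected to match and the second is not. No output is shown because these statements were not part of the original session.

```sql
-- Normalized on both sides: 'cats' is stemmed to 'cat' and 'TOM' is lowercased.
SELECT to_tsvector('english', 'a fat cat TOM') @@ to_tsquery('english', 'cats & tom') AS result;

-- Plain casts: lexemes are kept verbatim, so 'cats' and 'tom' do not
-- equal the stored 'cat' and 'TOM'.
SELECT 'a fat cat TOM'::tsvector @@ 'cats & tom'::tsquery AS result;
```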
Create a table with at least two columns of type text, and run full-text searches before creating any index
omm=# CREATE TABLE TXT(id int, body text, title text, last_mod_date date);
CREATE TABLE
omm=# INSERT INTO TXT VALUES(1, 'China, officially the People''s Republic of China(PRC), located in Asia, is the world''s most populous state.', 'China', '2010-1-1');
INSERT 0 1
omm=# INSERT INTO TXT VALUES(2, 'America is a rock band, formed in England in 1970 by multi-instrumentalists Dewey Bunnell, Dan Peek, and Gerry Beckley.', 'America', '2010-1-1');
INSERT 0 1
omm=# INSERT INTO TXT VALUES(3, 'England is a country that is part of the United Kingdom. It shares land borders with Scotland to the north and Wales to the west.', 'England','2010-1-1');
INSERT 0 1
omm=# select * from TXT;
id | body
| title | last_mod_date
----+---------------------------------------------------------------------------------------
--------------------------------------------+---------+---------------
1 | China, officially the People's Republic of China(PRC), located in Asia, is the world's
most populous state. | China | 2010-01-01
2 | America is a rock band, formed in England in 1970 by multi-instrumentalists Dewey Bunn
ell, Dan Peek, and Gerry Beckley. | America | 2010-01-01
3 | England is a country that is part of the United Kingdom. It shares land borders with S
cotland to the north and Wales to the west. | England | 2010-01-01
(3 rows)
omm=# SELECT id, body, title FROM TXT WHERE to_tsvector(body) @@ to_tsquery('China');
id | body
| title
----+---------------------------------------------------------------------------------------
----------------------+-------
1 | China, officially the People's Republic of China(PRC), located in Asia, is the world's
most populous state. | China
(1 row)
omm=# SELECT id, body, title FROM TXT WHERE to_tsvector(body) @@ to_tsquery('America');
id | body
| title
----+---------------------------------------------------------------------------------------
----------------------------------+---------
2 | America is a rock band, formed in England in 1970 by multi-instrumentalists Dewey Bunn
ell, Dan Peek, and Gerry Beckley. | America
(1 row)
Create GIN indexes
omm=# CREATE INDEX TXT_1 ON TXT USING gin(to_tsvector('english', body));
CREATE INDEX
omm=# CREATE INDEX TXT_2 ON TXT USING gin(to_tsvector('english', title || ' ' || body));
CREATE INDEX
omm=# \d+ TXT
Table "public.txt"
Column | Type | Modifiers | Storage | Stats target | Description
---------------+---------+-----------+----------+--------------+-------------
id | integer | | plain | |
body | text | | extended | |
title | text | | extended | |
last_mod_date | date | | plain | |
Indexes:
"txt_1" gin (to_tsvector('english'::regconfig, body)) TABLESPACE pg_default
"txt_2" gin (to_tsvector('english'::regconfig, (title || ' '::text) || body)) TABLESPACE pg_default
Has OIDs: no
Options: orientation=row, compression=no
omm=# SELECT id, body, title FROM TXT WHERE to_tsvector(body) @@ to_tsquery('America');
id | body
| title
----+---------------------------------------------------------------------------------------
----------------------------------+---------
2 | America is a rock band, formed in England in 1970 by multi-instrumentalists Dewey Bunn
ell, Dan Peek, and Gerry Beckley. | America
(1 row)
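The indexes above were built on to_tsvector('english', ...), while the query above calls to_tsvector(body) without naming a configuration. In PostgreSQL-derived systems an expression index is generally considered only when the query repeats the indexed expression, so queries along the following lines are more likely to be able to use txt_1 and txt_2. This is a sketch under that assumption, and on a three-row table the planner may still choose a sequential scan.

```sql
-- Same expression as index txt_1: two-argument to_tsvector with 'english'.
EXPLAIN
SELECT id, title
FROM TXT
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'America');

-- Same expression as index txt_2: title and body concatenated.
SELECT id, title
FROM TXT
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'England');
```

Prefixing a statement with EXPLAIN prints the chosen plan without running the query, which makes it easy to check whether an index scan was selected.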
Clean up the data
omm=# drop table txt;
DROP TABLE
omm=#