暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

华为openGauss Synonym词典

华为高斯 2020-06-01
1076

Synonym词典用于定义、识别token的同义词并转化,不支持词组(词组形式的同义词可用Thesaurus词典定义,详细请参见Thesaurus词典)。

示例

  • Synonym词典可用于解决语言学相关问题,例如,为避免使单词"Paris"变成"pari",可在Synonym词典文件中定义一行"Paris paris",并将该词典放置在预定义的english_stem词典之前。

    ``` postgres=# SELECT * FROM ts_debug('english', 'Paris'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+----------------+--------------+--------- asciiword | Word, all ASCII | Paris | {english_stem} | english_stem | {pari} (1 row)

    postgres=# CREATE TEXT SEARCH DICTIONARY my_synonym ( TEMPLATE = synonym, SYNONYMS = my_synonyms, FILEPATH = 'file:///home/dicts/' );

    postgres=# ALTER TEXT SEARCH CONFIGURATION english ALTER MAPPING FOR asciiword WITH my_synonym, english_stem;

    postgres=# SELECT * FROM ts_debug('english', 'Paris'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+---------------------------+------------+--------- asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris} (1 row)

    postgres=# SELECT * FROM ts_debug('english', 'paris'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+---------------------------+------------+--------- asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris} (1 row)

    postgres=# ALTER TEXT SEARCH DICTIONARY my_synonym ( CASESENSITIVE=true);

    postgres=# SELECT * FROM ts_debug('english', 'Paris'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+---------------------------+------------+--------- asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {paris} (1 row)

    postgres=# SELECT * FROM ts_debug('english', 'paris'); alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+-------+---------------------------+------------+--------- asciiword | Word, all ASCII | Paris | {my_synonym,english_stem} | my_synonym | {pari} (1 row)

    ```

    其中,同义词词典文件全名为my_synonyms.syn,所在目录为当前连接数据库主节点的/home/dicts/下。关于创建词典的语法和更多参数,请参见CREATE TEXT SEARCH DICTIONARY

  • 星号(*)可用于词典文件中的同义词结尾,表示该同义词是一个前缀。在to_tsvector()中该星号将被忽略,但在to_tsquery()中会匹配该前缀并对应输出结果(参照处理查询一节)。

    假设词典文件synonym_sample.syn内容如下:

    postgres pgsql postgresql pgsql postgre pgsql gogle googl indices index*

    创建并使用词典:

    ``` postgres=# CREATE TEXT SEARCH DICTIONARY syn ( TEMPLATE = synonym, SYNONYMS = synonym_sample );

    postgres=# SELECT ts_lexize('syn','indices'); ts_lexize


    {index} (1 row)

    postgres=# CREATE TEXT SEARCH CONFIGURATION tst (copy=simple);

    postgres=# ALTER TEXT SEARCH CONFIGURATION tst ALTER MAPPING FOR asciiword WITH syn;

    postgres=# SELECT to_tsvector('tst','indices'); to_tsvector


    'index':1 (1 row)

    postgres=# SELECT to_tsquery('tst','indices'); to_tsquery


    'index':* (1 row)

    postgres=# SELECT 'indexes are very useful'::tsvector; tsvector


    'are' 'indexes' 'useful' 'very' (1 row)

    postgres=# SELECT 'indexes are very useful'::tsvector @@ to_tsquery('tst','indices'); ?column?


    t (1 row) ```

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

文集目录
暂无数据