Integrating Elasticsearch 8.1.2 with Spring Boot 2 to Implement File Search

奇文社区 (Qiwen Community) 2022-04-13


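“Qiwen Netdisk (奇文网盘) added file search many versions ago, and it has recently been adapted to the latest Elasticsearch release, 8.1.2. Material online is scarce and it took me several days of trial and error to get it working, so this article walks through integrating Elasticsearch 8.1.2 with Spring Boot in detail and introduces its most basic usage.”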

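According to the official documentation, version 8.1 saves space while indexing faster: with doc-value-only fields, indexing is about 20% faster and storage about 20% smaller, at some cost in search performance on those fields.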

—

Downloading and Configuring Elasticsearch

Download: https://www.elastic.co/cn/downloads/elasticsearch

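After downloading and extracting the archive you get the ES directory. The startup scripts are in bin and the configuration files are in config.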

I wrote an earlier article on installation and configuration that you can refer to, so I won't repeat it here: https://www.qiwenshare.com/essay/detail/233

—

Maven Dependencies

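Spring Boot does provide an Elasticsearch starter, but the version it manages still targets ES 7 and is not suitable for ES 8. Using it causes various problems and floods the backend log with warnings, so I recommend the official client dependency that matches the server version: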

<dependency>
    <groupId>co.elastic.clients</groupId>
    <artifactId>elasticsearch-java</artifactId>
    <version>8.1.2</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
</dependency>
<dependency>
    <groupId>jakarta.json</groupId>
    <artifactId>jakarta.json-api</artifactId>
    <version>2.0.1</version>
</dependency>

—

Connection Configuration

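import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchConfig {

    @Bean
    public ElasticsearchClient elasticsearchClient() {
        // Plain HTTP on localhost:9200; ES 8 enables security by default, so this assumes it has been disabled on the node
        RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();
        // The transport serializes requests and responses with Jackson
        ElasticsearchTransport transport = new RestClientTransport(client, new JacksonJsonpMapper());
        return new ElasticsearchClient(transport);
    }
}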
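
The host, port, and scheme in the configuration are hard-coded. A minimal sketch of one way to avoid that, assuming the endpoint is supplied as a single URL string: the class EsEndpoint and its parse method are hypothetical helpers written for this illustration and are not part of the Elasticsearch client.

```java
import java.net.URI;

/** Hypothetical helper, not part of the Elasticsearch client. */
final class EsEndpoint {
    final String scheme;
    final String host;
    final int port;

    private EsEndpoint(String scheme, String host, int port) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
    }

    /** Parses a full URL such as "http://localhost:9200". */
    static EsEndpoint parse(String url) {
        URI uri = URI.create(url);
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Expected scheme://host[:port], got: " + url);
        }
        // Assumption: fall back to 9200, the default Elasticsearch HTTP port, when no port is given
        int port = uri.getPort() == -1 ? 9200 : uri.getPort();
        return new EsEndpoint(uri.getScheme(), uri.getHost(), port);
    }
}
```

With an instance e, the values map onto the existing call as new HttpHost(e.host, e.port, e.scheme).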

—

Obtaining the Client Object

@Autowired
private ElasticsearchClient client;

—

Index CRUD

Create an index

// More concise than the RestHighLevelClient equivalent
CreateIndexResponse indexResponse = client.indices().create(c -> c.index("filesearch"));

Delete an index

DeleteIndexResponse deleteIndexResponse = client.indices().delete(d -> d.index("filesearch"));

Get an index

GetIndexResponse getIndexResponse = client.indices().get(i -> i.index("filesearch"));

Check whether an index exists

BooleanResponse booleanResponse = client.indices().exists(e -> e.index("filesearch"));
System.out.println(booleanResponse.value());
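
Index creation fails if the name breaks Elasticsearch's naming rules (lowercase only, no reserved characters, and so on). The following is a rough sketch of a pre-check based on those documented rules; IndexNames is a hypothetical helper for this illustration, and the server remains the final authority.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical pre-check for index names; not part of the Elasticsearch client. */
final class IndexNames {
    // Characters an index name may not contain: \ / * ? " < > | space , # :
    private static final String FORBIDDEN = "\\/*?\"<>| ,#:";

    static boolean isValid(String name) {
        if (name == null || name.isEmpty()) return false;
        if (name.equals(".") || name.equals("..")) return false;
        // Names may not start with '-', '_' or '+'
        if ("-_+".indexOf(name.charAt(0)) >= 0) return false;
        // Names must be lowercase
        if (!name.equals(name.toLowerCase(Locale.ROOT))) return false;
        for (char c : name.toCharArray()) {
            if (FORBIDDEN.indexOf(c) >= 0) return false;
        }
        // Names may be at most 255 bytes long
        return name.getBytes(StandardCharsets.UTF_8).length <= 255;
    }
}
```

The name used throughout this article, "filesearch", passes this check, while a camel-case name such as "FileSearch" does not.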

—

Document CRUD

Because we want to search files, we need a new entity class, FileSearch.java, with a few properties:

@Data
@JsonIgnoreProperties(ignoreUnknown = true)
public class FileSearch {
    private Long userFileId;
    private String fileName;
    private Long userId;
}

Insert a document

FileSearch fileSearch = new FileSearch();
fileSearch.setUserFileId(2L);
fileSearch.setFileName("文件名");
client.index(i -> i.index("filesearch").id(String.valueOf(fileSearch.getUserFileId())).document(fileSearch));

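Bulk-insert documents

List<FileSearch> fileSearchList = new ArrayList<>(); // fill with the documents to index
List<BulkOperation> bulkOperationArrayList = new ArrayList<>();
// Add one index operation per document, using userFileId as the document id, as in the single insert
for(FileSearch fileSearch : fileSearchList){
    bulkOperationArrayList.add(BulkOperation.of(o->o.index(i->i
            .id(String.valueOf(fileSearch.getUserFileId()))
            .document(fileSearch))));
}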


BulkResponse bulkResponse = client.bulk(b -> b.index("filesearch")
        .operations(bulkOperationArrayList));
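
The loop above puts every document into one bulk request. For large lists, a common approach is to send several smaller requests instead. Here is a minimal sketch of the splitting step only; BulkBatches is a hypothetical helper for this illustration, and the batch size is an assumption to tune for your data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a list into fixed-size batches. */
final class BulkBatches {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch can then be turned into its own list of BulkOperation objects and sent with a separate client.bulk() call.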

Update a document

FileSearch fileSearch = new FileSearch();
fileSearch.setUserFileId(2L);
fileSearch.setFileName("文件名");
UpdateResponse<FileSearch> updateResponse = client.update(u -> u
                .index("filesearch")
                .id(String.valueOf(fileSearch.getUserFileId())) // the id the document was indexed under
                .doc(fileSearch)
        , FileSearch.class);

Delete a document

// userFileId is the id of the file whose document should be removed
client.delete(d -> d
                        .index("filesearch")
                        .id(String.valueOf(userFileId)));

Get a document by id

GetResponse<FileSearch> getResponse = client.get(g -> g
                        .index("filesearch")
                        .id("1")
                , FileSearch.class
        );

—

File Search

Let's start with a simple example:

SearchResponse<FileSearch> search = client.search(s -> s
    .index("filesearch")
    .query(q -> q
        .term(t -> t
            .field("fileName")
            .value(v -> v.stringValue("aaa"))
        )),
    FileSearch.class);

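The code above is about the simplest possible search. client.search() returns the hits that satisfy a query. Here index selects the index named "filesearch", and the term query returns the documents whose fileName field holds the exact term "aaa".

term is the simplest and most common query type. Others include match and wildcard. They differ as follows: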

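term: search without analysis. The search string is treated as a single unit and compared with the indexed terms as is.

match: analyzed search. The search string is run through the analyzer, and a document is returned as long as the field contains at least one of the resulting tokens.

wildcard: pattern search. A wildcard query is a low-level, term-based query that matches terms against a pattern. The pattern is not a regular expression; it uses the standard shell wildcards:

? matches any single character
* matches 0 or more characters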
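
To make the two wildcards concrete, here is a small plain-Java sketch that checks a single term against such a pattern by translating it into a regular expression. WildcardSketch is a hypothetical class for illustration; it ignores backslash escaping and says nothing about how Elasticsearch evaluates the query internally.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of ? and * semantics on a single term. */
final class WildcardSketch {
    static boolean matches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '?' || c == '*') {
                // Flush pending literal text, quoted so regex metacharacters stay literal
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                // ? is exactly one character, * is zero or more characters
                regex.append(c == '?' ? "." : ".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // The whole term has to match, not just a part of it
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }
}
```

Note that a pattern without wildcards only matches the identical term, which is why the later examples wrap the keyword in * on both sides.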

Next, let's build a more capable search. The term query above only finds a document on an exact match, so we switch to the analyzed match query:

SearchResponse<FileSearch> search = client.search(s -> s
                        .index("filesearch")
                        .query(q -> q
                                .match(t -> t
                                        .field("fileName")
                                        .query("aaa")
                                )),
                FileSearch.class);

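What does analysis buy us? Take a file named "搜索查询测试.txt": entering any combination of those Chinese characters, in any order, still finds it, because the default standard analyzer indexes each Chinese character as a separate token.

This works less well for English. For a file named "courgette asdf.log", searching for cour finds nothing; only the complete word courgette does. English text is split into whole words, and because of the space ES sees two words here. We can therefore add a wildcard query on top of the match query: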

SearchResponse<FileSearch> search = client.search(s -> s
                        .index("filesearch")
                        .query(q -> q.bool(_1 -> _1
                                .should(_2 -> _2
                                        .match(_3 -> _3
                                                .field("fileName")
                                                .query("cour")))
                                .should(_2 -> _2
                                        .wildcard(_3 -> _3
                                                .field("fileName")
                                                .wildcard("*" + "cour" + "*"))))
                        ),
                FileSearch.class);
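
To see why cour alone is not found by match, the following plain-Java sketch models analysis in a deliberately simplified way. It assumes an analyzer that only lowercases and splits on whitespace, which is an approximation of what Elasticsearch does; AnalyzerSketch is a hypothetical class for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical, simplified model of term and match on an analyzed field. */
final class AnalyzerSketch {
    /** Simplified analyzer (assumption): lowercase, then split on whitespace. */
    static Set<String> analyze(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** match: the query is analyzed too; one shared token is enough for a hit. */
    static boolean match(String indexedText, String query) {
        Set<String> indexed = analyze(indexedText);
        for (String queryToken : analyze(query)) {
            if (indexed.contains(queryToken)) {
                return true;
            }
        }
        return false;
    }

    /** term: the query is not analyzed and must equal an indexed token as is. */
    static boolean term(String indexedText, String query) {
        return analyze(indexedText).contains(query);
    }
}
```

In this model match("courgette asdf.log", "cour") is false because cour is not one of the indexed tokens, while the wildcard pattern *cour* does match the token courgette.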

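The match-plus-wildcard query uses the more advanced bool query. Its clauses are:

must: AND; the clause has to match, like and in a relational database.

should: OR; like or in a relational database.

must_not: NOT; like not in a relational database.

filter: the clause has to match, as with must, but it does not contribute to the relevance score.

range: restricts a field to a range of values. It is a query type of its own rather than a bool clause, and it takes the operators below.

gt: greater than, like > in a relational database.

gte: greater than or equal to, like >= in a relational database.

lt: less than, like < in a relational database.

lte: less than or equal to, like <= in a relational database.

Doesn't this look a lot like a database query? Learning it by analogy may make it easier to understand.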
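
The and/or/not analogy can be written down as a small model with plain Java predicates. This is a sketch of matching only, with scoring and filter clauses left out; BoolSketch is a hypothetical class for illustration. It also models one detail that matters for nesting: should clauses are only required when the bool query has no must (or filter) clause.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical, simplified model of bool matching; scoring and filter are left out. */
final class BoolSketch<T> {
    private final List<Predicate<T>> must;
    private final List<Predicate<T>> should;
    private final List<Predicate<T>> mustNot;

    BoolSketch(List<Predicate<T>> must, List<Predicate<T>> should, List<Predicate<T>> mustNot) {
        this.must = must;
        this.should = should;
        this.mustNot = mustNot;
    }

    boolean matches(T doc) {
        // must: every clause has to match (AND)
        boolean allMust = must.stream().allMatch(p -> p.test(doc));
        // must_not: no clause may match (NOT)
        boolean noneMustNot = mustNot.stream().noneMatch(p -> p.test(doc));
        // should: at least one has to match (OR), but only when there is no must clause
        boolean shouldRequired = must.isEmpty() && !should.isEmpty();
        boolean anyShould = should.stream().anyMatch(p -> p.test(doc));
        return allMust && noneMustNot && (!shouldRequired || anyShould);
    }
}
```

This is why an "A or B" condition that must hold together with another condition is written as a nested bool containing only should clauses, placed inside a must clause of the outer bool.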

Let's keep adding conditions. In a file management system every query is scoped to a user, so we add the user condition to the query above:

SearchResponse<FileSearch> search = client.search(s -> s
                            .index("filesearch")
                            .query(_1 -> _1
                                    .bool(_2 -> _2
                                            .must(_3 -> _3
                                                    .bool(_4 -> _4
                                                            .should(_5 -> _5
                                                                    .match(_6 -> _6
                                                                            .field("fileName")
                                                                            .query(searchFileDTO.getFileName())))
                                                            .should(_5 -> _5
                                                                    .wildcard(_6 -> _6
                                                                            .field("fileName")
                                                                            .wildcard("*" + searchFileDTO.getFileName() + "*")))))
                                            .must(_3 -> _3
                                                    .term(_4 -> _4
                                                            .field("userId")
                                                            .value(sessionUserBean.getUserId())))
                                    ))
                            ,
                    FileSearch.class);

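Adding pagination

Use from() and size() to set the paging parameters. from is the offset of the first hit to return, not a page number, and size is the number of hits per page.

SearchResponse<FileSearch> search = client.search(s -> s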
                            .index("filesearch")
                            .query(_1 -> _1
                                    .bool(_2 -> _2
                                            .must(_3 -> _3
                                                    .bool(_4 -> _4
                                                            .should(_5 -> _5
                                                                    .match(_6 -> _6
                                                                            .field("fileName")
                                                                            .query(searchFileDTO.getFileName())))
                                                            .should(_5 -> _5
                                                                    .wildcard(_6 -> _6
                                                                            .field("fileName")
                                                                            .wildcard("*" + searchFileDTO.getFileName() + "*")))))
                                            .must(_3 -> _3
                                                    .term(_4 -> _4
                                                            .field("userId")
                                                            .value(sessionUserBean.getUserId())))
                                    ))
                                    .from(currentPage * pageCount) // document offset; assumes currentPage is a zero-based page index
                                    .size(pageCount)
                            ,
                    FileSearch.class);
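
Because from is a document offset, a page number coming from the UI has to be converted first. Here is a minimal sketch, assuming the UI sends 1-based page numbers; PageOffset is a hypothetical helper for this illustration. It also checks the default limit index.max_result_window, under which from plus size may not exceed 10,000.

```java
/** Hypothetical helper that turns a 1-based page number into a from offset. */
final class PageOffset {
    // Default value of the index.max_result_window index setting
    static final int MAX_RESULT_WINDOW = 10_000;

    static int from(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be at least 1");
        }
        // Page 1 starts at offset 0, page 2 at offset pageSize, and so on
        long offset = (long) (pageNumber - 1) * pageSize;
        if (offset + pageSize > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds " + MAX_RESULT_WINDOW);
        }
        return (int) offset;
    }
}
```

With this helper the paging calls become .from(PageOffset.from(page, pageCount)).size(pageCount), where page is the hypothetical 1-based page number.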

Highlighting matched keywords in the results

SearchResponse<FileSearch> search = client.search(s -> s
                            .index("filesearch")
                            .query(_1 -> _1
                                    .bool(_2 -> _2
                                            .must(_3 -> _3
                                                    .bool(_4 -> _4
                                                            .should(_5 -> _5
                                                                    .match(_6 -> _6
                                                                            .field("fileName")
                                                                            .query(searchFileDTO.getFileName())))
                                                            .should(_5 -> _5
                                                                    .wildcard(_6 -> _6
                                                                            .field("fileName")
                                                                            .wildcard("*" + searchFileDTO.getFileName() + "*")))))
                                            .must(_3 -> _3
                                                    .term(_4 -> _4
                                                            .field("userId")
                                                            .value(sessionUserBean.getUserId())))
                                    ))
                                    .from(currentPage * pageCount) // document offset; assumes currentPage is a zero-based page index
                                    .size(pageCount)
                                    .highlight(h -> h
                                        .fields("fileName", f -> f.type("plain")
                                                .preTags("<span class='keyword'>").postTags("</span>"))
                                        .encoder(HighlighterEncoder.Html))
                            ,
                    FileSearch.class);
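
The highlight settings ask for fragments in which the field text is HTML-encoded and each hit is wrapped in the given pre and post tags. The following plain-Java sketch only illustrates the shape of such a fragment. It is not Elasticsearch's highlighter: it assumes simple case-sensitive substring hits, and HighlightSketch is a hypothetical class for illustration.

```java
/** Hypothetical illustration of the shape of an HTML-encoded highlight fragment. */
final class HighlightSketch {
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    static String highlight(String text, String keyword, String preTag, String postTag) {
        if (keyword.isEmpty()) {
            return escapeHtml(text);
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        int hit;
        while ((hit = text.indexOf(keyword, pos)) >= 0) {
            // Encode the text between hits; the tags themselves stay unencoded
            out.append(escapeHtml(text.substring(pos, hit)));
            out.append(preTag).append(escapeHtml(keyword)).append(postTag);
            pos = hit + keyword.length();
        }
        out.append(escapeHtml(text.substring(pos)));
        return out.toString();
    }
}
```

Encoding the file name before inserting the tags means a name containing < or > cannot inject markup when the fragment is rendered in the page.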

—

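Closing Remarks

If you have read this far carefully, you have probably got a sense of how powerful ES is. This article is only an introduction and cannot cover everything. The lambda-based API is very concise and stays close to the underlying query DSL, but it can be hard to follow if you are not used to the syntax, so it takes practice.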


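This article is reposted from 奇文社区 (Qiwen Community). If you believe it infringes your rights, please email contact@modb.pro with supporting evidence; once verified, 墨天轮 (modb) will remove the content promptly.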
