03_Elasticsearch Operations_CRUD and Bulk Operations

lin在路上 2020-05-05

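The previous two chapters set up the Elasticsearch environment and introduced the concepts of indices and documents. This chapter covers the basic document operations, CRUD (create, read, update, delete), and bulk operations. Each CRUD operation requires its own API call, whereas a bulk operation performs many CRUD operations in a single API call, which reduces overhead.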

Document CRUD

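Operation	Description	Kibana command
Index	If the document does not exist, index a new one; otherwise delete the existing document and index the new one, incrementing the version by 1	PUT my_index/_doc/1 {"user":"mike","comment":"hello world"}
Create	Index a new document via PUT or POST. PUT: an ID must be specified, and a duplicate ID causes an error. POST: no ID is specified; one is generated automatically	PUT my_index/_create/1 {"user":"mike","comment":"hello world"}; POST my_index/_doc (no ID specified, generated automatically) {"user":"mike","comment":"hello world"}
Read	Fetch a document by ID. Found: HTTP 200. Not found: HTTP 404	GET my_index/_doc/1
Update	Unlike Index, Update does not replace the whole document; it merges the given fields into the existing one	POST my_index/_update/1 {"doc":{"user":"mike","comment":"hello es"}}
Delete	Delete the specified document	DELETE my_index/_doc/1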

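Note: the type is always _doc. This is in preparation for Elasticsearch 8.0: before 6.0 an index could hold multiple types, from 6.0 only one type per index is allowed, and in 8.0 types are removed altogether. To ease the transition, the include_type_name parameter was added to the index creation, mapping, and template APIs as a switch for removing types: it was introduced in 6.8 with a default of true (requests and responses include the type), its default changed to false in 7.0, and the parameter itself is gone in 8.0.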
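
The Kibana commands in the table are plain HTTP requests, so they can be issued from any client. The following is a minimal sketch of how they might be built with Python's standard library. The base URL (a local, unauthenticated node on port 9200) and the helper name build_request are assumptions made for illustration; the requests are only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: a local single-node cluster without authentication.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) the HTTP request behind a Kibana console command."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# The five operations from the table, expressed as HTTP requests.
requests_by_name = {
    "index": build_request("PUT", "my_index/_doc/1",
                           {"user": "mike", "comment": "hello world"}),
    "create": build_request("PUT", "my_index/_create/1",
                            {"user": "mike", "comment": "hello world"}),
    "read": build_request("GET", "my_index/_doc/1"),
    "update": build_request("POST", "my_index/_update/1",
                            {"doc": {"comment": "hello es"}}),
    "delete": build_request("DELETE", "my_index/_doc/1"),
}

for name, req in requests_by_name.items():
    print(name, req.get_method(), req.full_url)
```

Sending one of them would be a call such as urllib.request.urlopen(requests_by_name["read"]), which needs a running cluster at the assumed address.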

INDEX

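If the document does not exist, a new one is indexed; otherwise the existing document is deleted and the new one is indexed, and the version is incremented by 1.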

If the document does not exist, a new document is indexed:

 PUT my_index/_doc/1
 {
   "name":"张三"
 }
 
 #result is created, meaning the document was created; the response also carries _index (the index name), _type, _id, _version, and so on
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
     "total" : 2,
     "successful" : 1,
     "failed" : 0
   },
   "_seq_no" : 0,
   "_primary_term" : 1
 }

If the document already exists, it is deleted and the new document is indexed, and the version is incremented by 1.

 PUT my_index/_doc/1
 {
   "xb":"男"
 }
 
 #This time result is updated rather than created as on the first run, and _version is incremented by 1
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 2,
   "result" : "updated",
   "_shards" : {
     "total" : 2,
     "successful" : 2,
     "failed" : 0
   },
   "_seq_no" : 1,
   "_primary_term" : 1
 }

Now fetch the document with id 1 using GET:

 GET my_index/_doc/1
 
 #The document holds the second input, "xb":"男", not the first input, "name":"张三". Although PUT reports updated, it does not modify the original document; it deletes the original and creates a new one.
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 2,
   "_seq_no" : 1,
   "_primary_term" : 1,
   "found" : true,
   "_source" : {
     "xb" : "男"
   }
 }
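
The replace-and-increment behaviour shown above can be pictured with a small in-memory model. This is only a toy sketch in Python, not how Elasticsearch is implemented; the function name index_document and the dictionary-based store are hypothetical.

```python
def index_document(store, doc_id, source):
    """Toy model of the Index operation: replace the whole document, bump the version."""
    previous = store.get(doc_id)
    version = 1 if previous is None else previous["_version"] + 1
    # The new source replaces the old one entirely; no fields are carried over.
    store[doc_id] = {"_version": version, "_source": dict(source)}
    return {
        "_id": doc_id,
        "_version": version,
        "result": "created" if previous is None else "updated",
    }


store = {}
first = index_document(store, "1", {"name": "张三"})
second = index_document(store, "1", {"xb": "男"})
print(first["result"], first["_version"])    # created 1
print(second["result"], second["_version"])  # updated 2
print(store["1"]["_source"])                 # {'xb': '男'}
```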

CREATE

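Create indexes a new document and supports both PUT and POST. PUT: an ID must be specified, and a duplicate ID causes an error. POST: no ID is specified; one is generated automatically.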

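Using PUT with my_index/_create/1 for a document that already exists returns an error.

Note: the index name is followed by the _create endpoint rather than the type (_doc); the latter form is the Index operation.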

 PUT my_index/_create/1
 {
   "name":"李四"
 }
 
 #The document with id 1 was already created by the Index request, so create with id 1 fails
 {
   "error": {
     "root_cause": [
       {
         "type": "version_conflict_engine_exception",
         "reason": "[1]: version conflict, document already exists (current version [2])",
         "index_uuid": "huioLo83QmCdtFickQ9S5A",
         "shard": "0",
         "index": "my_index"
       }
     ],
     "type": "version_conflict_engine_exception",
     "reason": "[1]: version conflict, document already exists (current version [2])",
     "index_uuid": "huioLo83QmCdtFickQ9S5A",
     "shard": "0",
     "index": "my_index"
   },
   "status": 409
 }

If the specified id does not exist, the document is created successfully:

 PUT my_index/_create/2
 {
   "name":"李四"
 }
 
 #id 2 has not been created yet, so the response reports success
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "2",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
     "total" : 2,
     "successful" : 2,
     "failed" : 0
   },
   "_seq_no" : 2,
   "_primary_term" : 1
 }

Whether with Index or Create, PUT always requires a document id. When the id does not need to be chosen by the caller, POST can be used instead:

 POST my_index/_doc
 {
   "name":"王五"
 }
 #Creation succeeds and a randomly generated id is returned
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "76yFonEBuoXto2dXCuRb",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
     "total" : 2,
     "successful" : 2,
     "failed" : 0
   },
   "_seq_no" : 3,
   "_primary_term" : 1
 }

Create another document with POST:

 POST my_index/_doc
 {
   "name":"马六"
 }
 #Creation succeeds and a different id is returned
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "8KyFonEBuoXto2dXzeQN",
   "_version" : 1,
   "result" : "created",
   "_shards" : {
     "total" : 2,
     "successful" : 2,
     "failed" : 0
   },
   "_seq_no" : 4,
   "_primary_term" : 1
 }
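
The difference between the two Create paths (explicit id with a conflict check, or an automatically generated id) can be sketched with the same kind of toy in-memory model. The names create_document and VersionConflict are hypothetical, and uuid is only a stand-in: Elasticsearch generates ids in its own format.

```python
import uuid


class VersionConflict(Exception):
    """Stands in for the HTTP 409 version_conflict_engine_exception."""


def create_document(store, source, doc_id=None):
    """Toy model of Create: fail if the id exists, generate an id if none is given."""
    if doc_id is None:
        # Assumption: any unique string will do for this illustration.
        doc_id = uuid.uuid4().hex
    if doc_id in store:
        raise VersionConflict(f"[{doc_id}]: version conflict, document already exists")
    store[doc_id] = {"_version": 1, "_source": dict(source)}
    return {"_id": doc_id, "_version": 1, "result": "created"}


# Document 1 already exists at version 2, as in the walkthrough.
store = {"1": {"_version": 2, "_source": {"xb": "男"}}}

try:
    create_document(store, {"name": "李四"}, doc_id="1")
    conflict = False
except VersionConflict:
    conflict = True

created = create_document(store, {"name": "李四"}, doc_id="2")
auto_a = create_document(store, {"name": "王五"})
auto_b = create_document(store, {"name": "马六"})
print(conflict, created["_id"], auto_a["_id"] != auto_b["_id"])
```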

READ

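Reading is the most frequent use of Elasticsearch, and most of what follows in later chapters builds on it. This section covers only the two simplest cases: fetching a document by id and fetching the first 10 documents.

Fetch a specific document by index, type, and id: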

 GET my_index/_doc/1
 
 #On success the following is returned; _source holds all the fields that were submitted
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 2,
   "_seq_no" : 1,
   "_primary_term" : 1,
   "found" : true,
   "_source" : {
     "xb" : "男"
   }
 }

If no document exists for the given id:

 GET my_index/_doc/100
 
 #If the document does not exist, the response has found set to false (HTTP 404)
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "100",
   "found" : false
 }
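
Client code usually branches on the found flag rather than on the presence of _source. A minimal sketch in Python, using trimmed copies of the two responses above as sample data (the helper name source_or_none is hypothetical):

```python
def source_or_none(response):
    """Return the document body from a GET-by-id response, or None if it was not found."""
    if response.get("found"):
        return response["_source"]
    return None


hit = {"_index": "my_index", "_id": "1", "_version": 2, "found": True,
       "_source": {"xb": "男"}}
miss = {"_index": "my_index", "_id": "100", "found": False}

print(source_or_none(hit))   # {'xb': '男'}
print(source_or_none(miss))  # None
```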

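Without a document id, the _search endpoint returns the first 10 documents:

 GET my_index/_search
 
 #10 documents are returned by default; only 4 appear here because the index holds only 4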
 {
   "took" : 0,
   "timed_out" : false,
   "_shards" : {
     "total" : 1,
     "successful" : 1,
     "skipped" : 0,
     "failed" : 0
   },
   "hits" : {
     "total" : {
       "value" : 4,
       "relation" : "eq"
     },
     "max_score" : 1.0,
     "hits" : [
       {
         "_index" : "my_index",
         "_type" : "_doc",
         "_id" : "1",
         "_score" : 1.0,
         "_source" : {
           "xb" : "男"
         }
       },
       {
         "_index" : "my_index",
         "_type" : "_doc",
         "_id" : "2",
         "_score" : 1.0,
         "_source" : {
           "name" : "李四"
         }
       },
       {
         "_index" : "my_index",
         "_type" : "_doc",
         "_id" : "76yFonEBuoXto2dXCuRb",
         "_score" : 1.0,
         "_source" : {
           "name" : "王五"
         }
       },
       {
         "_index" : "my_index",
         "_type" : "_doc",
         "_id" : "8KyFonEBuoXto2dXzeQN",
         "_score" : 1.0,
         "_source" : {
           "name" : "马六"
         }
       }
     ]
   }
 }
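
The search response nests the matching documents under hits.hits, next to a separate total count. A small Python sketch of pulling both out, assuming the 7.x response shape shown above; the sample response is trimmed to two hits and the helper name summarize_hits is hypothetical.

```python
def summarize_hits(response):
    """Return (total, [(id, source), ...]) from a _search response in the 7.x format."""
    hits = response["hits"]
    total = hits["total"]["value"]
    docs = [(hit["_id"], hit["_source"]) for hit in hits["hits"]]
    return total, docs


# Trimmed sample: the total can be larger than the number of hits returned.
response = {
    "hits": {
        "total": {"value": 4, "relation": "eq"},
        "hits": [
            {"_id": "1", "_score": 1.0, "_source": {"xb": "男"}},
            {"_id": "2", "_score": 1.0, "_source": {"name": "李四"}},
        ],
    }
}
total, docs = summarize_hits(response)
print(total, [doc_id for doc_id, _ in docs])  # 4 ['1', '2']
```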

UPDATE

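Update requires that the document already exists, and it changes only the fields given, leaving the others intact.

Updating a document that does not exist: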

 POST my_index/_update/100
 {
   "doc":{
     "name":"张三"
   }
 }
 #The error is as follows
 {
   "error" : {
     "root_cause" : [
       {
         "type" : "document_missing_exception",
         "reason" : "[_doc][100]: document missing",
         "index_uuid" : "huioLo83QmCdtFickQ9S5A",
         "shard" : "0",
         "index" : "my_index"
       }
     ],
     "type" : "document_missing_exception",
     "reason" : "[_doc][100]: document missing",
     "index_uuid" : "huioLo83QmCdtFickQ9S5A",
     "shard" : "0",
     "index" : "my_index"
   },
   "status" : 404
 }

Now update the document with id 1, which currently holds "xb":"男" at version 2:

 POST my_index/_update/1
 {
   "doc":{
     "sg":"123"
   }
 }
 
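 #After the update, fetching id 1 again shows the version incremented by 1, the old field (not part of the update request) still present, and the new field added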
 GET my_index/_doc/1
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 3,
   "_seq_no" : 6,
   "_primary_term" : 1,
   "found" : true,
   "_source" : {
     "xb" : "男",
     "sg" : "123"
   }
 }
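
The partial update above behaves like a recursive merge of the "doc" object into the stored source. The following Python sketch models that under simplifying assumptions: it is an in-memory toy with hypothetical names (merge_partial, update_document), and it ignores details such as no-op detection when nothing changes.

```python
def merge_partial(existing, partial):
    """Recursively merge `partial` into a copy of `existing`.

    Nested objects are merged key by key; any other value replaces the old one.
    """
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = value
    return merged


def update_document(store, doc_id, partial):
    """Toy model of Update: the document must exist; fields merge and the version bumps."""
    if doc_id not in store:
        raise KeyError(f"[{doc_id}]: document missing")
    current = store[doc_id]
    store[doc_id] = {
        "_version": current["_version"] + 1,
        "_source": merge_partial(current["_source"], partial),
    }
    return store[doc_id]


store = {"1": {"_version": 2, "_source": {"xb": "男"}}}
updated = update_document(store, "1", {"sg": "123"})
print(updated)  # {'_version': 3, '_source': {'xb': '男', 'sg': '123'}}
```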

DELETE

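Deletes the document with the given id. If the id does not exist, the response reports not_found. The version information is not cleared when a document is deleted.

Delete the document with id 1: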

 DELETE my_index/_doc/1
 
 #A successful deletion returns the following
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 4,
   "result" : "deleted",
   "_shards" : {
     "total" : 2,
     "successful" : 2,
     "failed" : 0
   },
   "_seq_no" : 7,
   "_primary_term" : 1
 }

Delete the document with id 1 again:

 DELETE my_index/_doc/1
 
 #Deleting a document whose id does not exist returns result not_found
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 5,
   "result" : "not_found",
   "_shards" : {
     "total" : 2,
     "successful" : 2,
     "failed" : 0
   },
   "_seq_no" : 8,
   "_primary_term" : 1
 }

Re-create the document with id 1 and check the version number:

 PUT my_index/_create/1
 {
   "name":"张三"
 }
 
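 #After a deletion, re-creating the document with create continues the version sequence (here 5 -> 6); with index, the version count is said to start over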
 {
   "_index" : "my_index",
   "_type" : "_doc",
   "_id" : "1",
   "_version" : 6,
   "result" : "created",
   "_shards" : {
     "total" : 2,
     "successful" : 2,
     "failed" : 0
   },
   "_seq_no" : 16,
   "_primary_term" : 1
 }

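Bulk operations

Every CRUD operation sends a network request to call the API, so reading or inserting 100 documents takes 100 requests, which is expensive. Bulk operations let many operations share a single network request. This section covers three of them: bulk, mget, and msearch.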

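bulk

Supports the C, U, and D operations (create, update, delete)

mget

Supports the R operation (read); used for exact lookups where the index name and document id are known

msearch

Supports the R operation (read); used for query-based searches rather than lookups by id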

bulk

bulk supports the index, create, update, and delete actions:

 POST _bulk
 { "index" : { "_index" : "my_index", "_id" : "1" } }
 { "xm" : "xm_index" }
 { "delete" : { "_index" : "my_index", "_id" : "2" } }
 { "create" : { "_index" : "my_index","_id":4} }
 { "xm" : "xm_create" }
 { "update" : {"_id" : "1", "_index" : "my_index"} }
 { "doc" : {"sfzh" : "sfzh_update"} }
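 #Every line must end with a newline (\n), including the last one
 #Each operation consists of two lines of JSON: one with the action (index, delete, create, or update) and its metadata (_index, _id), and one with the data; delete is the exception and needs only the action and metadata line
 
 
 #Each statement in the bulk request gets its own result
 #Operations are independent: one failing does not affect the others
 #The response is as follows: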
 {
   "took" : 281,
   "errors" : false,
   "items" : [
     {
       "index" : {
         "_index" : "my_index",
         "_type" : "_doc",
         "_id" : "1",
         "_version" : 9,
         "result" : "updated",
         "_shards" : {
           "total" : 2,
           "successful" : 2,
           "failed" : 0
         },
         "_seq_no" : 17,
         "_primary_term" : 1,
         "status" : 200
       }
     },
     {
       "delete" : {
         "_index" : "my_index",
         "_type" : "_doc",
         "_id" : "2",
         "_version" : 2,
         "result" : "deleted",
         "_shards" : {
           "total" : 2,
           "successful" : 2,
           "failed" : 0
         },
         "_seq_no" : 18,
         "_primary_term" : 1,
         "status" : 200
       }
     },
     {
       "create" : {
         "_index" : "my_index",
         "_type" : "_doc",
         "_id" : "4",
         "_version" : 1,
         "result" : "created",
         "_shards" : {
           "total" : 2,
           "successful" : 2,
           "failed" : 0
         },
         "_seq_no" : 19,
         "_primary_term" : 1,
         "status" : 201
       }
     },
     {
       "update" : {
         "_index" : "my_index",
         "_type" : "_doc",
         "_id" : "1",
         "_version" : 10,
         "result" : "updated",
         "_shards" : {
           "total" : 2,
           "successful" : 2,
           "failed" : 0
         },
         "_seq_no" : 20,
         "_primary_term" : 1,
         "status" : 200
       }
     }
   ]
 }
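
Because the bulk body is newline-delimited JSON rather than a single JSON document, it is easy to get wrong when assembled by hand. The sketch below shows one way a client might serialize the request above and pick out failed items from a response. The helper names build_bulk_body and failed_items, and the tuple layout of the input, are assumptions made for illustration.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into the NDJSON body for _bulk."""
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if action != "delete":
            # index, create, and update carry a second line with the document or partial doc.
            lines.append(json.dumps(source, ensure_ascii=False))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


def failed_items(bulk_response):
    """Return (action, id, status) for every item whose status signals an error."""
    failures = []
    for item in bulk_response["items"]:
        action, result = next(iter(item.items()))
        if result["status"] >= 400:
            failures.append((action, result["_id"], result["status"]))
    return failures


body = build_bulk_body([
    ("index", {"_index": "my_index", "_id": "1"}, {"xm": "xm_index"}),
    ("delete", {"_index": "my_index", "_id": "2"}, None),
    ("create", {"_index": "my_index", "_id": "4"}, {"xm": "xm_create"}),
    ("update", {"_index": "my_index", "_id": "1"}, {"doc": {"sfzh": "sfzh_update"}}),
])
print(body, end="")
```

When such a body is sent over HTTP, the Content-Type header should be application/x-ndjson. Since a single failure does not abort the rest, checking the top-level errors flag and then the per-item status is how a client finds out which operations need attention.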

mget

mget performs batched exact lookups; it reduces the overhead of network connections and improves performance.

 GET /_mget
 {
     "docs" : [
         {
             "_index" : "my_index",
             "_id" : "1"
         },
         {
             "_index" : "my_index",
             "_id" : "2"
         }
     ]
 }
 #_index and _id must be specified
 #All lookups go inside the [] after "docs"
 
 
 #Each lookup in the mget request gets its own result
 #The response is as follows:
 {
   "docs" : [
     {
       "_index" : "my_index",
       "_type" : "_doc",
       "_id" : "1",
       "_version" : 10,
       "_seq_no" : 20,
       "_primary_term" : 1,
       "found" : true,
       "_source" : {
         "xm" : "xm_index",
         "sfzh" : "sfzh_update"
       }
     },
     {
       "_index" : "my_index",
       "_type" : "_doc",
       "_id" : "2",
       "found" : false
     }
   ]
 }
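
An mget response lists results in request order and marks each one with found, so a client typically splits it into hits and misses. A minimal Python sketch, with hypothetical helper names and a trimmed copy of the response above as sample data (the result is keyed by _id only, which is a simplification when several indices are involved):

```python
def build_mget_body(pairs):
    """Build the _mget request body from (index, id) pairs."""
    return {"docs": [{"_index": index, "_id": doc_id} for index, doc_id in pairs]}


def split_mget_response(response):
    """Separate found documents from missing ids in an _mget response."""
    found = {doc["_id"]: doc["_source"] for doc in response["docs"] if doc.get("found")}
    missing = [doc["_id"] for doc in response["docs"] if not doc.get("found")]
    return found, missing


body = build_mget_body([("my_index", "1"), ("my_index", "2")])
response = {"docs": [
    {"_index": "my_index", "_id": "1", "found": True,
     "_source": {"xm": "xm_index", "sfzh": "sfzh_update"}},
    {"_index": "my_index", "_id": "2", "found": False},
]}
found, missing = split_mget_response(response)
print(missing)  # ['2']
```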

msearch

msearch performs batched queries and is suited to full-text search.

 GET _msearch
 {"index" : "my_index"}
 {"query" : {"match_all" : {}}, "from" : 0, "size" : 1}
 {"index" : "movies"}
 {"query" : {"match_all" : {}}, "from" : 0, "size" : 1}
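 #Every line must end with a newline (\n), including the last one
 #Each search consists of two lines of JSON: one with the index name and one with the query parameters
 
 
 #Each search in the msearch request gets its own result
 #The response is as follows: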
 {
   "took" : 1,
   "responses" : [
     {
       "took" : 1,
       "timed_out" : false,
       "_shards" : {
         "total" : 1,
         "successful" : 1,
         "skipped" : 0,
         "failed" : 0
       },
       "hits" : {
         "total" : {
           "value" : 5,
           "relation" : "eq"
         },
         "max_score" : 1.0,
         "hits" : [
           {
             "_index" : "my_index",
             "_type" : "_doc",
             "_id" : "76yFonEBuoXto2dXCuRb",
             "_score" : 1.0,
             "_source" : {
               "name" : "王五"
             }
           }
         ]
       },
       "status" : 200
     },
     {
       "took" : 1,
       "timed_out" : false,
       "_shards" : {
         "total" : 1,
         "successful" : 1,
         "skipped" : 0,
         "failed" : 0
       },
       "hits" : {
         "total" : {
           "value" : 9743,
           "relation" : "eq"
         },
         "max_score" : 1.0,
         "hits" : [
           {
             "_index" : "movies",
             "_type" : "_doc",
             "_id" : "1660",
             "_score" : 1.0,
             "_source" : {
               "year" : 1997,
               "@version" : "1",
               "title" : "Eve's Bayou",
               "id" : "1660",
               "genre" : [
                 "Drama"
               ]
             }
           }
         ]
       },
       "status" : 200
     }
   ]
 }
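
Like bulk, msearch takes newline-delimited JSON, here as header and body pairs. The Python sketch below shows how the request above might be serialized and how the per-search totals could be read back. The helper names build_msearch_body and totals are hypothetical, and the sample response is reduced to the fields used.

```python
import json


def build_msearch_body(searches):
    """Serialize (index, query_body) pairs into the NDJSON body for _msearch."""
    lines = []
    for index, query_body in searches:
        lines.append(json.dumps({"index": index}))  # header line: which index to search
        lines.append(json.dumps(query_body))        # body line: the search request itself
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


def totals(msearch_response):
    """Return the total hit count of each sub-search, in request order."""
    return [r["hits"]["total"]["value"] for r in msearch_response["responses"]]


match_one = {"query": {"match_all": {}}, "from": 0, "size": 1}
body = build_msearch_body([("my_index", match_one), ("movies", match_one)])
print(body, end="")

sample_response = {"responses": [
    {"hits": {"total": {"value": 5, "relation": "eq"}, "hits": []}, "status": 200},
    {"hits": {"total": {"value": 9743, "relation": "eq"}, "hits": []}, "status": 200},
]}
print(totals(sample_response))  # [5, 9743]
```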

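Common error codes

Problem	Cause
Cannot connect	Network failure or the cluster is down
Connection cannot be closed	Network failure or a node error
429	The cluster is too busy
4XX	Malformed request body
500	Internal cluster error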
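
A client might turn the status-code rows of this table into a small lookup. The sketch below is one possible reading, not a prescribed policy: the function names are hypothetical, and treating only 429 as worth retrying after a pause is an assumption that the table itself does not state.

```python
def classify_status(status):
    """Map an HTTP status code to the causes listed in the table."""
    if status == 429:
        return "cluster too busy"
    if 400 <= status < 500:
        return "request error"
    if 500 <= status < 600:
        return "cluster internal error"
    return "ok"


def should_retry(status):
    # Assumption: a busy cluster (429) may accept the same request later,
    # while other errors are not retried automatically in this sketch.
    return status == 429


for code in (200, 404, 409, 429, 500):
    print(code, classify_status(code), should_retry(code))
```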
