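Elasticsearch: Common Field Types and CRUD Operations

The commonly used Elasticsearch (ES) data types fall into three categories:

Core data types
Complex data types
Specialized data types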

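Core data types

(1) Strings

text: used for full-text indexing; the value is analyzed (tokenized) at index time, and a match query analyzes the search string the same way before matching
keyword: not analyzed; a search must match the complete value

(2) Numeric

Integer: byte, short, integer, long
Floating point: float, half_float, scaled_float, double

(3) Date

date

JSON has no date type, so how is a date value written when inserting or updating a document?

# mapping: set the field type to date
"type" : "date"


# a value for this field can be written in 3 ways

# a formatted string
"2020-04-18", "2020/04/18 09:00:00"

# a long holding seconds since the epoch (1970-01-01 00:00:00 UTC)
1610350870

# a long holding milliseconds since the epoch
1641886870000

# note: the default format is strict_date_optional_time||epoch_millis, so
# "2020/04/18 09:00:00" and second-precision timestamps are only accepted when
# the mapping declares a matching "format", e.g. "yyyy/MM/dd HH:mm:ss||epoch_second"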
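As a small illustration of the three representations, the sketch below derives each one from a single moment in time using only the Python standard library. The variable names are hypothetical, and the chosen timestamp is just an example value.

```python
from datetime import datetime, timezone

# One example moment, expressed in UTC.
moment = datetime(2020, 4, 18, 9, 0, 0, tzinfo=timezone.utc)

# 1. Formatted string (accepted by the default date format).
as_string = moment.strftime("%Y-%m-%d")

# 2. Seconds since the epoch (needs "epoch_second" in the mapping format).
as_epoch_seconds = int(moment.timestamp())

# 3. Milliseconds since the epoch (accepted by the default date format).
as_epoch_millis = as_epoch_seconds * 1000

print(as_string, as_epoch_seconds, as_epoch_millis)
```

Any of the three values could then be placed in the date field of a document body.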

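(4) Range

integer_range, long_range, float_range, double_range, date_range

For example, a job posting requires an age in [20, 40]. Mapping:

"age_limit" : {
  "type" : "integer_range"
}

When inserting or updating the document, write the value as a JSON object:

"age_limit" : {
  "gte" : 20,
  "lte" : 40
}

gt means greater than, lt means less than, and the trailing e means "or equal to".

To search on this field, pass a plain value:

"term" : {
  "age_limit" : 30
}

Every document whose age_limit range contains this value is a match.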
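The containment rule can be sketched locally. The helper below is a hypothetical stand-in, not ES code: it only mimics how the gte, gt, lte and lt bounds decide whether a searched value falls inside a stored range.

```python
def range_contains(range_value, value):
    """Return True when `value` lies inside a range written with gte/gt/lte/lt."""
    checks = {
        "gte": lambda bound: value >= bound,
        "gt": lambda bound: value > bound,
        "lte": lambda bound: value <= bound,
        "lt": lambda bound: value < bound,
    }
    # Every bound present in the range must be satisfied.
    return all(checks[name](bound) for name, bound in range_value.items())


age_limit = {"gte": 20, "lte": 40}
print(range_contains(age_limit, 30))  # the term query for 30 would match this range
```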

(5) Boolean

boolean # true, false

(6) Binary

binary: the value is treated as a Base64-encoded string; it is not stored by default and is not searchable
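A minimal sketch of preparing a value for a binary field, assuming a hypothetical field named blob: the raw bytes are Base64-encoded on the client before they go into the JSON document.

```python
import base64
import json

raw = b"\x00\x01binary payload"

# Binary fields expect the Base64 text form of the bytes.
encoded = base64.b64encode(raw).decode("ascii")

# "blob" is a made-up field name used only for this illustration.
doc = json.dumps({"blob": encoded})
print(doc)
```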

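Complex data types

(1) Object

object

# define the mapping
"user" : {
    "type":"object"
}


# insert or update the field; write the value as a JSON object
"user" : {
    "name":"chy",
    "age":12
}


# when searching, join the field names with a dot
"match":{
    "user.name":"chy"
}

Objects can be nested inside other objects.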
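The dotted field names follow from the way object fields are flattened into simple key/value pairs when indexed. The function below is an illustrative sketch of that flattening, not ES code; flatten is a hypothetical helper name.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted paths, e.g. {"user": {"name": ...}} -> "user.name"."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the inner object, extending the dotted prefix.
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


print(flatten({"user": {"name": "chy", "age": 12}}))
```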

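(2) Array

# ES has no dedicated array type; in the mapping, declare the type of the elements
"arr" : {
    "type":"integer"
}


# insert or update the field; elements may be of any type, but all elements must share the same type
"arr" : [1,3,4]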

Specialized data types

# define the mapping
"ip_address" : {
    "type":"ip"
}


# insert or update the field; write the value as a string
"ip_address" : "192.168.1.1"


# search
"match":{
    "ip_address":"192.168.1.1"
}


# matches every document whose ip is in 192.168.0.0 ~ 192.168.255.255
"match":{
    "ip_address":"192.168.0.0/16"
}
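To see which addresses a CIDR block such as 192.168.0.0/16 covers, the check can be reproduced locally with Python's standard ipaddress module. This is only a sketch of the matching rule; in_cidr is a hypothetical helper name.

```python
import ipaddress


def in_cidr(address, cidr):
    """Return True when `address` belongs to the network written in CIDR notation."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


network = ipaddress.ip_network("192.168.0.0/16")
print(network[0], network[-1])                   # first and last address of the block
print(in_cidr("192.168.1.1", "192.168.0.0/16"))
```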

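Elasticsearch Index Queries

1. The _cat API is commonly used to check whether the cluster is healthy. Make sure port 9200 is reachable:

curl 'localhost:9200/_cat/health?v'

Green means everything is fine; yellow means all data is available but some replica shards have not been allocated yet; red means some data is unavailable because at least one primary shard is unassigned.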

2. The following statement returns the list of nodes in the cluster:

curl 'localhost:9200/_cat/nodes?v'

3. The following statement lists all indices:

curl 'localhost:9200/_cat/indices?v'

4. Create an index

Create an index named "customer", then list all indices again:

curl -XPUT 'localhost:9200/customer?pretty'
curl 'localhost:9200/_cat/indices?v'

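5. Insert

Now insert some data into the index. These examples specify a type for the document, which ES required before 7.0; mapping types were deprecated in 7.0 and removed in 8.0, where _doc takes the place of the type name in the URL. The following statements use the type "external" and ID 1.

The body is a JSON document, for example { "name": "John Doe" }:

curl -H "Content-Type: application/json" -XPUT 'localhost:9200/customer/external/1?pretty' -d '{"name":"John Doe"}'
curl -H "Content-Type: application/json" -XPUT 'localhost:9200/customer/external/1?pretty' -d '{"name":"join"}'

The first command adds a document with id 1 and name John Doe; the second replaces it, changing the name of id 1 to join.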

6. Retrieve (GET)

curl -XGET 'localhost:9200/customer/external/1?pretty'

This retrieves the document with id 1 and type external from the customer index; the pretty parameter asks for a nicely formatted response.

7. Delete an index (DELETE)

curl -XDELETE 'localhost:9200/customer?pretty'
curl 'localhost:9200/_cat/indices?v'

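8. The commands above show that create, read, update and delete requests share a similar format:

curl -X<REST Verb> <Node>:<Port>/<Index>/<Type>/<ID>

<REST Verb>: the HTTP verb (GET, PUT, POST, DELETE)

<Node>: node IP address or host name

<Port>: node port, 9200 by default

<Index>: index name

<Type>: type name

<ID>: ID of the document to operate on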
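The pattern can be captured in a small helper that assembles the URL from its parts. This is a sketch only; document_url is a hypothetical function, and the http scheme is an assumption (a secured cluster would use https).

```python
from urllib.parse import quote


def document_url(node, index, doc_type, doc_id, port=9200):
    """Build <Node>:<Port>/<Index>/<Type>/<ID>, percent-encoding each path segment."""
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    return f"http://{node}:{port}/{path}"


print(document_url("localhost", "customer", "external", 1))
```

The same URL is then used with GET to read, PUT or POST to write, and DELETE to remove the document.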

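9. Update data

This example shows how to change the name field of the document with id 1 to hello world. A POST to the document URL re-indexes the document, so the given body replaces the whole document:

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/customer/external/1?pretty' -d '{"name":"john"}'

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/customer/external/1?pretty' -d '{"name":"hello world"}'

10. Delete data

Deleting a document is straightforward. The following statement deletes the document with ID 1 from the customer index:

curl -XDELETE 'localhost:9200/customer/external/1?pretty'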

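11. Batch processing

For example, index two documents in a single bulk operation:

curl -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
'

The next bulk request updates the document with id 1 and then deletes the document with id 2:

curl -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'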
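A bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and a newline after the last line. The sketch below builds such a body with the standard library; bulk_body is a hypothetical helper, and sending the result to the _bulk endpoint is left out.

```python
import json


def bulk_body(actions):
    """Serialize (action, source) pairs to NDJSON; source is None for delete actions."""
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ({"update": {"_id": "1"}}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ({"delete": {"_id": "2"}}, None),
])
print(body)
```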

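13. Query

curl 'localhost:9200/bank/_search?q=*&pretty'

The example above searches the bank index. q=* matches every document in the index (only the first 10 hits are returned by default).

This is equivalent to:

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} }
}'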

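Query language

Match all documents but return only 1:

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "size": 1
}'

Note: if size is not specified, 10 documents are returned by default.

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}'

This returns documents 11 through 20 (from is a zero-based offset).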
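The relationship between a page number and the from/size pair is simple arithmetic. A minimal sketch, assuming 1-based page numbers; page_window is a hypothetical helper name.

```python
def page_window(page, page_size=10):
    """Translate a 1-based page number into the zero-based from/size pair."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"from": (page - 1) * page_size, "size": page_size}


print(page_window(2))  # second page of 10: documents 11 through 20
```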

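curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}'

The example above matches every document in the index, sorts by the balance field in descending order, and returns the first 10 (without size, at most 10 are returned by default).

Return the document whose account_number is 20:

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "account_number": 20 } }
}'

Return all documents whose address contains mill:

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "address": "mill" } }
}'

Return all documents whose address contains mill or lane:

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "address": "mill lane" } }
}'

Unlike the term-by-term matching above, the next example is a phrase match (match_phrase): it returns all documents whose address contains the phrase "mill lane":

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_phrase": { "address": "mill lane" } }
}'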
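The difference between match and match_phrase can be imitated on plain strings. The sketch below is a deliberate simplification: it lowercases and splits on whitespace instead of running a real analyzer, and it ignores scoring and slop. All function names are hypothetical.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_any(field_value, query):
    """Like match with the default OR operator: any query term may appear."""
    doc_tokens = set(tokens(field_value))
    return any(term in doc_tokens for term in tokens(query))


def match_phrase(field_value, query):
    """Like match_phrase: the query terms must appear consecutively and in order."""
    doc_tokens, phrase = tokens(field_value), tokens(query)
    width = len(phrase)
    return any(doc_tokens[i:i + width] == phrase
               for i in range(len(doc_tokens) - width + 1))


print(match_any("990 Mill Road", "mill lane"), match_phrase("990 Mill Road", "mill lane"))
```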

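The following are bool queries. A bool query combines several simple queries into a more complex boolean query.

This example combines two queries and returns all documents whose address contains both mill and lane:

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'

In the example above, must means that every listed query has to be true for a document to match.

In contrast, this example combines two queries and returns all documents whose address contains mill or lane:

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'

In the example above, should means a document matches as long as any one of the listed queries is true.

The next example combines two queries and returns all documents whose address contains neither mill nor lane:

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'

In the example above, must_not means a document matches only when none of the listed queries is true (all of them are false).

must, should and must_not can be combined to build more complex, multi-level boolean queries.

The next example returns all documents where age is 40 and state is not ID:

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}'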
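The way the three clauses combine can be sketched with ordinary predicates over a dictionary. This is a hypothetical local model, not ES code: it reflects the default behaviour in which should clauses are required only when no must clause is present, and it ignores scoring and the filter clause.

```python
def bool_match(doc, must=(), should=(), must_not=()):
    """Evaluate must / should / must_not predicates against one document."""
    if not all(predicate(doc) for predicate in must):
        return False
    if any(predicate(doc) for predicate in must_not):
        return False
    if should and not must:
        # With no must clause, at least one should clause has to match.
        return any(predicate(doc) for predicate in should)
    return True


def is_forty(doc):
    return doc["age"] == 40


def lives_in_id(doc):
    return doc["state"] == "ID"


print(bool_match({"age": 40, "state": "CA"}, must=[is_forty], must_not=[lives_in_id]))
```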

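16. Filters (restricting the query)

The following example uses a bool query to return all documents whose balance is between 20000 and 30000 (inclusive).

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}'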
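When the bounds come from user input, it can help to generate the request body rather than hand-edit the JSON. A minimal sketch, with range_filter_query as a hypothetical helper that only builds the body and does not send it:

```python
import json


def range_filter_query(field, low, high):
    """Build a bool query that matches everything and filters on an inclusive range."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {"range": {field: {"gte": low, "lte": high}}},
            }
        }
    }


body = json.dumps(range_filter_query("balance", 20000, 30000), indent=2)
print(body)
```

The printed JSON has the same structure as the body passed to curl with -d.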

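17. Aggregations

The following example groups all documents by state, sorts the groups by document count in descending order, and returns the top ten (the default). Setting size to 0 suppresses the search hits so that only the aggregation result is returned. The example assumes state was dynamically mapped as text with a keyword sub-field, so it aggregates on state.keyword; if state is itself a keyword field, use "state" instead.

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}'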
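What a terms aggregation computes can be reproduced on a small in-memory list: count the documents per distinct value and keep the largest groups. The sketch below uses made-up sample documents, and terms_aggregation is a hypothetical helper, not an ES client call.

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Count documents per value of `field` and return the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


# Made-up sample data for illustration only.
sample = [{"state": "ID"}, {"state": "TX"}, {"state": "ID"},
          {"state": "ID"}, {"state": "TX"}, {"state": "AL"}]
buckets = terms_aggregation(sample, "state")
print(buckets)
```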