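Elasticsearch study notes

What is ES?

ES (Elasticsearch) is an open-source, distributed full-text search engine built on Apache Lucene. It exposes a simple RESTful API that hides Lucene's complexity.

1. Distributed real-time storage: every field is indexed and searchable.

2. A distributed real-time analytics and search engine.

3. Scales out to thousands of servers and handles petabytes of structured or unstructured data.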

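Downloading and installing ES

Java for Windows

ES requires a Java JDK, version 1.8 or later.

Installation steps: https://www.cnblogs.com/Neeo/articles/10368280.html

ES for Windows

ES works out of the box: unzip the archive and it is ready to run. Installation guide: https://www.cnblogs.com/Neeo/articles/10371306.html

Kibana for Windows

Kibana is a web interface for analyzing data stored in Elasticsearch. It can be used to search, visualize, and analyze logs efficiently.

Installation guide: https://www.cnblogs.com/Neeo/articles/10371213.html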

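Quick start with ES

Relational database    ES
Database               Index (indices)
Table                  Type
Row (record)           Document
Column (field)         Field

An Elasticsearch cluster can contain multiple indices (databases), each index can contain multiple types (tables), each type contains multiple documents (rows), and each document contains multiple fields (columns).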

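Logical layout: documents, types, and indices

Document: self-contained, hierarchical, flexible in structure, and schemaless.
Type: starting with ES 6.x, an index can have only one type. A type is a container for documents, and it records the mapping between fields and their value types.
Index: an index is a container for mapping types. An Elasticsearch index is a very large collection of documents. It stores the fields of its mapping types along with other settings, and its data is then distributed across shards.
Physical layout: nodes and shards

Node: a cluster has at least one node, and a node can hold multiple indices. When an index is created, it gets 5 primary shards by default, each paired with one replica shard (this is the default in ES 6.x and earlier; 7.0 and later default to 1 primary shard).
Shard: documents are stored on shards, and each shard is itself a Lucene index.
Inverted index: an inverted index lists every unique term together with the documents that contain it.
ES indices versus Lucene indices
An ES index is made up of multiple shards, and each shard is a Lucene index.
A single Lucene index can hold at most about 2.1 billion documents, or at most about 274 billion unique terms.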
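
To make the idea of an inverted index concrete, here is a minimal sketch in plain Python. It is not how Lucene is implemented: the whitespace tokenizer, the function name build_inverted_index, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenizer; real analyzers do much more than this
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents
docs = {
    1: "quick brown fox",
    2: "quick red fox",
    3: "lazy brown dog",
}
inverted = build_inverted_index(docs)
print(inverted["quick"])  # [1, 2]
print(inverted["brown"])  # [1, 3]
```

Looking up a term is then a single dictionary access, which is why term searches stay fast even when the document collection is large.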

Basic operations

PUT s18/doc/1
{
  "name":"zhangsan",
  "age":18,
  "tags":"浪",
  "b":"19970505"
}
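# The result field in the response is the operation type. It is created the first time; running the same request again returns updated. The _version field starts at 1 and increases by one on every write, so it shows how many times the document has been changed.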

PUT s18/doc/2
{
  "name":"lisi",
  "age":15,
  "tangs":"骚",
  "b":"11232102"
}

PUT s18/doc/3
{
  "name":"wangwu",
  "age":18,
  "tangs":"帅",
  "b":"20150420"
}
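# The three requests above add three documents. Elasticsearch first checks whether index s18 exists: if not, it creates the index; if so, it adds (or updates) the document.
# Documents 2 and 3 use the field name tangs while document 1 uses tags, so the queries below that match on tangs only see documents 2 and 3.

# Fetch a document by id
GET s18/doc/1
# Query all documents
GET s18/doc/_search
# Query by condition
GET s18/doc/_search?q=name:zhangsan
# Show index details
GET s18

# Delete a single document
DELETE s18/doc/1
# Delete the whole index
DELETE s18

# A full PUT like this replaces the whole document, so all other fields are lost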
PUT s18/doc/1
{
  "tags":"帅气"
}


# Update only the given fields (partial update)
POST s18/doc/1/_update
{
  "doc": {
    "tags":"很浪"
  }
}

GET s18/doc/1
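
The difference between the full PUT and the _update request above can be sketched with plain dictionaries. This is only a model of the behavior, not client code: the store dict and both helper names are hypothetical, the tag values are placeholders, and the merge here is shallow, whereas a real partial update also merges nested objects.

```python
def put_document(store, doc_id, body):
    """Full replace: the stored document becomes exactly `body`."""
    store[doc_id] = dict(body)


def partial_update(store, doc_id, partial):
    """Partial update: merge `partial` into the stored document (shallow merge)."""
    store[doc_id] = {**store[doc_id], **partial}


store = {}
put_document(store, 1, {"name": "zhangsan", "age": 18, "tags": "tag-v1"})

partial_update(store, 1, {"tags": "tag-v2"})
after_update = dict(store[1])  # name and age survive, only tags changes

put_document(store, 1, {"tags": "tag-v3"})
after_put = dict(store[1])     # only tags is left

print(after_update)  # {'name': 'zhangsan', 'age': 18, 'tags': 'tag-v2'}
print(after_put)     # {'tags': 'tag-v3'}
```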

Basic DSL queries

# Find documents where age equals 18
GET s18/doc/_search
{
  "query": {
    "match": {
      "age": 18
    }
  }
}

# Match several terms at once: tangs contains 浪 or 骚
GET s18/doc/_search
{
  "query":{
    "match": {
      "tangs": "浪 骚"
    }
  }
}

# Two ways to query all documents
GET s18/doc/_search

GET s18/doc/_search
{
  "query": {
    "match_all": {}
  }
}

Sorting

Descending: desc
Ascending: asc

# Sort by age in descending order (desc)
GET s18/doc/_search
{
  "query":{
    "match_all": {}
  },
  "sort":[
    {
      "age":{
        "order":"desc"
      }
    }
    ]
}
# Sort by age in ascending order (asc)
GET s18/doc/_search
{
  "query":{
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}
# Note: not every field can be sorted on. text fields cannot be sorted by default, so it is best to sort on numeric fields.

Pagination

from: offset of the first result to return (starts at 0)
size: number of results to return

GET s18/doc/_search
{
  "query":{
    "match_all": {}
  },
  "from": 0,
  "size":2
}

GET s18/doc/_search
{
  "query": {
    "match_all": {}
  },
  "from": 2,
  "size": 20
}
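
Applications usually think in page numbers rather than offsets. The sketch below converts a 1-based page number into the from and size values used above; the helper names page_to_from_size and paged_body are hypothetical.

```python
def page_to_from_size(page, page_size):
    """Convert a 1-based page number into the from/size pair used by _search."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def paged_body(page, page_size):
    """Build a match_all search body for the requested page."""
    body = {"query": {"match_all": {}}}
    body.update(page_to_from_size(page, page_size))
    return body


print(paged_body(1, 2))           # {'query': {'match_all': {}}, 'from': 0, 'size': 2}
print(paged_body(3, 20)["from"])  # 40
```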

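bool queries

must: AND, equivalent to and in a relational database.
should: OR, equivalent to or in a relational database.
must_not: NOT, equivalent to not in a relational database.
filter: a filter condition; it restricts the results without affecting the relevance score.
range: restricts a field to a range of values.
gt: greater than, equivalent to > in a relational database.
gte: greater than or equal to, equivalent to >= in a relational database.
lt: less than, equivalent to < in a relational database.
lte: less than or equal to, equivalent to <= in a relational database.

# Boolean queries (bool): should (or), must (and), must_not (not)
# should: tangs matches 帅 or age equals 17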
GET s18/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "tangs": "帅"
          }
        },
        {
          "match": {
            "age": "17"
          }
        }
      ]
    }
  }
}

# must: age equals 18 and tangs matches 帅
GET s18/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "age": "18"
          }
        },
        {
          "match": {
            "tangs": "帅"
          }
        }
      ]
    }
  }
}
# must_not: age is neither 18 nor 17
GET s18/doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "age": "18"
          }
        },
        {
          "match": {
            "age": "17"
          }
        }
      ]
    }
  }
}


# lt: less than, lte: less than or equal to, gt: greater than, gte: greater than or equal to
# Find females younger than 20
GET s18/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "sex": "女"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "lt": 20
          }
        }
      }
    }
  }
}
# Find non-females aged 18 or younger
GET s18/doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "sex": "女"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "lte": 18
          }
        }
      }
    }
  }
}
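
When these request bodies are built from application code, a small helper keeps the clauses tidy. The following is a sketch only: bool_query and its filter_ parameter are hypothetical names, and the helper just assembles a dictionary with the same shape as the requests above.

```python
def bool_query(must=None, should=None, must_not=None, filter_=None):
    """Assemble a bool search body, leaving out clauses that were not supplied."""
    clauses = {"must": must, "should": should, "must_not": must_not, "filter": filter_}
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


# Non-females aged 18 or younger, the same shape as the last request above
body = bool_query(
    must_not=[{"match": {"sex": "女"}}],
    filter_={"range": {"age": {"lte": 18}}},
)
print(sorted(body["query"]["bool"]))  # ['filter', 'must_not']
```

The resulting dictionary can be passed as the body of a search call, as in the Python section at the end of these notes.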

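Highlighting

# The highlight property highlights the matched terms in the results
# pre_tags sets the opening part of the custom tag
# post_tags sets the closing part of the tag
# List the fields to highlight under fields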

PUT zhifou/doc/4
{
  "name":"石头",
  "age":29,
  "from":"gu",
  "desc":"粗中有细，狐假虎威",
  "tags":["粗", "大","猛"]
}

GET zhifou/doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "highlight": {
    "pre_tags": "<b class='key' style='color:red'>",
    "post_tags": "</b>",
    "fields": {
      "from": {}
    }
  }
}

PUT s18/doc/6
{
  "name":"wangdi",
  "desc": "骚的打漂"
}

GET s18/doc/_search
{
  "query": {
    "match": {
      "desc": "打漂"
    }
  },
  "highlight": {
    "pre_tags": "<b style='color:red;font-size:20px;' class='wangdi'>", 
    "post_tags": "</b>", 
    "fields": {
      "desc": {}
    }
  }
}

Source filtering (returning only selected fields)

GET s18/doc/_search
{
  "query": {
    "match": {
      "name": "zhangsan"
    }
  },
  "_source": ["name","age"]
}

Aggregations: avg, max, min, sum

# sum: total age of the documents where sex matches 男
GET s18/doc/_search
{
  "query": {
    "match": {
      "sex": "男"
    }
  },
  "aggs": {
    "my_sum": {
      "sum": {
        "field": "age"
      }
    }
  }
}
# max: the highest age among males
GET s18/doc/_search
{
  "query": {
    "match": {
      "sex": "男"
    }
  },
  "aggs": {
    "my_max": {
      "max": {
        "field": "age"
      }
    }
  }
}

# min: the lowest age
GET s18/doc/_search
{
  "aggs": {
    "my_min": {
      "min": {
        "field": "age"
      }
    }
  }
}

# avg: the average age
GET s18/doc/_search
{
  "aggs": {
    "my_avg": {
      "avg": {
        "field": "age"
      }
    }
  }
}

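In each aggregation, field names the document field the aggregation is computed on (age here).

Grouping (range buckets)

# Bucket by age range: 1-15, 15-20, 30-100 (from is inclusive, to is exclusive)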
GET s18/doc/_search
{
  "query": {
    "match": {
      "sex": "男"
    }
  },
  "aggs": {
    "my_group": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 1,
            "to": 15
          },
          {
            "from": 15,
            "to": 20
          },
          {
            "from": 30,
            "to":100
          }
        ]
      }
    }
  }
}
# Bucket by age range 10-15 and 16-20, then sum the ages in each bucket (to is exclusive, so age 15 falls in neither bucket)
GET s18/doc/_search
{
  "query": {
    "match": {
      "sex": "男"
    }
  },
  "aggs": {
    "group": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 10,
            "to": 15
          },
          {
            "from": 16,
            "to":20
          }
        ]
      },
      "aggs": {
        "my_sum": {
          "sum": {
            "field": "age"
          }
        }
      }
    }
  }
}
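
The bucket boundaries are easy to get wrong, so here is a small local model of a range aggregation with a sum sub-aggregation. It is a sketch, not a reproduction of the server response: the function name range_buckets, the sample ages, and the simplified key label are assumptions.

```python
def range_buckets(ages, ranges):
    """Bucket ages the way a range aggregation does: from <= age < to."""
    buckets = []
    for r in ranges:
        members = [age for age in ages if r["from"] <= age < r["to"]]
        buckets.append({
            "key": f'{r["from"]}-{r["to"]}',  # simplified label
            "doc_count": len(members),
            "my_sum": sum(members),           # plays the role of the sum sub-aggregation
        })
    return buckets


# Hypothetical ages of the matched documents
ages = [12, 15, 15, 18, 20, 35]
result = range_buckets(ages, [{"from": 10, "to": 15}, {"from": 16, "to": 20}])
for bucket in result:
    print(bucket)
# {'key': '10-15', 'doc_count': 1, 'my_sum': 12}
# {'key': '16-20', 'doc_count': 1, 'my_sum': 18}
```

Ages 15 and 20 land in no bucket here, because the upper bound of each range is excluded and the second range starts at 16.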

mappings: a mapping plays the role of a table schema. In the examples so far, Elasticsearch defined it for us automatically.

PUT s2/doc/1
{
  "name":"zhangsan",
  "age":21,
  "desc":"低调"
}

PUT s2/doc/2
{
  "name":"lisi",
  "age":20,
  "desc":"骚气"
}

GET s2/doc/_search
{
  "query": {
    "match": {
      "desc": "骚气"
    }
  }
}

The three dynamic settings of a mapping

Dynamic mapping (dynamic: true)
Static mapping (dynamic: false)
Strict mode (dynamic: strict)

PUT m1
{
  "mappings": {
    "doc":{
      "properties":{
        "name":{
          "type":"text"
        },
        "age":{
          "type":"long"
        }
      }
    }
  }
}

GET m1/_mapping

GET m1/doc/_search
{
  "query": {
    "match": {
      "sex": "不详"
    }
  }
}

Static mode

PUT s5
{
  "mappings": {
    "doc":{
      "dynamic": false,
      "properties":{
        "name":{
          "type":"text"
        }
      }
    }
  }
}

PUT s5/doc/1
{
  "name":"玉冰"
}

PUT s5/doc/2
{
  "name":"peiqin",
  "age": 17
}

GET s5/_mapping
GET s5/doc/_search

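# The mapping shows that Elasticsearch did not create a mapping for the new age field, so documents cannot be found by searching on it.
When Elasticsearch sees a new field under dynamic: false, it does not index that field, but the value is still stored in _source.
In some cases dynamic: false is not restrictive enough, and a stricter policy is needed.

Strict mode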

PUT m4
{
  "mappings": {
    "doc": {
      "dynamic": "strict", 
      "properties": {
        "name": {
          "type": "text"
        },
        "age": {
          "type": "long"
        }
      }
    }
  }
}

PUT m4/doc/1
{
  "name": "小黑",
  "age": 18
}
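# This request is rejected with a strict_dynamic_mapping_exception, because sex is not defined in the mapping
PUT m4/doc/2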
{
  "name": "小白",
  "age": 18,
  "sex": "不详"
}
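
The three dynamic settings can be compared side by side with a small local model. This is a sketch of the behavior described above, not Elasticsearch code: index_document is a hypothetical helper, and a plain ValueError stands in for the server-side rejection.

```python
def index_document(mapped_fields, doc, dynamic=True):
    """Return (searchable fields of doc, mapping after indexing).

    dynamic=True     -> unknown fields are added to the mapping and indexed
    dynamic=False    -> unknown fields are kept in _source but not indexed
    dynamic="strict" -> a document with unknown fields is rejected
    """
    known = set(mapped_fields)
    unknown = [field for field in doc if field not in known]
    if unknown and dynamic == "strict":
        raise ValueError(f"unmapped fields not allowed: {unknown}")
    if dynamic is True:
        known.update(unknown)
    searchable = [field for field in doc if field in known]
    return searchable, sorted(known)


doc = {"name": "peiqin", "age": 17}
print(index_document(["name"], doc, dynamic=True))   # (['name', 'age'], ['age', 'name'])
print(index_document(["name"], doc, dynamic=False))  # (['name'], ['name'])
```

With dynamic="strict" the same call raises an error, which mirrors the rejected request in the strict mode example.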

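The ignore_above mapping parameter

For a keyword field, strings longer than ignore_above characters are not indexed, so they cannot be found by searching on that field; the value is still kept in _source.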

PUT s7
{
  "mappings": {
    "doc":{
      "properties":{
        "title":{
          "type":"keyword",
          "ignore_above": 10
        }
      }
    }
  }
}

PUT s7/doc/1
{
  "title": "从手机、平板电脑、路由器和视频游戏控制台"
}

PUT s7/doc/2
{
  "title": "1234567"
}


GET s7/doc/_search
{
  "query": {
    "match": {
      "title": "1234567"
    }
  }
}
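
The effect of ignore_above in the example above can be modeled in a few lines. This is a sketch under the assumption that the limit is compared against the character count of the value; keyword_is_indexed and the sample titles are hypothetical.

```python
def keyword_is_indexed(value, ignore_above):
    """A keyword value is indexed only if its length does not exceed ignore_above."""
    return len(value) <= ignore_above


# Hypothetical titles: the first is longer than 10 characters, the second is not
titles = {1: "a title longer than ten characters", 2: "1234567"}
searchable = {doc_id: keyword_is_indexed(title, 10) for doc_id, title in titles.items()}
print(searchable)  # {1: False, 2: True}
```

Document 1 in index s7 behaves the same way: its title has more than 10 characters, so it is stored but cannot be matched on title.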

index

The `index` parameter defaults to `true`. If it is set to `false`, `elasticsearch` does not build an index for that field, so the field cannot be used as a query condition.

PUT s8
{
  "mappings": {
    "doc":{
      "properties":{
        "t1":{
          "type":"text",
          "index": true
        },
        "t2":{
          "type":"text",
          "index": false
        }
      }
    }
  }
}


PUT s8/doc/1
{
  "t1":"论母猪的产前保养",
  "t2":"论母猪的产后护理"
}

GET s8/doc/_search
{
  "query": {
    "match": {
      "t1": "母猪"
    }
  }
}

GET s8/doc/_search
{
  "query": {
    "match": {
      "t2": "母猪"
    }
  }
}

The copy_to parameter copies the values of several fields into a group field, which can then be queried as a single field.

# Copy first_name and last_name into full_name

PUT m5
{
  "mappings": {
    "doc": {
      "dynamic":false,
      "properties": {
        "first_name":{
          "type": "text",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

PUT m5/doc/1
{
  "first_name":"tom",
  "last_name":"ben"
}

PUT m5/doc/2
{
  "first_name":"john",
  "last_name":"smith"
}

GET m5/doc/_search
{
  "query": {
    "match": {
      "first_name": "tom"
    }
  }
}

GET m5/doc/_search
{
  "query": {
    "match": {
      "full_name": "tom"
    }
  }
}


# Returns both documents: match uses OR between terms by default, so matching tom or smith is enough
GET m5/doc/_search
{
  "query": {
    "match": {
      "full_name": "tom smith"
    }
  }
}
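
The last query returns both documents because full_name collects the values of both source fields and match accepts any of the query terms. The sketch below models that locally; add_copy_to and match_any are hypothetical helpers with a naive whitespace tokenizer, not Elasticsearch APIs.

```python
def add_copy_to(doc, sources, target):
    """Return a searchable view of doc with the source values collected under target."""
    view = dict(doc)
    view[target] = [doc[field] for field in sources if field in doc]
    return view


def match_any(view, field, query):
    """Model match with the default OR operator: any query term may match."""
    terms = {term for value in view.get(field, []) for term in value.lower().split()}
    return any(q in terms for q in query.lower().split())


docs = {
    1: {"first_name": "tom", "last_name": "ben"},
    2: {"first_name": "john", "last_name": "smith"},
}
views = {i: add_copy_to(d, ["first_name", "last_name"], "full_name") for i, d in docs.items()}
hits = [i for i, view in views.items() if match_any(view, "full_name", "tom smith")]
print(hits)  # [1, 2]
```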

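settings

number_of_shards is the number of primary shards (5 per index by default in ES 6.x and earlier), and number_of_replicas is the number of replica shards per primary shard; by default each primary shard has one replica.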

PUT w2
{
  "mappings": {
    "doc":{
      "properties":{
        "title":{
          "type":"text"
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 3
  }
}
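
The two settings multiply, which is worth checking before creating an index. A minimal sketch of the arithmetic, with total_shards as a hypothetical helper name:

```python
def total_shards(number_of_shards, number_of_replicas):
    """Each primary shard plus its replicas: primaries * (1 + replicas)."""
    return number_of_shards * (1 + number_of_replicas)


print(total_shards(3, 3))  # 12, the w2 index above
print(total_shards(5, 1))  # 10, the old defaults
```

So index w2 asks for 12 shards in total: 3 primaries and 9 replicas.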

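Basic Elasticsearch operations from Python

from elasticsearch import Elasticsearch

# Connects to localhost:9200 by default
es = Elasticsearch()
# Create (index) a document
print(es.index(index='p1', doc_type='doc', id=1, body={"name": "lou"}))
# Fetch it by id
print(es.get(index='p1', doc_type='doc', id=1))
# Update: indexing the same id again replaces the whole document
print(es.index(index='p1', doc_type='doc', id=1, body={"name": "lou2"}))
# Delete by id (raises NotFoundError if the id does not exist)
print(es.delete(index='p1', doc_type='doc', id=1))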

def filter_msg(search_msg, target, current_page):
    """Search index e1 by title, optionally restricted to a tag, and paginate the hits."""
    if target == 'all':
        query = {"match": {"title": search_msg}}
    else:
        # Both conditions must hold: title matches the text and tags matches the target
        query = {
            "bool": {
                "must": [
                    {"match": {"title": search_msg}},
                    {"match": {"tags": target}},
                ]
            }
        }
    body = {
        "size": 200,
        "query": query,
        # Wrap matched terms in the title with a red <b> tag
        "highlight": {
            "pre_tags": "<b style='color:red;'>",
            "post_tags": "</b>",
            "fields": {"title": {}},
        },
    }
    res = es.search(index='e1', body=body, filter_path=['hits.total', 'hits.hits'])
    # Pagination is the project's own helper class; it is not shown in these notes
    page_obj = Pagination(current_page, res['hits']['total'])
    res['page'] = page_obj.show_li
    # Keep only the hits that belong to the current page
    res['data_msg'] = res['hits']['hits'][page_obj.start:page_obj.end]
    res['hits']['hits'] = ''
    return res

views file (Django)

from django.http import JsonResponse
from django.shortcuts import render


def search(request):
    if request.method == 'GET':
        # Read the search text, tag filter, and page number from the query string
        search_msg = request.GET.get("search_msg")
        target = request.GET.get("target")
        current_page = request.GET.get('current_page')
        res = filter_msg(search_msg, target, current_page)
        return JsonResponse(res)
    return render(request, 'index.html')