Elasticsearch: A Beginner's Exploration

Installing and Using ES [macOS]

Installing with brew

brew tap elastic/tap
brew install elastic/tap/elasticsearch-full

Local paths after ES is installed

Data:    /usr/local/var/lib/elasticsearch/elasticsearch_fightman/
Logs:    /usr/local/var/log/elasticsearch/elasticsearch_fightman.log
Plugins: /usr/local/var/elasticsearch/plugins/
Config:  /usr/local/etc/elasticsearch/

Installing with docker

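# Search for elasticsearch images
docker search elasticsearch

# Pull the elasticsearch image (the official image has no "latest" tag, so a version tag is required)
docker pull elasticsearch:7.15.2

# List all local images
docker images

# Run an elasticsearch container: -d runs it in the background and prints the container ID;
# -p maps container ports 9200 and 9300 to the same ports on the host.
# The last argument is the image name (or the image ID).
docker run -d --name es \
-p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
elasticsearch:7.15.2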

curl localhost:9200

Installing from the tar.gz package, plus kibana

#1. Download and extract
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.15.2-darwin-x86_64.tar.gz
tar -xzf elasticsearch-7.15.2-darwin-x86_64.tar.gz
#2. Start in the background: -d runs as a daemon, -E sets a configuration value, -p writes the process ID to the file "pid"
cd elasticsearch-7.15.2/
./bin/elasticsearch -d -Ecluster.name=my_cluster -Enode.name=node_1 -p pid
#3. Install kibana (the darwin build, to match macOS)
curl -O https://artifacts.elastic.co/downloads/kibana/kibana-7.15.2-darwin-x86_64.tar.gz
tar -xzf kibana-7.15.2-darwin-x86_64.tar.gz
cd kibana-7.15.2-darwin-x86_64/
#4. Start kibana
./bin/kibana
#5. Open in a browser
http://localhost:5601/

Installing the Chinese analysis plugin (IK)

Git repository: https://github.com/medcl/elasticsearch-analysis-ik/ [may require a proxy to access]
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.15.2/elasticsearch-analysis-ik-7.15.2.zip
The plugin version must match your ES version, in both the release tag and the file name of the URL.
Releases: https://github.com/medcl/elasticsearch-analysis-ik/releases

ik_smart: coarsest segmentation (fewest tokens); ik_max_word: finest-grained segmentation (as many tokens as possible)

Adding custom words to the extension dictionary

<elasticsearch directory>/plugins/ik/config/IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- Configure your own extension dictionary here -->
        <entry key="ext_dict">my.dic</entry>
        <!-- Configure your own extension stopword dictionary here -->
        <entry key="ext_stopwords"></entry>
        <!-- Configure a remote extension dictionary here -->
        <!-- <entry key="remote_ext_dict">words_location</entry> -->
        <!-- Configure a remote extension stopword dictionary here -->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

GET _analyze
{
  "analyzer": "ik_smart",
  "text": [
      "年轻人不讲武德",
      "你耗子尾汁！"
    ]
}

{
  "tokens" : [
    {
      "token" : "年轻人",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "不讲",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "武德",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "你",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "耗子尾汁",
      "start_offset" : 9,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

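ES configuration files

ES has three configuration files:

elasticsearch.yml configures Elasticsearch itself

jvm.options configures the Elasticsearch JVM

log4j2.properties configures Elasticsearch logging

All three files live in the config directory, whose location depends on how ES was installed:

tar.gz or zip install: the config directory is $ES_HOME/config

Debian or RPM install: the default is /etc/elasticsearch; the location can be changed by setting the ES_PATH_CONF environment variable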
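
The lookup rule above can be sketched in a few lines of Python. This is only a model of the rule as described in these notes, not code taken from Elasticsearch; the function name resolve_config_dir and the install_type parameter are made up for illustration.

```python
import os

def resolve_config_dir(env, install_type="archive"):
    """Return the config directory under the rule described above.

    env is a mapping of environment variables (for example os.environ).
    install_type is a hypothetical switch: "archive" for tar.gz/zip installs,
    "package" for Debian/RPM installs.
    """
    # ES_PATH_CONF, when set, overrides the default location.
    if env.get("ES_PATH_CONF"):
        return env["ES_PATH_CONF"]
    if install_type == "package":
        return "/etc/elasticsearch"
    # Archive installs keep the config next to the binaries.
    return os.path.join(env.get("ES_HOME", "."), "config")

archive_dir = resolve_config_dir({"ES_HOME": "/opt/es"})
package_dir = resolve_config_dir({}, install_type="package")
custom_dir = resolve_config_dir({"ES_PATH_CONF": "/srv/es-conf"}, install_type="package")
print(archive_dir, package_dir, custom_dir)
```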

Important configuration

Before ES is deployed to production, a number of settings must be configured explicitly.

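Key ES concepts

Node and Cluster

Elastic is essentially a distributed database that lets multiple servers work together, and each server can run multiple Elastic instances.
A single Elastic instance is called a node. A group of nodes forms a cluster.

Index

Elastic indexes every field and, after processing, writes it to an inverted index. Searches are answered directly from that index.
For this reason, the top-level unit of data management in Elastic is called an Index. It is the counterpart of a single database. The name of every Index (that is, database) must be lowercase.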
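
To make the idea of an inverted index concrete, here is a toy sketch in Python. It assumes a trivial analyzer (lowercase, then split on whitespace), which is far simpler than what ES actually does; build_inverted_index is a hypothetical helper, not an ES API.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed analyzer: lowercase the text and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "Elasticsearch is a search engine",
    2: "Kibana is a dashboard for Elasticsearch",
}
index = build_inverted_index(docs)
print(index["elasticsearch"])  # [1, 2]
print(index["kibana"])         # [2]
```

A search for a term is then a dictionary lookup instead of a scan over every document, which is the point of the structure.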

Document

A single record inside an Index is called a Document. Many Documents make up an Index.
Documents are expressed as JSON:
{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}
Documents in the same Index are not required to share the same structure (schema), but keeping them uniform is recommended because it improves search efficiency.

Type

Types are deprecated in 7.x, where the only remaining type name is _doc, and they are removed entirely in 8.0.

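RESTful style

Method	URL	Description
PUT (create, replace)	localhost:9200/<index>/<type>/<document id>	Create a document (with the given document id)
POST (create)	localhost:9200/<index>/<type>	Create a document (auto-generated document id)
POST (update)	localhost:9200/<index>/<type>/<document id>/_update	Update a document
DELETE (delete)	localhost:9200/<index>/<type>/<document id>	Delete a document
GET (read)	localhost:9200/<index>/<type>/<document id>	Get a document by its ID
POST (search)	localhost:9200/<index>/<type>/_search	Search all documents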
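
Since 7.x deprecates custom types, the same operations are usually written without a type name, using _doc, _update and _search directly under the index. The sketch below builds those typeless paths; the ENDPOINTS table and the endpoint function are illustrative names of my own, while the URL shapes are the typeless forms of the 7.x document APIs.

```python
# Typeless 7.x forms of the operations in the table above.
ENDPOINTS = {
    "index_with_id": ("PUT", "/{index}/_doc/{id}"),
    "index_auto_id": ("POST", "/{index}/_doc"),
    "update": ("POST", "/{index}/_update/{id}"),
    "delete": ("DELETE", "/{index}/_doc/{id}"),
    "get": ("GET", "/{index}/_doc/{id}"),
    "search": ("POST", "/{index}/_search"),
}

def endpoint(action, index, doc_id=""):
    """Return (HTTP method, path) for the given action."""
    method, template = ENDPOINTS[action]
    # str.format ignores the id argument for templates that do not use it.
    return method, template.format(index=index, id=doc_id)

print(endpoint("index_with_id", "test1", "1"))  # ('PUT', '/test1/_doc/1')
print(endpoint("update", "test3", "1"))         # ('POST', '/test3/_update/1')
print(endpoint("search", "blog"))               # ('POST', '/blog/_search')
```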

Creating an index (by indexing its first document)

PUT /test1/type1/1
{
  "name": "流柚",
  "age": 18
}

{
  "_index" : "test1",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

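Field data types
- String types
  - text: analyzed; supports full-text search, fuzzy queries, and term-level queries against the analyzed tokens; does not support aggregations or sorting by default. There is no length limit on text values, so it suits large fields.
  - keyword: not analyzed; the whole value is indexed as a single term; supports exact matching, fuzzy queries, aggregations, and sorting. A keyword term can be at most 32766 bytes of UTF-8. ignore_above sets the maximum number of characters to index; longer values are not indexed, so a term query cannot find them.
- Numeric types
  - long, integer, short, byte, double, float, half_float, scaled_float
- Date type
  - date
- Boolean type
  - boolean
- Binary type
  - binary
- Vector type
  - dense_vector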

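Specifying field types

This is similar to creating a database schema (an index plus the type of each field); it can also be seen as defining rules.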

PUT /test2
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age":{
        "type": "long"
      },
      "birthday":{
        "type": "date"
      }
    }
  }
}

Without explicit field types

ES infers a default type for each field (dynamic mapping).

PUT /test3/_doc/1
{
  "name": "办会员",
  "age": 28,
  "birth": "1992-10-01"
}

GET /test3

{
  "test3" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "birth" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "test3",
        "creation_date" : "1637564393505",
        "number_of_replicas" : "1",
        "uuid" : "vozzLXjFSZS8aV1vWhI0yg",
        "version" : {
          "created" : "7150299"
        }
      }
    }
  }
}
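
The inferred mapping above (long for 28, date for "1992-10-01", text for the name) can be imitated roughly in Python. This is a simplified sketch, not the real dynamic-mapping logic: guess_field_type is a made-up helper, and date detection is reduced to the single yyyy-MM-dd pattern.

```python
from datetime import datetime

def guess_field_type(value):
    """Rough imitation of dynamic mapping for a few JSON value kinds."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        try:
            # Simplified date detection: only the yyyy-MM-dd pattern.
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            # ES maps other strings to text with a keyword sub-field.
            return "text"
    return None  # arrays and null are not handled in this sketch

doc = {"name": "办会员", "age": 28, "birth": "1992-10-01"}
mapping = {field: guess_field_type(value) for field, value in doc.items()}
print(mapping)  # {'name': 'text', 'age': 'long', 'birth': 'date'}
```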

Defining keyword sub-fields

PUT blog 
{
  "mappings": {
    "properties": {
      "name":{
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "age":{
        "type": "integer"
      },
      "desc": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

The GET _cat/* APIs expose a lot of information about the cluster:
1. GET _cat/indices
2. GET _cat/aliases
3. GET _cat/allocation
4. GET _cat/count
5. GET _cat/fielddata
6. GET _cat/health
7. GET _cat/master
8. GET _cat/nodeattrs
9. GET _cat/nodes
10. GET _cat/pending_tasks
11. GET _cat/plugins
12. GET _cat/recovery
13. GET _cat/repositories
14. GET _cat/segments
15. GET _cat/shards
16. GET _cat/snapshots
17. GET _cat/tasks
18. GET _cat/templates
19. GET _cat/thread_pool

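Updating fields

Updating with PUT is not recommended: PUT overwrites the whole document, so any field left out of the request body is removed.

Use POST with _update instead (the typeless form is shown; the older /test3/_doc/1/_update form is deprecated in 7.x)

POST /test3/_update/1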
{
  "doc": {
    "name": "办10个会员",
    "age": 15
  }
}

GET /test3/_doc/1

{
  "_index" : "test3",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "办10个会员",
    "age" : 15,
    "birth" : "1992-10-01"
  }
}
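
The difference between the two update styles can be modelled with plain dictionaries. This is a sketch of the behaviour only: put_document and partial_update are hypothetical helpers, and the merge here is shallow, whereas ES merges nested objects recursively.

```python
def put_document(stored, body):
    """Model of PUT /index/_doc/id: the request body replaces the stored source."""
    return dict(body)

def partial_update(stored, doc):
    """Model of _update with a "doc" body: the given fields are merged in."""
    merged = dict(stored)
    merged.update(doc)  # shallow merge, enough for flat documents
    return merged

stored = {"name": "办会员", "age": 28, "birth": "1992-10-01"}
change = {"name": "办10个会员", "age": 15}

after_put = put_document(stored, change)
after_update = partial_update(stored, change)
print(after_put)     # birth is gone
print(after_update)  # birth is kept
```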

Deleting an index

DELETE [indices]

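Queries

Complex queries

Match queries

match: full-text match; the query text is run through the analyzer first and the resulting terms are then searched
_source: selects which fields are returned
sort: sort order
from, size: pagination

// Match query

GET blog/_search
{
  "query": {
    "match": {  // analyzed full-text match
      "desc": "向上"
    }
  },
  "_source": ["name", "desc"],  // fields to return
  "sort": [  // sort order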
    {
      "age": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 20
}
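
For pagination, from is an offset in documents rather than a page number. A small helper can do the conversion; page_params is a name invented for this sketch, and the 10000 limit reflects the default of the index.max_result_window setting, which caps from + size.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window

def page_params(page, size=20):
    """Convert a 1-based page number into the from/size pair of a search body."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    start = (page - 1) * size
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the default result window")
    return {"from": start, "size": size}

first_page = page_params(1)
third_page = page_params(3, size=10)
print(first_page)  # {'from': 0, 'size': 20}
print(third_page)  # {'from': 20, 'size': 10}
```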

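Multi-condition queries (bool)

must is equivalent to AND
should is equivalent to OR
must_not is equivalent to NOT (documents matching any of its clauses are excluded)
filter: clauses must match, as with must, but they do not affect the score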

GET blog/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "刘"
          }
        },
        {
          "match": {
            "desc": "向上"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 10,
              "lte": 60
            }
          }
        }
      ]
    }
  }
}
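
When such queries are assembled in application code, a small builder keeps the nesting manageable. The sketch below produces the same body as the request above; bool_query and its parameter names are my own, not part of any ES client.

```python
import json

def bool_query(must=(), should=(), must_not=(), filters=()):
    """Build a search body with a bool query, omitting empty clause lists."""
    clauses = {
        "must": list(must),
        "should": list(should),
        "must_not": list(must_not),
        "filter": list(filters),
    }
    return {"query": {"bool": {name: c for name, c in clauses.items() if c}}}

query = bool_query(
    must=[{"match": {"name": "刘"}}, {"match": {"desc": "向上"}}],
    filters=[{"range": {"age": {"gte": 10, "lte": 60}}}],
)
# ensure_ascii=False keeps the Chinese text readable in the printed JSON.
print(json.dumps(query, ensure_ascii=False, indent=2))
```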

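Matching several keywords

A match clause targets exactly one field; to query several fields, combine match clauses in a bool query or use multi_match
Several keywords can be searched at once
match runs the query text through the analyzer first and then searches for the resulting terms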

GET blog/_search
{
  "query": {
    "match": {
      "desc": "向上，仗义，乐观"
    }
  }
}

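Exact (term) queries

term looks the given value up directly in the inverted index without analyzing it // exact query (the value must exist as one complete, indivisible term)
With the default analyzer this only seems to work for single Chinese characters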

GET blog/_search
{
  "query": {
    "term": {
      "desc": {
        "value": "阳"
      }
    }
  }
}

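Problems found when using term queries
Problem: for Chinese strings, a term query finds nothing (for example, "编程" is present in a document but cannot be found)

Cause: the index has no Chinese analyzer configured (the default standard analyzer splits Chinese text into single characters, each indexed as its own term), so no indexed term is longer than one character and a multi-character value can never match
Fix: configure a Chinese analyzer when creating the index
PUT example
{
  "mappings": {
    "properties": {
      "name":{
        "type": "text",
        "analyzer": "ik_max_word"  // IK analyzer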
      }
    }
  }
}
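English values in a term query must be lowercase; uppercase finds nothing, because the standard analyzer lowercases tokens at index time while term does not analyze the query value

English words in the query must be complete; a term query does not match part of a word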
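
These three observations follow from one rule: term compares the raw query value with the indexed tokens, while match analyzes the query first. The Python sketch below imitates that with a very rough stand-in for the standard analyzer (one token per Chinese character, lowercased ASCII words); the function names and the sample sentence are made up for illustration.

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: each CJK ideograph is a token,
    each run of ASCII letters or digits is one lowercased token."""
    return [t.lower() for t in re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", text)]

# Tokens stored in the inverted index for one hypothetical document.
indexed = set(standard_like_tokens("我爱编程 Hello World"))

def term_hits(value):
    """term: the raw value must equal an indexed token."""
    return value in indexed

def match_hits(text):
    """match: the query text is analyzed, then any resulting token may hit."""
    return any(token in indexed for token in standard_like_tokens(text))

print(term_hits("编程"), term_hits("编"))      # False True
print(term_hits("Hello"), term_hits("hello"))  # False True
print(term_hits("hel"))                        # False
print(match_hits("编程"), match_hits("Hello")) # True True
```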

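text and keyword

text
- Analyzed; supports full-text search, fuzzy queries, and term-level queries against the analyzed tokens; does not support aggregations or sorting by default
- No length limit on text values, so it suits large fields
keyword
- Not analyzed; the whole value is indexed as a single term; supports exact matching, fuzzy queries, aggregations, and sorting
- A keyword term can be at most 32766 bytes of UTF-8. ignore_above sets the maximum number of characters to index; longer values are not indexed, so a term query cannot find them.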
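
The effect of ignore_above can be pictured as a filter applied before indexing. The sketch below is a simplified model with a made-up helper name; in ES the skipped value is still kept in _source, it just cannot be found through the keyword field.

```python
def keyword_terms(values, ignore_above=256):
    """Return the values that would be indexed as keyword terms.

    Values longer than ignore_above characters are skipped at index time.
    """
    return [value for value in values if len(value) <= ignore_above]

values = ["short value", "x" * 300]
indexed_terms = keyword_terms(values)
print(len(indexed_terms))  # 1
```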

// Test whether keyword and text fields are analyzed
// Define the field types in the mapping
PUT /test
{
  "mappings": {
    "properties": {
      "text":{
        "type":"text"
      },
      "keyword":{
        "type":"keyword"
      }
    }
  }
}
// Index a document
PUT /test/_doc/1
{
  "text":"测试keyword和text是否支持分词",
  "keyword":"测试keyword和text是否支持分词"
}
// text fields are analyzed
// keyword fields are not analyzed
GET /test/_search
{
  "query":{
   "match":{
      "text":"测试"
   }
  }
}// found
GET /test/_search
{
  "query":{
   "match":{
      "keyword":"测试"
   }
  }
}// not found; the query must be exactly "测试keyword和text是否支持分词" to match

GET _analyze
{
  "analyzer": "keyword",
  "text": ["测试liu"]
}// not tokenized: the single token is 测试liu

/*
{
  "tokens" : [
    {
      "token" : "测试liu",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    }
  ]
}
*/


GET _analyze
{
  "analyzer": "standard",
  "text": ["测试liu"]
}// split into 测 试 liu
/*
{
  "tokens" : [
    {
      "token" : "测",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "试",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "liu",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
*/

GET _analyze
{
  "analyzer":"ik_max_word",
  "text": ["测试liu"]
}// split into 测试 liu
/*
{
  "tokens" : [
    {
      "token" : "测试",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "liu",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "ENGLISH",
      "position" : 1
    }
  ]
}
*/

Highlighting

GET blog/_search
{
  "query": {
    "match": {
      "name": "刘"
    }
  },
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>", 
    "fields": {
      "name": {}
    }
  }
}

{
  "took" : 41,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.6682933,
    "hits" : [
      {
        "_index" : "blog",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6682933,
        "_source" : {
          "name" : "刘敏开",
          "age" : 28,
          "desc" : "阳光乐观，积极向上"
        },
        "highlight" : {
          "name" : [
            "<p class='key' style='color:red'>刘</p>敏开"
          ]
        }
      },
      {
        "_index" : "blog",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.6682933,
        "_source" : {
          "name" : "刘海柱",
          "age" : 54,
          "desc" : "行侠仗义，单挑王，向上"
        },
        "highlight" : {
          "name" : [
            "<p class='key' style='color:red'>刘</p>海柱"
          ]
        }
      }
    ]
  }
}
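
On the client side, the highlighted fragments sit next to _source in each hit. A minimal sketch for pulling them out of a parsed response follows; the response dictionary is a trimmed copy of the output above and highlight_fragments is a hypothetical helper.

```python
def highlight_fragments(response, field):
    """Collect the highlight fragments of one field from every hit."""
    fragments = []
    for hit in response["hits"]["hits"]:
        # Hits without a highlight for this field contribute nothing.
        fragments.extend(hit.get("highlight", {}).get(field, []))
    return fragments

# Trimmed copy of the response shown above.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"name": "刘敏开"},
             "highlight": {"name": ["<p class='key' style='color:red'>刘</p>敏开"]}},
            {"_id": "4", "_source": {"name": "刘海柱"},
             "highlight": {"name": ["<p class='key' style='color:red'>刘</p>海柱"]}},
        ]
    }
}
fragments = highlight_fragments(response, "name")
for fragment in fragments:
    print(fragment)
```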