A Quick Introduction to Elasticsearch

1. Preface

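Elasticsearch is a distributed, scalable, real-time search and analytics engine built on top of Apache Lucene. Lucene is arguably the most advanced, high-performance, full-featured search engine library available today, open source or proprietary. Elasticsearch packages all of that functionality into a standalone service that you talk to through a simple RESTful API, using a client written in whatever programming language you prefer.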

2. Use Cases
 
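eBay runs more than a hundred Elasticsearch clusters internally, with over 4,000 data nodes in total. In eBay's production environment these clusters support services in areas such as order search, product recommendation, centralized log management, risk control, IT operations, and security monitoring.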
 
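  Example scenarios:
  • When you search on GitHub, Elasticsearch not only finds the relevant repositories but also supports code-level search with highlighting.
  • When you shop online, Elasticsearch can help recommend related products.
  • When you hail a ride home after work, Elasticsearch can locate nearby riders and drivers and help the platform optimize dispatching.
  • Wikipedia uses Elasticsearch to provide full-text search with highlighted snippets.

Beyond search, the Elastic Stack (Elasticsearch together with Kibana, Logstash, and Beats) is also widely used for near-real-time analysis of big data, including log analysis, metrics monitoring, and information security. It helps you explore large volumes of structured and unstructured data, build visual reports on demand, set alert thresholds on monitoring data, and even detect anomalies automatically with machine learning.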

3. Single-Instance Installation

Installation media:

elasticsearch-7.10.2-linux-x86_64.tar.gz
elasticsearch-analysis-ik-7.10.2.zip
elasticsearch-analysis-pinyin-7.10.2.zip
kibana-7.10.2-linux-x86_64.tar.gz

Host parameter settings (/etc/sysctl.conf):

# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_keepalive_time = 15
net.ipv4.ip_local_port_range = 21000 61000
fs.file-max = 6553600
kernel.sem = 250 32000 100 128
net.ipv4.conf.all.accept_redirects = 0
net.core.somaxconn = 32768
vm.max_map_count = 524288

Apply the settings (as root): sysctl -p
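
Of the kernel parameters above, vm.max_map_count is the one Elasticsearch itself checks at startup; its documented minimum is 262144. The following is a minimal Python sketch, not part of the original setup, that parses sysctl.conf-style text and checks that value. The function names and the sample text are illustrative assumptions.

```python
# Minimum vm.max_map_count required by the Elasticsearch bootstrap check.
MIN_MAX_MAP_COUNT = 262144


def parse_sysctl(text):
    """Return {key: value} for every 'key = value' line, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def max_map_count_ok(settings):
    """True when vm.max_map_count is present and at least the minimum."""
    return int(settings.get("vm.max_map_count", 0)) >= MIN_MAX_MAP_COUNT


# Hypothetical excerpt of /etc/sysctl.conf, taken from the values shown above.
sample = """
# For more information, see sysctl.conf(5) and sysctl.d(5).
fs.file-max = 6553600
vm.max_map_count = 524288
"""
settings = parse_sysctl(sample)
print(max_map_count_ok(settings))  # True
```

To check a real host, the same function could be fed the contents of /etc/sysctl.conf, keeping in mind that files under /etc/sysctl.d/ may override it.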

Host parameter settings (/etc/security/limits.conf):

*  soft  nofile   1048576
*  hard  nofile   1048576
*  soft  nproc    65536
*  hard  nproc    65536
*  soft  memlock  unlimited
*  hard  memlock  unlimited
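
The Elasticsearch bootstrap checks expect at least 65535 open file descriptors and at least 4096 threads for the user that runs the node. As a rough illustration, the Python sketch below parses limits.conf-style lines and compares them with those minimums. The helper names are assumptions made for this example, and the parser only handles the simple four-column form shown above.

```python
MIN_NOFILE = 65535  # file-descriptor minimum from the Elasticsearch bootstrap checks
MIN_NPROC = 4096    # thread-count minimum from the Elasticsearch bootstrap checks


def parse_limits(text):
    """Map (domain, type, item) -> value for each non-comment limits.conf line."""
    limits = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 4:
            domain, kind, item, value = fields
            limits[(domain, kind, item)] = value
    return limits


def meets_minimum(value, minimum):
    """'unlimited', 'infinity' and '-1' always pass; otherwise compare as integers."""
    if value in ("unlimited", "infinity", "-1"):
        return True
    return int(value) >= minimum


sample = """
*  soft  nofile   1048576
*  hard  nofile   1048576
*  soft  nproc    65536
*  hard  nproc    65536
*  soft  memlock  unlimited
"""
limits = parse_limits(sample)
print(meets_minimum(limits[("*", "soft", "nofile")], MIN_NOFILE))  # True
print(meets_minimum(limits[("*", "soft", "nproc")], MIN_NPROC))    # True
```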

Directory layout:

.
|-- bin
|   |-- schema
|   |-- start-es.sh
|   |-- start-kibana.sh
|   |-- stop-es.sh
|   `-- sync
|-- data -> /data/es-data
|-- etc
|-- lib
|   |-- ojdbc8-19.8.0.0.jar
|   `-- orai18n-19.8.0.0.jar
|-- logs
|-- sbin
`-- support
    |-- elasticsearch-7.10.2
    |-- es -> elasticsearch-7.10.2
    |-- kibana -> kibana-7.10.2-linux-x86_64
    |-- kibana-7.10.2-linux-x86_64
    |-- logstash -> logstash-7.10.2
    `-- logstash-7.10.2

.bash_profile settings:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# +-------------------------------------+
# |      AI'S PROFILE, DON'T MODIFY!    |
# +-------------------------------------+
alias grep='grep --colour=auto'
alias vi='vim'
alias ll='ls -l'
alias ls='ls --color=auto'
alias mv='mv -i'
alias rm='rm -i'
alias ups='ps -u `whoami` -f'

export ES_HOME=${HOME}/support/es
export JAVA_HOME=${ES_HOME}/jdk
export PS1="\[\033[01;32m\]\u@\h\[\033[01;34m\] \w \$\[\033[00m\] "
export TERM=linux
export EDITOR=vim
export PATH=${HOME}/bin:${HOME}/sbin:${JAVA_HOME}/bin:${ES_HOME}/bin:${HOME}/support/logstash/bin:$PATH
export LANG=zh_CN.utf8
export TMOUT=3000
export HISTSIZE=1000

Adjust the JVM heap size to fit your environment: ~/support/es/config/jvm.options

-Xms16g
-Xmx16g
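
The stock elasticsearch.yml comments suggest setting the heap to about half of the available memory. The Python sketch below turns that rule of thumb into the two jvm.options lines. The 26 GB cap is an assumption chosen to stay conservatively below the JVM compressed-pointer threshold, and the 32 GB host in the usage line is hypothetical.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=26):
    """Half of physical RAM (rounded down), capped at cap_gb, never below 1."""
    return max(1, min(total_ram_gb // 2, cap_gb))


def jvm_options(heap_gb):
    """Return matching -Xms/-Xmx lines so the heap never resizes at runtime."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


# Hypothetical host with 32 GB of RAM.
print(jvm_options(recommended_heap_gb(32)))  # ['-Xms16g', '-Xmx16g']
```

Setting -Xms and -Xmx to the same value matches the configuration shown above.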

Set the basic configuration for your environment: ~/support/es/config/elasticsearch.yml

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: crm
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
#
# Add custom attributes to the node:
#
node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /home/es/data
#
# Path to log files:
#
path.logs: /home/es/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 10.230.55.48
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["10.230.55.48"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["10.230.55.48"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 1
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

# Security and authentication settings:
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true

Startup script (~/bin/start-es.sh):

#!/bin/sh

cd ~/support/es/bin
./elasticsearch -d

Set passwords:

~/support/es/bin/elasticsearch-setup-passwords interactive

You will be prompted to set passwords for the elastic, apm_system, kibana, kibana_system, logstash_system, beats_system, and remote_monitoring_user users. Once they are set, this step is complete.

Verify:

es@centos01 ~/bin $ curl --user elastic:123456 -XGET 'http://10.230.55.48:9200?pretty=true'
{
  "name" : "node-1",
  "cluster_name" : "crm",
  "cluster_uuid" : "1SAd8U-zRyGKy8ztRWAQhQ",
  "version" : {
    "number" : "7.10.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
    "build_date" : "2021-01-13T00:42:12.435326Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
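
The same check can be made from a program. Here is a minimal Python sketch using only the standard library: it builds the HTTP Basic Authorization header that curl's --user option sends and reads version.number from the root endpoint. The function names are illustrative, and the host and credentials are the ones used in this walkthrough.

```python
import base64
import json
import urllib.request


def build_request(url, user, password):
    """Create a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_version(url, user, password, timeout=5):
    """Return the 'version.number' field from the root endpoint response."""
    request = build_request(url, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["version"]["number"]


if __name__ == "__main__":
    # Only builds the request; call fetch_version() against a reachable node.
    req = build_request("http://10.230.55.48:9200/", "elastic", "123456")
    print(req.get_header("Authorization"))
```

Against the node above, fetch_version("http://10.230.55.48:9200/", "elastic", "123456") would be expected to return the version string shown in the curl output.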

Kibana installation:

Directory: ~/support/kibana

Edit the configuration: ~/support/kibana/config/kibana.yml

server.port: 5601
server.host: "10.230.55.48"
elasticsearch.hosts: ["http://10.230.55.48:9200"]
elasticsearch.username: "elastic"
elasticsearch.password: "123456"
i18n.locale: "en"

Dev Tools:

# Show Elasticsearch version information
GET /

# Check cluster health
GET _cluster/health

# List cluster nodes
GET _cat/nodes

# Show shard allocation
GET _cat/shards

# List indices
GET _cat/indices

# Count the documents in an index
GET sec_function/_count

4. Indices

List all indices on the current node:

es@centos01 ~ $ curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/_cat/indices?v'
health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   sec_function       90qf16nIQNqfd_l_deqWpA  10   0       8040           51      4.2mb          4.2mb
green  open   pm_offer_for_trans GSehvq3EQZKgWnkztShSyw  10   0    1056308       172784    313.6mb        313.6mb
green  open   tf_r_address_tree  F5xwcaRfTYmfReiPgOE3Fg  10   0   11490425            0        1gb            1gb
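
The ?v parameter adds a header row, which makes the text output easy to post-process. As a small illustration, the Python sketch below turns such a listing into dictionaries and sums docs.count. The function name is an assumption, and the sample rows are copied from the listing above.

```python
def parse_cat_table(text):
    """Turn whitespace-separated _cat output (with the ?v header) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   sec_function      90qf16nIQNqfd_l_deqWpA  10   0       8040           51      4.2mb          4.2mb
green  open   tf_r_address_tree F5xwcaRfTYmfReiPgOE3Fg  10   0   11490425            0        1gb            1gb
"""
rows = parse_cat_table(sample)
print([row["index"] for row in rows])            # ['sec_function', 'tf_r_address_tree']
print(sum(int(row["docs.count"]) for row in rows))  # 11498465
```

This simple split assumes every column is populated. For anything beyond a quick script, requesting the _cat output with format=json is likely the more robust choice.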

Create and delete an index:

es@centos01 ~ $ curl --user elastic:123456 -XPUT 'http://10.230.55.48:9200/weather'
{"acknowledged":true,"shards_acknowledged":true,"index":"weather"}

es@centos01 ~ $ curl -uelastic -XGET 'http://10.230.55.48:9200/_cat/indices?v'
health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   sec_function       90qf16nIQNqfd_l_deqWpA  10   0       8040           51      4.2mb          4.2mb
green  open   pm_offer_for_trans GSehvq3EQZKgWnkztShSyw  10   0    1056308       172784    313.6mb        313.6mb
green  open   tf_r_address_tree  F5xwcaRfTYmfReiPgOE3Fg  10   0   11490425            0        1gb            1gb
green  open   weather            vIVMeX22SReCpKGD0Pk5uw   5   1          0            0      2.2kb          1.1kb

es@centos01 ~ $ curl -uelastic -XDELETE 'http://10.230.55.48:9200/weather'
{"acknowledged":true}

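5. Chinese Word Segmentation

Unzip elasticsearch-analysis-ik-7.10.2.zip and elasticsearch-analysis-pinyin-7.10.2.zip into the ik and pinyin subdirectories of ~/support/es/plugins respectively (see the tree listing that follows), then restart Elasticsearch.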

es@centos01 ~/support $ tree ~/support/es/plugins/
/home/es/support/es/plugins/
|-- ik
|   |-- commons-codec-1.9.jar
|   |-- commons-logging-1.2.jar
|   |-- config
|   |   |-- extra_main.dic
|   |   |-- extra_single_word.dic
|   |   |-- extra_single_word_full.dic
|   |   |-- extra_single_word_low_freq.dic
|   |   |-- extra_stopword.dic
|   |   |-- IKAnalyzer.cfg.xml
|   |   |-- main.dic
|   |   |-- preposition.dic
|   |   |-- quantifier.dic
|   |   |-- stopword.dic
|   |   |-- suffix.dic
|   |   `-- surname.dic
|   |-- elasticsearch-analysis-ik-7.10.2.jar
|   |-- httpclient-4.5.2.jar
|   |-- httpcore-4.4.4.jar
|   |-- plugin-descriptor.properties
|   `-- plugin-security.policy
`-- pinyin
    |-- elasticsearch-analysis-pinyin-7.10.2.jar
    |-- nlp-lang-1.7.jar
    `-- plugin-descriptor.properties

3 directories, 22 files
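
Once the node has restarted, the _analyze API is a convenient way to confirm that the plugins loaded. The Python sketch below only builds the request body and extracts token strings from a response. It assumes the analyzer names ik_max_word and ik_smart registered by the IK plugin, and the tiny response used in the example is hand-written in the documented shape, not real plugin output.

```python
import json


def analyze_body(analyzer, text):
    """Build the JSON body for POST /_analyze."""
    return json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)


def token_strings(response):
    """Extract the token text from an _analyze response dictionary."""
    return [item["token"] for item in response.get("tokens", [])]


print(analyze_body("ik_max_word", "中文分词"))
# {"analyzer": "ik_max_word", "text": "中文分词"}

# Hand-written placeholder in the response shape, not actual analyzer output.
placeholder = {"tokens": [{"token": "a", "position": 0}, {"token": "b", "position": 1}]}
print(token_strings(placeholder))  # ['a', 'b']
```

The body can be sent with curl in the same way as the other requests here, for example to http://10.230.55.48:9200/_analyze with the Content-Type: application/json header.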

6. Working with Data

Prepare a new index:

curl --user elastic:123456 -XPUT 'http://10.230.55.48:9200/student' -H 'Content-Type: application/json' -d '
{
  "mappings" : {
    "properties" : {
      "name" : {
        "type" : "keyword"
      },
      "age" : {
        "type" : "integer"
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
    }
  }
}'
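
Add documents (using POST):

Example 1: POST without an _id always creates a new document. POST with an _id creates the document if it does not exist and replaces it otherwise.
# Request. When no _id is specified, Elasticsearch generates a random string to use as the _id.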
curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc?pretty=true' -H 'Content-Type: application/json' -d '
{
  "name": "张三"
}'

# Response
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "q6ek7XcBqu3Z6vLyxDD4",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Example 2: specify an _id of 2

# Request with _id set to 2
curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc/2?pretty=true' -H 'Content-Type: application/json' -d '
{
  "name": "李四"
}'

# Response
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

A wrong way to update data:

# Request
curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/student/_doc/2?pretty=true'

# Response
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 1,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "李四"
  }
}

Note that the result has no age field.

# Request
curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc/2?pretty=true' -H 'Content-Type: application/json' -d '
{
  "age": 10
}'

# Response
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

# Request
curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/student/_doc/2?pretty=true'

# Response
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 2,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "age" : 10
  }
}

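The _version went from 1 to 2 and the name field is gone. The reason is that POST student/_doc/2 overwrites the whole document: think of it as deleting the original document and then indexing a new one.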
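
The overwrite behaviour can be mimicked with a few lines of Python. This is a toy in-memory model written for illustration, not how Elasticsearch is implemented; the TinyIndex class and its method names are hypothetical.

```python
class TinyIndex:
    """In-memory stand-in for one index: doc id -> (version, source)."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, source):
        """Full-document write: replaces any existing source and bumps the version."""
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, dict(source))
        return "created" if version == 1 else "updated"


store = TinyIndex()
print(store.index("2", {"name": "李四"}))  # created
print(store.index("2", {"age": 10}))       # updated
print(store.docs["2"])                     # (2, {'age': 10})
```

The second write carries only age, so the stored source ends up without name, which mirrors the curl responses above.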

Updating a document with _update

es@centos01 ~ $ curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc/2?pretty=true' -H 'Content-Type: application/json' -d '
{
  "name": "李四"
}'

es@centos01 ~ $ curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_update/2?pretty=true' -H 'Content-Type: application/json' -d '
{
  "doc": {
    "age": 10
  }
}'


# Request
es@centos01 ~ $ curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/student/_doc/2?pretty=true'
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 4,
  "_seq_no" : 4,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "李四",
    "age" : 10
  }
}

When you use _update, Elasticsearch does the following:

  • Builds JSON from the old document
  • Changes that JSON
  • Deletes the old document
  • Indexes a new document
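
The four steps listed above can be sketched in Python against a plain dictionary. This is a simplified model for illustration only: it merges top-level fields, whereas the real _update API also handles nested objects and other cases. The function name and the starting version are assumptions, chosen to match the state before the _update call in the transcript.

```python
def partial_update(docs, doc_id, partial):
    """Apply a partial document to an in-memory {id: (version, source)} map."""
    version, old_source = docs[doc_id]    # 1. build JSON from the old document
    merged = dict(old_source)
    merged.update(partial)                # 2. change that JSON
    del docs[doc_id]                      # 3. delete the old document
    docs[doc_id] = (version + 1, merged)  # 4. index a new document
    return docs[doc_id]


# State before the update: version 3, only the name field.
docs = {"2": (3, {"name": "李四"})}
print(partial_update(docs, "2", {"age": 10}))
# (4, {'name': '李四', 'age': 10})
```

Unlike a full-document POST, the existing name field survives because the new source is built from the old one.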

Original article: https://www.cnblogs.com/steven-note/p/14463634.html