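Chapter 3: ES Cluster Overview and Backup

I. Cluster Overview

1. About the cluster

1. Once the cluster is deployed, data written through any node is visible from every other node.
2. The head plugin can connect to any one machine and still show all three nodes.
3. Data is automatically distributed across multiple shards.
4. If a primary shard fails, a replica shard on another node is automatically promoted to primary.
5. If the master node fails, the remaining master-eligible nodes automatically elect a new master.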

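2. Checking cluster state

1. List node information
GET _cat/nodes
[root@es01 ~]# curl -s -XGET "http://10.0.0.71:9200/_cat/nodes"

2. Check cluster health
GET _cat/health

3. Show the current master node
GET _cat/master

4. List indices
GET _cat/indices

5. List shard information
GET _cat/shards

6. List the shards of one index (replace index with the index name)
GET _cat/shards/index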
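
The _cat/shards output is plain whitespace-separated text, so it is easy to post-process. The sketch below counts shards per state. It assumes the default column order (index, shard, prirep, state, ...) and runs on a made-up sample rather than a live cluster; count_shard_states and the sample rows are illustrative names, not part of ES.

```python
from collections import Counter

# Hypothetical sample of `GET _cat/shards` output (default columns:
# index shard prirep state docs store ip node). Unassigned shards
# carry no docs/store/ip/node columns.
SAMPLE = """\
fenpian 0 p STARTED    0 261b 10.0.0.71 node-1
fenpian 0 r STARTED    0 261b 10.0.0.72 node-2
fenpian 1 p STARTED    0 261b 10.0.0.72 node-2
fenpian 1 r UNASSIGNED
"""


def count_shard_states(cat_shards_text):
    """Count shards per state (STARTED, UNASSIGNED, ...)."""
    states = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:  # skip blank or malformed lines
            states[fields[3]] += 1
    return states


if __name__ == "__main__":
    print(dict(count_shard_states(SAMPLE)))
```

A non-zero UNASSIGNED count is usually the first thing to look at when the cluster is not green.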

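3. Notes

1. When configuring cluster discovery, you do not need to list every machine's IP in the config file; it is enough that at least one listed IP belongs to a node already in the cluster.
discovery.zen.ping.unicast.hosts: ["172.16.1.71", "172.16.1.72"]

2. The master-election parameter must be set to a majority of the master-eligible nodes (N/2 + 1, rounded down), for example 2 in a 3-node cluster.
discovery.zen.minimum_master_nodes: 2

3. A newly created index defaults to five primary shards and one replica (the ES 6.x defaults).

4. Shard colors while data is being allocated
	1) Purple: data is being relocated (while a node recovers)
	2) Yellow: data is being replicated (after a node has a problem)
	3) Green: normal
	
5. With 3 nodes
	1) 0 replicas: no machine may fail without losing data
	2) 1 replica: machines may fail one at a time, with time to recover in between
	3) 2 replicas: two machines may fail at the same time

6. Once an index is created its number of primary shards cannot be changed, but the number of replicas can.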
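
The arithmetic behind notes 2 and 5 can be written down in a few lines. This is only a sketch: both function names are made up for illustration, and the second one assumes that ES never places two copies of the same shard on one node.

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum of master-eligible nodes: floor(N / 2) + 1."""
    return master_eligible // 2 + 1


def tolerated_node_losses(nodes, replicas):
    """Simultaneous node losses that still leave one copy of every shard.

    Assumes each copy of a shard lives on a different node, so at most
    `nodes - 1` extra copies are useful.
    """
    return min(replicas, nodes - 1)


if __name__ == "__main__":
    print("quorum for 3 nodes:", minimum_master_nodes(3))
    for replicas in (0, 1, 2):
        print(replicas, "replica(s):", tolerated_node_losses(3, replicas), "node(s) may fail at once")
```

This counts surviving data copies only. Electing a master still requires minimum_master_nodes master-eligible nodes to be alive.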

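4. Monitoring

1. Do not monitor only the cluster color status; watch both of the following
	1) The number of nodes in the cluster
	2) The cluster status
	3) Alert as soon as either of the two changes
	
2. Monitor with a plugin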
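
A minimal sketch of the "alert when either value changes" rule, assuming two successive replies from /_cluster/health have already been parsed into dictionaries. The field names number_of_nodes and status come from that API; health_alerts and the sample snapshots are hypothetical.

```python
def health_alerts(previous, current):
    """Compare two /_cluster/health documents and describe what changed."""
    alerts = []
    for key in ("number_of_nodes", "status"):
        before, after = previous.get(key), current.get(key)
        if before != after:
            alerts.append("%s changed: %r -> %r" % (key, before, after))
    return alerts


if __name__ == "__main__":
    # Hypothetical snapshots taken one polling interval apart.
    earlier = {"status": "green", "number_of_nodes": 3}
    later = {"status": "yellow", "number_of_nodes": 2}
    for message in health_alerts(earlier, later):
        print("ALERT:", message)
```

Checking the node count separately matters because a cluster can stay green after losing a node if every shard still has all of its copies.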

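II. Modifying Cluster Settings

1. Change the replica count of one index

PUT /index/_settings
{
  "number_of_replicas": 2
}

2. Change the replica count of all indices

PUT /_all/_settings
{
  "number_of_replicas": 2
}

3. Set the shard count

1) Specify the shard count when creating the index

# Note: the shard count cannot be changed once the index exists; it can only be set at creation time
PUT /fenpian
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 3
  }
}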
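
The same create-index call can be prepared from a script. The sketch below only builds the HTTP request and does not send it; the host 10.0.0.71:9200 is borrowed from the earlier curl example, and create_index_request is a made-up helper name.

```python
import json
from urllib.request import Request


def create_index_request(base_url, index, shards, replicas):
    """Build (but do not send) the PUT request that creates an index."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return Request(
        url="%s/%s" % (base_url.rstrip("/"), index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = create_index_request("http://10.0.0.71:9200", "fenpian", 3, 1)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```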

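2) Shard planning guidelines

1. Every additional shard carries extra cost.
2. Each shard is essentially a Lucene index, so it consumes file handles, memory, and CPU.
3. Every search request is dispatched to every shard of the index. That is not much of a problem when the shards sit on different nodes, but performance degrades steadily once shards start competing for the same hardware.
4. ES uses term-frequency statistics to calculate relevance, and those statistics are kept per shard. Maintaining only a little data across a large number of shards therefore yields poor document relevance.

3) How to size this in production

2 nodes: the default shard and replica counts are fine
3 nodes: 2 replicas for important data, the default 1 replica for less important data
		With 3 nodes, create at most 9 (3x3) shards.

Consider whether you really need to place only one shard of an index on each node. Also, giving every shard 1 replica doubles the number of nodes you need, and giving every shard 2 replicas triples it.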
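
The sizing rules above reduce to two small formulas. A sketch, with made-up function names, where the factor of 3 shards per node is taken from the 3x3 example and is not a hard limit:

```python
def max_recommended_shards(nodes, shards_per_node=3):
    """Upper bound on shards for a cluster, e.g. 3 nodes x 3 = 9."""
    return nodes * shards_per_node


def nodes_for_one_shard_per_node(primary_shards, replicas):
    """Nodes needed if every shard copy gets a node to itself."""
    return primary_shards * (1 + replicas)


if __name__ == "__main__":
    print("3 nodes, max shards:", max_recommended_shards(3))
    for replicas in (0, 1, 2):
        print("3 primaries,", replicas, "replica(s):",
              nodes_for_one_shard_per_node(3, replicas), "nodes")
```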

III. Cluster Health Check

1. Check script

# Write the Python script
[root@elkstack01 ~]# vim es_cluster_status.py
#!/usr/bin/env python
#coding:utf-8
#Author:_DriverZeng_
#Date:2017.02.12

from __future__ import print_function
import json
import subprocess

clusterip = "10.0.0.51"
# Ask the cluster for its health document; curl -s suppresses progress output
url = "http://" + clusterip + ":9200/_cluster/health"
data = subprocess.check_output(["curl", "-s", "-XGET", url])
# Parse the JSON reply with json.loads rather than eval()
status = json.loads(data.decode("utf-8")).get("status")
if status == "green":
    print("\033[1;32m Cluster is healthy \033[0m")
elif status == "yellow":
    print("\033[1;33m Replica shards are unassigned \033[0m")
else:
    print("\033[1;31m Primary shards are unassigned \033[0m")

# Sample run
[root@elkstack01 ~]# python es_cluster_status.py
Cluster is healthy

2. Enhanced plugin: x-pack

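IV. ES Tuning

1. Set the minimum and maximum JVM heap for ES

1. Do not give Elasticsearch more than 32 GB of heap
	Once the heap crosses the 30-32 GB boundary, the JVM can no longer use compressed object pointers and falls back to ordinary ones. Every object pointer becomes wider, which consumes more CPU-memory bandwidth, so you effectively lose memory.

2. On an ES server, give half of the RAM to ES
	Memory matters a great deal to Elasticsearch: the more data held in memory, the faster operations run. Lucene is the other big memory consumer, and its performance depends on the OS file system cache. If you hand all memory to the Elasticsearch heap and leave none for Lucene, full-text search performance suffers. The standard advice is to give 50% of RAM to the Elasticsearch heap; the remaining 50% will not go to waste, because Lucene quickly uses it through the OS cache.

3. How to choose the value
	Start with a small value and watch memory consumption. If memory is consumed too quickly, raise the value gradually, and settle on the value at which the monitored read/write throughput peaks.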
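
The two heap rules (half of RAM, never past the 32 GB boundary) can serve as a starting point before tuning by observation. In this sketch the 31 GB cap is an assumption chosen to stay below the 30-32 GB boundary, and both function names are made up. -Xms and -Xmx are the standard JVM heap flags.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Half of the machine's RAM, capped below the compressed-pointer limit."""
    return min(total_ram_gb // 2, cap_gb)


def jvm_heap_flags(heap_gb):
    """Minimum and maximum heap set to the same value."""
    return ["-Xms%dg" % heap_gb, "-Xmx%dg" % heap_gb]


if __name__ == "__main__":
    for ram in (8, 16, 64, 128):
        heap = recommended_heap_gb(ram)
        print("%3d GB RAM ->" % ram, " ".join(jvm_heap_flags(heap)))
```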

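2. Common questions

1. What if there is too much data and disk space runs short?
	1) Talk to the developers and first try deleting data that is no longer needed
	2) If resources are still insufficient after deletion, then consider adding capacity

3. Tune file descriptors

# Configure the file descriptor limit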
[root@es01 ~]# vim /etc/security/limits.conf
*	-	nofile	65535

# Process limits for regular users
[root@es01 ~]# vim /etc/security/limits.d/20-nproc.conf 
*          soft    nproc     4096
root       soft    nproc     unlimited

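4. Query tuning

1. When filtering with conditions, prefer term queries and keep range queries to a minimum
2. When indexing, write documents in large batches where possible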

V. Data Backup and Restore

1. Install npm and the Node.js package

[root@es01 ~]# yum install -y npm    # not recommended: the packaged version is too old and incompatible with the commands below

# Upload the Node.js package
[root@es01 ~]# rz
node-v10.16.3-linux-x64.tar.xz

2. Unpack and deploy

[root@es01 ~]# tar xf  node-v10.16.3-linux-x64.tar.xz -C /opt/
[root@es01 ~]# mv /opt/node-v10.16.3-linux-x64 /opt/node
[root@es01 ~]# echo 'export PATH=/opt/node/bin:$PATH' >> /etc/profile
[root@es01 ~]# source /etc/profile
[root@es01 ~]# npm -v
[root@es01 ~]# node -v

3. Use the Taobao npm mirror in China

[root@es01 ~]# npm config set registry https://registry.npm.taobao.org

4. Install elasticdump

[root@es01 ~]# npm install -g elasticdump

5. Backup commands

1) Command parameters

# Common parameters
--input: source URL or file to read from
--output: destination URL or file to write to
--type: type of data to back up (settings, analyzer, data, mapping, alias, template)

2) Copy ES data to another ES cluster

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=analyzer
  
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
  
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data

3) Back up ES data to JSON files

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index_mapping.json \
  --type=mapping

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index.json \
  --type=data

4) Export ES data and compress it

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=$ \
  | gzip > /data/my_index.json.gz

5) Back up only documents matching a query

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=query.json \
  --searchBody='{"query":{"term":{"username": "admin"}}}'
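
Shell quoting around --searchBody is easy to get wrong. One way to sidestep it is to build the argument list in Python and let json.dumps produce the query string. This is a sketch, not part of elasticdump: elasticdump_command is a made-up helper, and it uses only the flags shown above.

```python
import json


def elasticdump_command(source, target, dump_type="data", search_body=None):
    """Return an argv list for elasticdump; nothing is executed here."""
    cmd = [
        "elasticdump",
        "--input=" + source,
        "--output=" + target,
        "--type=" + dump_type,
    ]
    if search_body is not None:
        # json.dumps yields valid JSON, so no manual quote escaping is needed
        cmd.append("--searchBody=" + json.dumps(search_body))
    return cmd


if __name__ == "__main__":
    query = {"query": {"term": {"username": "admin"}}}
    cmd = elasticdump_command(
        "http://production.es.com:9200/my_index", "query.json", search_body=query
    )
    print(cmd)
    # To run it without a shell: subprocess.run(cmd, check=True)
```

Passing the list straight to subprocess.run means no shell ever parses the JSON, so the inner double quotes survive intact.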

6. Import commands

1) Import JSON files into ES

elasticdump \
  --input=./alias.json \
  --output=http://es.com:9200 \
  --type=alias
  
# Note: on import, data that already exists in the target is overwritten; if nothing with the same identity exists, existing data is unaffected

elasticdump \
  --input=/data/test_analyzer.json \
  --output=http://10.0.0.91:9200/test \
  --type=analyzer
  
elasticdump \
  --input=/data/test_data.json \
  --output=http://10.0.0.91:9200/test \
  --type=data
  
elasticdump \
  --input=/data/test_template.json \
  --output=http://10.0.0.91:9200/test \
  --type=template
  
elasticdump \
  --input=/data/test_mapping.json \
  --output=http://10.0.0.91:9200/test \
  --type=mapping

2) Backup script

#!/bin/bash
# Usage: pass the ES host to back up as the first argument
echo "Host to back up: ${1}"
index_name='
test
student
linux7
'
for index in $index_name
do
	echo "start backing up index ${index}"
	elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_alias.json --type=alias &> /dev/null
	elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_analyzer.json --type=analyzer &> /dev/null
	elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_data.json --type=data &> /dev/null
	elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_template.json --type=template &> /dev/null
done

3) Import script

#!/bin/bash
# Usage: pass the ES host to import into as the first argument
echo "Host to import into: ${1}"
index_name='
test
student
linux7
'
for index in $index_name
do
    echo "start importing index ${index}"
    elasticdump --input=/data/${index}_alias.json --output=http://${1}:9200/${index} --type=alias &> /dev/null
    elasticdump --input=/data/${index}_analyzer.json --output=http://${1}:9200/${index} --type=analyzer &> /dev/null
    elasticdump --input=/data/${index}_data.json --output=http://${1}:9200/${index} --type=data &> /dev/null
    elasticdump --input=/data/${index}_template.json --output=http://${1}:9200/${index} --type=template &> /dev/null
done

VI. Chinese Analyzer (IK)

https://github.com/medcl/elasticsearch-analysis-ik/

1. Insert test data

POST /index/text/1
{"content":"美国留给伊拉克的是个烂摊子吗"}

POST /index/text/2
{"content":"公安部：各地校车将享最高路权"}

POST /index/text/3
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}

POST /index/text/4
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}

2. Verify the data

POST /index/_search
{
  "query" : { "match" : { "content" : "中国" }},
  "highlight" : {
      "pre_tags" : ["<tag1>", "<tag2>"],
      "post_tags" : ["</tag1>", "</tag2>"],
      "fields" : {
          "content" : {}
      }
  }
}

3. Configure the Chinese analyzer

1) Install the plugin (run on every machine in the cluster)

1. Online install
[root@redis01 ~]# /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip

2. Local install
[root@es01 ~]# rz
elasticsearch-analysis-ik-6.6.0.zip
[root@es01 ~]# /usr/share/elasticsearch/bin/elasticsearch-plugin install file:///root/elasticsearch-analysis-ik-6.6.0.zip

ps: after the installation finishes, always run systemctl restart elasticsearch

2) Create an index

PUT /news

3) Add the mapping

curl -XPOST http://localhost:9200/news/text/_mapping -H 'Content-Type:application/json' -d'
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            }
        }
}'

4) Add our own custom Chinese words

[root@redis01 ~]# vim /etc/elasticsearch/analysis-ik/IKAnalyzer.cfg.xml 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer 扩展配置</comment>
    <!--用户可以在这里配置自己的扩展字典 -->
    <entry key="ext_dict">/etc/elasticsearch/analysis-ik/my.dic</entry>
    
[root@redis01 ~]# vim /etc/elasticsearch/analysis-ik/my.dic 
中国

[root@redis01 ~]# chown -R elasticsearch:elasticsearch /etc/elasticsearch/analysis-ik/my.dic

5) Re-insert the data

POST /news/text/1
{"content":"美国留给伊拉克的是个烂摊子吗"}

POST /news/text/2
{"content":"公安部：各地校车将享最高路权"}

POST /news/text/3
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}

POST /news/text/4
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}

6) Search again

POST /news/_search
{
  "query" : { "match" : { "content" : "中国" }},
  "highlight" : {
      "pre_tags" : ["<tag1>", "<tag2>"],
      "post_tags" : ["</tag1>", "</tag2>"],
      "fields" : {
          "content" : {}
      }
  }
}
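
To check from a script whether the whole word was highlighted, the highlight fragments can be pulled out of the search response. A sketch: the hits.hits[].highlight layout follows the ES search response, while the response below is a hand-written sample of what a successful result might look like, not captured output, and highlighted_fragments is a made-up helper.

```python
def highlighted_fragments(response, field="content"):
    """Collect every highlight fragment for one field from a search response."""
    fragments = []
    for hit in response.get("hits", {}).get("hits", []):
        fragments.extend(hit.get("highlight", {}).get(field, []))
    return fragments


# Hypothetical response, trimmed to the parts this helper reads.
SAMPLE_RESPONSE = {
    "hits": {
        "hits": [
            {"_id": "3", "highlight": {"content": [
                "中韩渔警冲突调查：韩警平均每天扣1艘<tag1>中国</tag1>渔船"]}},
            {"_id": "4", "highlight": {"content": [
                "<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"]}},
        ]
    }
}

if __name__ == "__main__":
    for fragment in highlighted_fragments(SAMPLE_RESPONSE):
        print(fragment)
```

If the analyzer is working as intended, the two characters of 中国 appear inside a single tag pair rather than being tagged one character at a time.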