Installing an Elasticsearch 5.6.1 Cluster

Download ES 5.6.1:
    wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.1.tar.gz
Extract it into the current directory:
    tar -xzvf elasticsearch-5.6.1.tar.gz

Edit the sysctl file with sudo vim /etc/sysctl.conf and add the setting below. Note: this must be done on every machine in the cluster.
Add this line:
    vm.max_map_count=655360
Save and exit, then apply the change:
    sudo sysctl -p

cd into /home/hadoop/elasticsearch-5.6.1/config and open the elasticsearch.yml file:
    vim elasticsearch.yml



# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Cluster name
cluster.name: es-app
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Node name
node.name: master

# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
# Memory settings. These depend on the OS; on older systems, leaving the defaults can fail at startup with "version too low" errors.
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# Bind address
network.host: 192.168.93.140
#
# Set a custom port for HTTP:
# HTTP port (external clients request data through it) and transport TCP port. When running multiple nodes on one host, these must be set explicitly.
http.port: 9200
transport.tcp.port: 9300
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# IP:port of the cluster nodes. 9300 is the default transport port; when running multiple nodes on one machine, be sure to include the port.
discovery.zen.ping.unicast.hosts: ["192.168.93.140:9300", "192.168.93.141:9300","192.168.93.142:9300"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
# Prevent split brain. For this three-node cluster the quorum is 3/2 + 1 = 2;
# with the value 1 used here, ES logs a warning at startup (visible in the log further down).
discovery.zen.minimum_master_nodes: 1
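The quorum formula in the comment above is easy to compute; a minimal Python sketch (the helper name is ours, not part of ES):

```python
def minimum_master_nodes(master_eligible):
    """Quorum from the comment above: total master-eligible nodes / 2 + 1."""
    return master_eligible // 2 + 1

# For the three-node cluster in this guide the safe value is 2, not 1:
print(minimum_master_nodes(3))  # → 2
```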

Once configured, copy the installation to the other nodes, then adjust node.name and network.host on each node:
scp -r  /home/hadoop/elasticsearch-5.6.1 hadoop@slaver1:~

scp -r  /home/hadoop/elasticsearch-5.6.1 hadoop@slaver2:~


To start, cd into /home/hadoop/elasticsearch-5.6.1/bin and run:
./elasticsearch
Note: start the process on every node.
hadoop@master:~/elasticsearch-5.6.1/bin$ ./elasticsearch
[2017-09-24T19:02:08,979][INFO ][o.e.n.Node               ] [master] initializing ...
[2017-09-24T19:02:09,245][INFO ][o.e.e.NodeEnvironment    ] [master] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [21.5gb], net total_space [41.2gb], spins? [possibly], types [ext4]
[2017-09-24T19:02:09,246][INFO ][o.e.e.NodeEnvironment    ] [master] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-09-24T19:02:09,248][INFO ][o.e.n.Node               ] [master] node name [master], node ID [h1_nDt8nSiCPysC_YvCiCQ]
[2017-09-24T19:02:09,249][INFO ][o.e.n.Node               ] [master] version[5.6.1], pid[81870], build[667b497/2017-09-14T19:22:05.189Z], OS[Linux/4.10.0-35-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_131/25.131-b11]
[2017-09-24T19:02:09,249][INFO ][o.e.n.Node               ] [master] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/home/hadoop/elasticsearch-5.6.1]
[2017-09-24T19:02:11,242][INFO ][o.e.p.PluginsService     ] [master] loaded module [aggs-matrix-stats]
[2017-09-24T19:02:11,242][INFO ][o.e.p.PluginsService     ] [master] loaded module [ingest-common]
[2017-09-24T19:02:11,243][INFO ][o.e.p.PluginsService     ] [master] loaded module [lang-expression]
[2017-09-24T19:02:11,243][INFO ][o.e.p.PluginsService     ] [master] loaded module [lang-groovy]
[2017-09-24T19:02:11,243][INFO ][o.e.p.PluginsService     ] [master] loaded module [lang-mustache]
[2017-09-24T19:02:11,244][INFO ][o.e.p.PluginsService     ] [master] loaded module [lang-painless]
[2017-09-24T19:02:11,244][INFO ][o.e.p.PluginsService     ] [master] loaded module [parent-join]
[2017-09-24T19:02:11,244][INFO ][o.e.p.PluginsService     ] [master] loaded module [percolator]
[2017-09-24T19:02:11,245][INFO ][o.e.p.PluginsService     ] [master] loaded module [reindex]
[2017-09-24T19:02:11,245][INFO ][o.e.p.PluginsService     ] [master] loaded module [transport-netty3]
[2017-09-24T19:02:11,246][INFO ][o.e.p.PluginsService     ] [master] loaded module [transport-netty4]
[2017-09-24T19:02:11,247][INFO ][o.e.p.PluginsService     ] [master] no plugins loaded
[2017-09-24T19:02:14,304][INFO ][o.e.d.DiscoveryModule    ] [master] using discovery type [zen]
[2017-09-24T19:02:15,462][INFO ][o.e.n.Node               ] [master] initialized
[2017-09-24T19:02:15,463][INFO ][o.e.n.Node               ] [master] starting ...
[2017-09-24T19:02:15,793][INFO ][o.e.t.TransportService   ] [master] publish_address {192.168.93.140:9300}, bound_addresses {192.168.93.140:9300}
[2017-09-24T19:02:15,815][INFO ][o.e.b.BootstrapChecks    ] [master] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-09-24T19:02:18,924][INFO ][o.e.c.s.ClusterService   ] [master] new_master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-09-24T19:02:18,971][INFO ][o.e.h.n.Netty4HttpServerTransport] [master] publish_address {192.168.93.140:9200}, bound_addresses {192.168.93.140:9200}
[2017-09-24T19:02:18,972][INFO ][o.e.n.Node               ] [master] started
[2017-09-24T19:02:18,988][INFO ][o.e.g.GatewayService     ] [master] recovered [0] indices into cluster_state
[2017-09-24T19:03:19,541][INFO ][o.e.c.s.ClusterService   ] [master] added {{slaver1}{xyb705aPSPq9iH-z8WHBpg}{C6yz5DQtSje3hHqmEwiPSw}{192.168.93.141}{192.168.93.141:9300},}, reason: zen-disco-node-join[{slaver1}{xyb705aPSPq9iH-z8WHBpg}{C6yz5DQtSje3hHqmEwiPSw}{192.168.93.141}{192.168.93.141:9300}]
[2017-09-24T19:03:20,157][WARN ][o.e.d.z.ElectMasterService] [master] value for setting "discovery.zen.minimum_master_nodes" is too low. This can result in data loss! Please set it to at least a quorum of master-eligible nodes (current value: [1], total number of master-eligible nodes used for publishing in this round: [2])
[2017-09-24T19:05:25,236][INFO ][o.e.c.s.ClusterService   ] [master] added {{slaver2}{_klsi3jPQP2hiPULnjsvyA}{Mu7pypT3R8CtuPh4ar9mLw}{192.168.93.142}{192.168.93.142:9300},}, reason: zen-disco-node-join[{slaver2}{_klsi3jPQP2hiPULnjsvyA}{Mu7pypT3R8CtuPh4ar9mLw}{192.168.93.142}{192.168.93.142:9300}]

Now check the logs on slaver1 and slaver2:
slaver1:
[2017-09-24T19:03:20,144][INFO ][o.e.c.s.ClusterService   ] [slaver1] detected_master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300}, added {{master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300},}, reason: zen-disco-receive(from master [master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300} committed version [3]])
[2017-09-24T19:03:20,187][INFO ][o.e.h.n.Netty4HttpServerTransport] [slaver1] publish_address {192.168.93.141:9200}, bound_addresses {192.168.93.141:9200}
[2017-09-24T19:03:20,188][INFO ][o.e.n.Node               ] [slaver1] started
[2017-09-24T19:05:25,309][INFO ][o.e.c.s.ClusterService   ] [slaver1] added {{slaver2}{_klsi3jPQP2hiPULnjsvyA}{Mu7pypT3R8CtuPh4ar9mLw}{192.168.93.142}{192.168.93.142:9300},}, reason: zen-disco-receive(from master [master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300} committed version [4]])

slaver2:

[2017-09-24T19:05:25,706][INFO ][o.e.c.s.ClusterService   ] [slaver2] detected_master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300}, added {{slaver1}{xyb705aPSPq9iH-z8WHBpg}{C6yz5DQtSje3hHqmEwiPSw}{192.168.93.141}{192.168.93.141:9300},{master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300},}, reason: zen-disco-receive(from master [master {master}{h1_nDt8nSiCPysC_YvCiCQ}{efcDdMrKSmSObgecX79mEw}{192.168.93.140}{192.168.93.140:9300} committed version [4]])
[2017-09-24T19:05:28,167][INFO ][o.e.h.n.Netty4HttpServerTransport] [slaver2] publish_address {192.168.93.142:9200}, bound_addresses {192.168.93.142:9200}
[2017-09-24T19:05:28,168][INFO ][o.e.n.Node               ] [slaver2] started

The logs above show that the nodes have discovered each other.

Cluster health:
hadoop@master:/opt/Hadoop/zookeeper-3.4.10/bin$ curl http://192.168.93.140:9200/_cluster/health?pretty=true
(or open http://192.168.93.140:9200/_cluster/health?pretty=true in a browser)
{
  "cluster_name" : "es-app",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
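As a quick programmatic sanity check, the health response can also be validated in code; a minimal Python sketch using a trimmed copy of the JSON above:

```python
import json

# Trimmed copy of the _cluster/health response shown above
health = json.loads("""
{
  "cluster_name" : "es-app",
  "status" : "green",
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_shards_percent_as_number" : 100.0
}
""")

# A healthy three-node cluster: green status, all three nodes joined
assert health["status"] == "green"
assert health["number_of_nodes"] == 3
print("cluster %s is %s" % (health["cluster_name"], health["status"]))
```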

Cluster state:

hadoop@master:/opt/Hadoop/zookeeper-3.4.10/bin$ curl http://192.168.93.140:9200/_cluster/state
(or open http://192.168.93.140:9200/_cluster/state in a browser)

{"cluster_name":"es-app","version":4,"state_uuid":"nNfQrzNOSs6yrU7VNHNlwg","master_node":"h1_nDt8nSiCPysC_YvCiCQ","blocks":{},"nodes":{"_klsi3jPQP2hiPULnjsvyA":{"name":"slaver2","ephemeral_id":"Mu7pypT3R8CtuPh4ar9mLw","transport_address":"192.168.93.142:9300","attributes":{}},"xyb705aPSPq9iH-z8WHBpg":{"name":"slaver1","ephemeral_id":"C6yz5DQtSje3hHqmEwiPSw","transport_address":"192.168.93.141:9300","attributes":{}},"h1_nDt8nSiCPysC_YvCiCQ":{"name":"master","ephemeral_id":"efcDdMrKSmSObgecX79mEw","transport_address":"192.168.93.140:9300","attributes":{}}},"metadata":{"cluster_uuid":"o1DzbTt6RY-bgh4ilZ47Yw","templates":{},"indices":{},"index-graveyard":{"tombstones":[]}},"routing_table":{"indices":{}},"routing_nodes":{"unassigned":[],"nodes":{"_klsi3jPQP2hiPULnjsvyA":[],"xyb705aPSPq9iH-z8WHBpg":[],"h1_nDt8nSiCPysC_YvCiCQ":[]}}}

Cluster stats:

hadoop@master:/opt/Hadoop/zookeeper-3.4.10/bin$ curl http://192.168.93.140:9200/_cluster/stats
{"_nodes":{"total":3,"successful":3,"failed":0},"cluster_name":"es-app","timestamp":1506308656817,"status":"green","indices":{"count":0,"shards":{},"docs":{"count":0,"deleted":0},"store":{"size_in_bytes":0,"throttle_time_in_millis":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"query_cache":{"memory_size_in_bytes":0,"total_count":0,"hit_count":0,"miss_count":0,"cache_size":0,"cache_count":0,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":0,"memory_in_bytes":0,"terms_memory_in_bytes":0,"stored_fields_memory_in_bytes":0,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":0,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":0,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":-9223372036854775808,"file_sizes":{}}},"nodes":{"count":{"total":3,"data":3,"coordinating_only":0,"master":3,"ingest":3},"versions":["5.6.1"],"os":{"available_processors":48,"allocated_processors":48,"names":[{"name":"Linux","count":3}],"mem":{"total_in_bytes":25049698304,"free_in_bytes":1368895488,"used_in_bytes":23680802816,"free_percent":5,"used_percent":95}},"process":{"cpu":{"percent":0},"open_file_descriptors":{"min":446,"max":447,"avg":446}},"jvm":{"max_uptime_in_millis":3758298,"versions":[{"version":"1.8.0_131","vm_name":"OpenJDK 64-Bit Server VM","vm_version":"25.131-b11","vm_vendor":"Oracle Corporation","count":3}],"mem":{"heap_used_in_bytes":1444532368,"heap_max_in_bytes":6227755008},"threads":229},"fs":{"total_in_bytes":132766040064,"free_in_bytes":77017890816,"available_in_bytes":70202990592,"spins":"true"},"plugins":[],"network_types":{"transport_types":{"netty4":3},"http_types":{"netty4":3}}}}

Test with Python. Install the client first:
    sudo pip install elasticsearch

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from elasticsearch import Elasticsearch
from datetime import datetime
# Create a client connection
es = Elasticsearch(hosts='192.168.93.140')
# Index 99,999 test documents, one request per document
for i in range(1,100000):
    es.index(index='els_student', doc_type='test-type', id=i, body={"name": "student" + str(i), "age": (i % 100), "timestamp": datetime.now()})
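Indexing one document per request is slow at this volume; the elasticsearch Python client also ships a bulk helper. A sketch with the same index and type names (the action generator runs without a cluster; the bulk call itself is shown commented out):

```python
from datetime import datetime

def student_actions(n, index='els_student', doc_type='test-type'):
    """Generate bulk actions equivalent to the per-document loop above."""
    for i in range(1, n + 1):
        yield {
            '_index': index,
            '_type': doc_type,
            '_id': i,
            '_source': {'name': 'student' + str(i),
                        'age': i % 100,
                        'timestamp': datetime.now()},
        }

# With a live cluster, stream the actions in batches:
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch(hosts='192.168.93.140')
# helpers.bulk(es, student_actions(99999))
```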


curl -XPOST '192.168.93.140:9200/els_student/_search?pretty' -d '
{
  "query": { "match_all": {} }
 
}'

curl -XPOST '192.168.93.140:9200/els_student/_search?pretty' -d '
{
  "query": { "match": { "name": "student41" } }
 
}'


curl -XPUT http://192.168.93.140:9200/index


curl -XPOST http://192.168.93.140:9200/index/fulltext/_mapping -d'
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word"
            }
        }
    
}'

Install the ik analysis plugin (be sure to use the version matching your ES; see https://github.com/medcl/elasticsearch-analysis-ik):

hadoop@master:~/elasticsearch-5.6.1$ ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.1/elasticsearch-analysis-ik-5.6.1.zip
-> Downloading https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.1/elasticsearch-analysis-ik-5.6.1.zip
[=================================================] 100%
-> Installed analysis-ik

Restart ES after the plugin is installed.

curl -XPOST http://192.168.93.140:9200/index/fulltext/1 -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'

curl -XPOST http://192.168.93.140:9200/index/fulltext/2 -d'
{"content":"公安部:各地校车将享最高路权"}
'

curl -XPOST http://192.168.93.140:9200/index/fulltext/3 -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
'

curl -XPOST http://192.168.93.140:9200/index/fulltext/4 -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'

curl -XPOST http://192.168.93.140:9200/index/fulltext/_search  -d'
{
    "query" : { "match" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}
'


Result:

{
    "took": 169,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.6099695,
        "hits": [
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "4",
                "_score": 0.6099695,
                "_source": {
                    "content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
                },
                "highlight": {
                    "content": [
                        "<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
                    ]
                }
            },
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "3",
                "_score": 0.27179778,
                "_source": {
                    "content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
                },
                "highlight": {
                    "content": [
                        "中韩渔警冲突调查:韩警平均每天扣1艘<tag1>中国</tag1>渔船"
                    ]
                }
            }
        ]
    }
}
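Client code usually extracts the highlighted fragments from such a response; a minimal sketch over the structure shown above (only the fields being read are kept in the sample):

```python
def extract_highlights(response, field="content"):
    """Return (_id, fragment) pairs from a search response with highlighting."""
    return [(hit["_id"], frag)
            for hit in response["hits"]["hits"]
            for frag in hit.get("highlight", {}).get(field, [])]

# Shape matches the result above, trimmed to the fields we read:
response = {"hits": {"hits": [
    {"_id": "4", "highlight": {"content": ["<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"]}},
    {"_id": "3", "highlight": {"content": ["中韩渔警冲突调查:韩警平均每天扣1艘<tag1>中国</tag1>渔船"]}},
]}}
for _id, frag in extract_highlights(response):
    print(_id, frag)
```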

Dictionary Configuration

IKAnalyzer.cfg.xml can be located at {conf}/analysis-ik/config/IKAnalyzer.cfg.xml or {plugins}/elasticsearch-analysis-ik-*/config/IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries can be configured here -->
    <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
    <!-- local extension stopword dictionaries can be configured here -->
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    <!-- a remote extension dictionary can be configured here -->
    <entry key="remote_ext_dict">location</entry>
    <!-- a remote extension stopword dictionary can be configured here -->
    <entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>

Hot-reloading the IK dictionary

The plugin supports hot reloading of the IK dictionary through the remote-dictionary entries shown in the IK configuration file above:

     <!-- a remote extension dictionary can be configured here -->
    <entry key="remote_ext_dict">location</entry>
     <!-- a remote extension stopword dictionary can be configured here -->
    <entry key="remote_ext_stopwords">location</entry>

Here location is a URL, e.g. http://yoursite.com/getCustomDict. The request only needs to satisfy two requirements for hot dictionary updates to work:

    The HTTP response must return two headers: Last-Modified and ETag. Both are strings; whenever either one changes, the plugin fetches the new word list and updates the dictionary.

    The response body must contain one word per line, with lines separated by a newline character (\n).

If both requirements are met, the dictionary is hot-reloaded without restarting the ES instance.

You can put the hot words to be auto-updated in a UTF-8 encoded .txt file served by nginx or any simple HTTP server; when the .txt file changes, the server automatically returns the matching Last-Modified and ETag on client requests. You can additionally build a tool that extracts relevant terms from your business systems and updates this .txt file.
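One simple way to produce the required headers is to derive the ETag from the dictionary file's content, so any edit to the word list changes the ETag and triggers a re-fetch. A small sketch (the helper is ours; serving the file would be handled by nginx or a similar HTTP server):

```python
import hashlib

def dict_etag(dictionary_text):
    """Derive an ETag from the dictionary content: any change to the
    word list yields a different ETag, prompting the plugin to re-fetch."""
    return '"%s"' % hashlib.md5(dictionary_text.encode("utf-8")).hexdigest()

v1 = dict_etag("中国\n中国渔船\n")
v2 = dict_etag("中国\n中国渔船\n校车\n")  # word added, so the ETag changes
assert v1 != v2
```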

Have fun.

FAQ

1. Why doesn't my custom dictionary take effect?

Make sure your extension dictionary file is UTF-8 encoded.

2. How do I install the plugin manually?

git clone https://github.com/medcl/elasticsearch-analysis-ik
cd elasticsearch-analysis-ik
git checkout tags/{version}
mvn clean
mvn compile
mvn package

Copy and unzip the release file #{project_path}/elasticsearch-analysis-ik/target/releases/elasticsearch-analysis-ik-*.zip into your Elasticsearch plugins directory, e.g. plugins/ik, then restart Elasticsearch.

3. Tokenization test fails. Call the _analyze endpoint under a specific index rather than the bare _analyze endpoint, e.g. http://localhost:9200/your_index/_analyze?text=中华人民共和国MN&tokenizer=my_ik

4. What is the difference between ik_max_word and ik_smart?

ik_max_word: splits the text at the finest granularity. For example, "中华人民共和国国歌" is split into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌", exhausting all possible combinations.

ik_smart: performs the coarsest-grained split. For example, "中华人民共和国国歌" is split into "中华人民共和国, 国歌".


To delete all documents in an index by query:

curl -XPOST '192.168.93.140:9200/parsetext-index/_delete_by_query?pretty' -d '{
    "query": {
        "match_all": {}
    }
}'

Original article: https://www.cnblogs.com/herosoft/p/8134134.html