Elasticsearch 5 Cluster Deployment Guide

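This article briefly covers the key points of deploying an Elasticsearch 5.x cluster. For more details, see the official documentation.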

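Some of the settings below are not used in our production environment. Production consists of 2 clusters on 20 CentOS machines holding 15 TB of data in total, about 9 billion documents. Real-time indexing runs at about 5,000 documents/s, with peaks of 70,000/s.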

Environment preparation:

1. Edit /etc/sysctl.conf and add:
vm.max_map_count=262144

Run sysctl -p to apply the change.

2. Edit /etc/security/limits.d/20-nproc.conf to raise the process limit:
* soft nproc 65536

3. Edit /etc/security/limits.conf to raise the open-file limit and allow memory locking:
  * soft nofile 65536
  * hard nofile 65536
  * - memlock unlimited
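
Once the three files above are edited, the effective values can be checked from a fresh login shell of the user that runs Elasticsearch. The following is a minimal sketch; check_min is a hypothetical helper introduced here for illustration, not part of Elasticsearch or CentOS.

```shell
#!/usr/bin/env bash
# check_min NAME ACTUAL REQUIRED
# Prints OK or LOW for a numeric limit; the value "unlimited" always passes.
# check_min is a hypothetical helper, not part of Elasticsearch.
check_min() {
  local name=$1 actual=$2 required=$3
  if [ "$actual" = "unlimited" ] || [ "$actual" -ge "$required" ]; then
    echo "OK   $name=$actual (required >= $required)"
  else
    echo "LOW  $name=$actual (required >= $required)"
    return 1
  fi
}

# Usage, run as the Elasticsearch user after logging in again:
#   check_min vm.max_map_count "$(sysctl -n vm.max_map_count)" 262144
#   check_min nofile "$(ulimit -n)" 65536
#   check_min nproc  "$(ulimit -u)" 65536
#   check_min memlock "$(ulimit -l)" 1
```

The memlock check passes only in the "unlimited" case or when a numeric value is reported, which is enough to confirm that the limits.conf entry was picked up.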

ES configuration:

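Set up at least 1 dedicated master node plus 2 mixed data/master nodes, for a total of 3 master-eligible nodes. Send all bulk requests to the master nodes only.
Give about 50% of each machine's memory to the ES heap, e.g. Xms = Xmx = 24 GB.
On the master nodes, bulk requests cause frequent GC, so switch the JVM to G1GC with the following option:

-XX:+UseG1GC

Add the IP addresses of all master nodes to the discovery-file plugin configuration.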
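
The 50% heap rule above can be sketched as a small calculation. The function name heap_gb and the 31 GB cap are assumptions added here: the cap reflects the commonly recommended limit for keeping compressed object pointers and is not stated in this guide.

```shell
# heap_gb TOTAL_RAM_GB
# Suggests a heap size in GB: half of physical RAM, capped at 31 GB.
# The cap is an assumption (compressed-oops threshold), not from this guide.
heap_gb() {
  local half=$(( $1 / 2 ))
  if [ "$half" -gt 31 ]; then half=31; fi
  echo "$half"
}

# Example: a 48 GB machine yields the 24 GB heap used in this guide.
echo "-Xms$(heap_gb 48)g"
echo "-Xmx$(heap_gb 48)g"
```

Setting Xms and Xmx to the same value avoids heap resizing at runtime, which is why both flags are printed from the same calculation.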

node.attr.rack: crm-master-01
node.name: golden-master01
network.publish_host: 10.27.1.1

cluster.name: goldeneye
network.host: 0.0.0.0
http.port: 9200
transport.tcp.port: 9300

path:
  data:
    - /home/data1
    - /home/data2

bootstrap.memory_lock: true
discovery.zen.minimum_master_nodes: 2
action.destructive_requires_name: true
script.max_compilations_per_minute: 120
indices.query.bool.max_clause_count: 4096

##===================MASTER NODE =========================
node.master: true
node.data: false
node.ingest: true
discovery.zen.ping_timeout: 10s

xpack.monitoring.history.duration: 1d
xpack.graph.enabled: false
xpack.watcher.enabled: false
#xpack.security.enabled: false
xpack.security.dls_fls.enabled: false
xpack.security.transport.filter.enabled: false

##===================DATA NODE =========================
node.master: false
node.data: true
node.ingest: false

http.enabled: false

xpack.security.enabled: true
xpack.monitoring.enabled: true
xpack.graph.enabled: false
xpack.watcher.enabled: false
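
The value discovery.zen.minimum_master_nodes: 2 in the shared settings follows from the quorum rule for master-eligible nodes, floor(N / 2) + 1. A minimal sketch of that arithmetic, with min_master_nodes as a hypothetical helper name:

```shell
# min_master_nodes MASTER_ELIGIBLE_COUNT
# Quorum of master-eligible nodes: floor(N / 2) + 1.
min_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

# The cluster in this guide has 3 master-eligible nodes.
min_master_nodes 3
```

If master-eligible nodes are added or removed later, the setting would need to be recomputed and updated on every node.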

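Cluster settings

Before the initial import, disable shard rebalancing. The recovery settings speed up replica copying once number_of_replicas is set to 1 after the import: node_concurrent_incoming_recoveries and node_concurrent_outgoing_recoveries default to 2, and indices.recovery.max_bytes_per_sec defaults to 40mb. The store throttle block is meant to speed up merging during the import and during _forcemerge?max_num_segments=3. Note that the 5.0 breaking-changes notes list the indices.store.throttle.* settings as removed, so check whether your version accepts that block.

curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster": {
      "routing": {
        "rebalance": {
          "enable": "none"
        },
        "allocation": {
          "node_concurrent_incoming_recoveries": "6",
          "node_concurrent_outgoing_recoveries": "6"
        }
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "500mb"
      },
      "store": {
        "throttle": {
          "type": "merge",
          "max_bytes_per_sec": "500mb"
        }
      }
    }
  }
}'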
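
These transient values are import-time tuning, so they would normally be reverted afterwards. In Elasticsearch 5.x a cluster setting can be reset to its default by assigning null. The sketch below only builds the request body; reset_body is a hypothetical helper, and which keys to reset is an assumption to adapt to your cluster.

```shell
# reset_body KEY...
# Builds a _cluster/settings body that resets each transient key to its default.
# reset_body is a hypothetical helper introduced for illustration.
reset_body() {
  local key body="" sep=""
  for key in "$@"; do
    body="$body$sep\"$key\": null"
    sep=", "
  done
  printf '{ "transient": { %s } }\n' "$body"
}

# Dry run: print the body that would re-enable default rebalancing and recovery speed.
reset_body cluster.routing.rebalance.enable indices.recovery.max_bytes_per_sec

# To send it (assumes the node listens on localhost:9200):
#   curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' \
#        -d "$(reset_body cluster.routing.rebalance.enable indices.recovery.max_bytes_per_sec)"
```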

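Create the index template

During the initial import, set number_of_replicas to 0 and refresh_interval to -1. translog.durability is set to async for heavy-write workloads, translog.flush_threshold_size is raised from its default of 512mb, and _all is disabled to save memory. Only one field, end_time, is shown under properties; its format accepts either an ISO 8601 date such as 2016-01-01T00:00:00+08:00 or epoch milliseconds.

curl -XPUT localhost:9200/_template/crm_v5 -H 'Content-Type: application/json' -d '
{
  "template": "crm*",
  "aliases": {
    "crm_v5.0": {}
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1",
      "refresh_interval": "10s",
      "translog.durability": "async",
      "translog.flush_threshold_size": "1024mb",
      "max_result_window": "5000000",
      "max_rescore_window": "20000",
      "unassigned.node_left.delayed_timeout": "10m",
      "search.slowlog.threshold.query.warn": "30s",
      "indexing.slowlog.threshold.index.warn": "10s"
    }
  },
  "mappings": {
    "customer": {
      "_all": { "enabled": false },
      "properties": {
        "end_time": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}'

Use the smallest type that fits each field: prefer short over integer, and half_float over float.

Integer types: long, integer, short, byte

Floating-point types: float, half_float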

Starting the full import job

Before starting the import, update the template so that number_of_replicas=0 and refresh_interval=-1.
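
Index templates are applied only when an index is created, so indices that already exist need the same two values set through the _settings API. A minimal sketch that builds the request body; import_settings_body is a hypothetical helper, and localhost:9200 is assumed as in the other commands.

```shell
# import_settings_body REPLICAS REFRESH_INTERVAL
# Builds the index settings body used before and after a bulk import.
# import_settings_body is a hypothetical helper introduced for illustration.
import_settings_body() {
  printf '{ "index": { "number_of_replicas": %s, "refresh_interval": "%s" } }\n' "$1" "$2"
}

# Dry run: print the body for the import phase.
import_settings_body 0 -1

# To apply it to existing crm* indices:
#   curl -XPUT 'localhost:9200/crm*/_settings' -H 'Content-Type: application/json' \
#        -d "$(import_settings_body 0 -1)"
# After the import, the template values can be restored the same way:
#   import_settings_body 1 10s
```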

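After the full import completes

1. Check the size of each imported index. For any index larger than 30 GB, choose a new number_of_shards so that each shard holds about 30 GB. The shard count of an existing index cannot be increased in place, so use the _reindex and _alias APIs together to do this.

2. Merge the segments of each shard to speed up searches.

curl -XPOST 'localhost:9200/crm*/_forcemerge?max_num_segments=3'

3. Enable replicas by setting number_of_replicas=1.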

curl -XPUT 'localhost:9200/crm*/_settings' -d ' { "index" : { "number_of_replicas" : 1 } }'
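
For step 1, the shard count for a target of about 30 GB per shard can be computed by rounding up, and the data is then copied into a new index with _reindex. The sketch below is illustrative: shards_for_size and reindex_body are hypothetical helpers, the 95 GB size is an example figure, and crm-0720-v2 is a made-up name for the new index.

```shell
# shards_for_size INDEX_SIZE_GB [TARGET_GB]
# Rounds up so that no shard exceeds the target size (default 30 GB).
shards_for_size() {
  local size=$1 target=${2:-30}
  echo $(( (size + target - 1) / target ))
}

# reindex_body SOURCE DEST
# Builds the request body for POST _reindex.
reindex_body() {
  printf '{ "source": { "index": "%s" }, "dest": { "index": "%s" } }\n' "$1" "$2"
}

# Example: a 95 GB index needs 4 shards of roughly 24 GB each.
shards_for_size 95
reindex_body crm-0720 crm-0720-v2

# Sketch of the full sequence (not executed here, assumes localhost:9200):
#   1. create crm-0720-v2 with "number_of_shards" set to the computed value
#   2. curl -XPOST localhost:9200/_reindex -H 'Content-Type: application/json' \
#           -d "$(reindex_body crm-0720 crm-0720-v2)"
#   3. point the alias at the new index, then delete the old index
```

Because the new index name still matches the crm* template pattern, it would pick up the template settings and alias automatically; only number_of_shards has to be overridden at creation time.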

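During operations

A: Move an index shard to another node

Migrate the shard's data with the reroute API:

curl -XPOST localhost:9200/_cluster/reroute -H 'Content-Type: application/json' -d '
{
  "commands": [
    {
      "move": {
        "index": "crm-0720", "shard": 0,
        "from_node": "data02", "to_node": "data03"
      }
    }
  ]
}'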

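If a node is about to be shut down, automatic shard allocation must also be disabled before running reroute:

curl -XPUT localhost:9200/_cluster/settings -d '{  "transient" : {  "cluster.routing.allocation.enable" : "none"  }  }' # the default is all

The same setting can be made persistent, here together with rebalancing limited to replica shards:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "none",
    "cluster.routing.rebalance.enable": "replicas",
    "action.destructive_requires_name": true
  }
}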
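
Once the node is back in the cluster, allocation has to be switched on again, otherwise unassigned shards stay unassigned. A minimal sketch that builds and validates the request body; allocation_body is a hypothetical helper, and localhost:9200 is assumed as elsewhere in this guide.

```shell
# allocation_body VALUE
# Builds a transient _cluster/settings body for cluster.routing.allocation.enable.
# Accepts only the values Elasticsearch documents for this setting.
allocation_body() {
  case $1 in
    all|primaries|new_primaries|none) ;;
    *) echo "invalid value: $1" >&2; return 1 ;;
  esac
  printf '{ "transient": { "cluster.routing.allocation.enable": "%s" } }\n' "$1"
}

# Dry run: print the body that restores the default behaviour.
allocation_body all

# To send it:
#   curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' \
#        -d "$(allocation_body all)"
```

A transient value takes precedence over a persistent one, so this also overrides a persistent "none" until the next full cluster restart.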

To search the current directory for a keyword and show the file, line number and line content of each match:
grep -nHIrF KeyWord ./