Elasticsearch Cluster Planning and Optimization

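Many of the points here are not rigorously worked out, and my understanding of the Elastic products is still fairly shallow. Please bear with the shortcomings and point them out so that we can learn from each other.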

Setting up an ES 7.1.1 environment

ES 7 ships with a bundled JDK, so there is no need to install one separately.

Create the es user

# Add the group
groupadd es

# Add the user
useradd -m -g es es

# Set the password
passwd es

Time synchronization

yum install -y ntp 
systemctl enable ntpd && systemctl start ntpd
timedatectl set-timezone Asia/Shanghai
timedatectl set-ntp yes
ntpq -p

sudo privileges

# Append to the end of /etc/sudoers (edit it with visudo); the user is the es account created above
es ALL=(ALL)  NOPASSWD:ALL

Download & extract

# Download
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.1.1-linux-x86_64.tar.gz

# Extract
tar xvf elasticsearch-7.1.1-linux-x86_64.tar.gz

System configuration

swapoff -a

cat >> /etc/sysctl.conf <<EOF
fs.file-max=655360
vm.max_map_count = 262144
EOF

vim /etc/security/limits.conf
* soft nproc 20480
* hard nproc 20480
* soft nofile 65536
* hard nofile 65536
* soft memlock unlimited
* hard memlock unlimited

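vim /etc/sysctl.conf
Add vm.max_map_count = 655360 (when the key appears twice in the file, the later line wins, so this overrides the 262144 appended above)
Run sysctl -p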

vim /etc/security/limits.d/20-nproc.conf
* soft nproc 20480
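
To confirm that the settings above took effect, the live values can be compared against the targets. The sketch below is a minimal check, assuming a Linux host with /proc mounted and a fresh login shell for the es user. `check_min` is a hypothetical helper, and the thresholds are the ones used in this section (262144 is also the minimum that the Elasticsearch bootstrap check asks for).

```sh
# Hypothetical helper: report whether a live value meets a required minimum.
# Runs in a subshell so its variables do not leak into the caller.
check_min() (
  name=$1; actual=$2; required=$3
  if [ "$actual" = "unlimited" ] || [ "$actual" -ge "$required" ]; then
    echo "OK   $name=$actual (need >= $required)"
  else
    echo "LOW  $name=$actual (need >= $required)"
    exit 1
  fi
)

# Kernel setting written to /etc/sysctl.conf (only readable on Linux).
if [ -r /proc/sys/vm/max_map_count ]; then
  check_min vm.max_map_count "$(cat /proc/sys/vm/max_map_count)" 262144 \
    || echo "run sysctl -p and check /etc/sysctl.conf"
fi

# Open-file limit from /etc/security/limits.conf, as seen by the current shell.
check_min nofile "$(ulimit -n)" 65536 \
  || echo "log in again as the es user so limits.conf is re-read"
```

The limits in limits.conf only apply to new login sessions, so a LOW result right after editing the file usually just means the shell predates the change.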

Start the service & configure the JVM

# vim config/jvm.options
-Xms20g
-Xmx20g

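# Basic elasticsearch.yml configuration

cluster.name: es-test
# Cluster name; nodes configured with the same cluster name discover each other and form one cluster
node.name: node-1
# Node name; since ES 7.0 it defaults to the machine's hostname, so it can be omitted unless several instances run on one host
path.data: /data/es/data
# Data directory; separate multiple directories with commas
path.logs: /data/es/logs
# Log directory
network.host: 0.0.0.0
# Address to bind to; 0.0.0.0 listens on all interfaces, or set this node's own IP
http.port: 9200
# HTTP port; change it when running several instances on one host
transport.tcp.port: 9300
# Transport (TCP) port, 9300 by default; change it when running several instances on one host
cluster.initial_master_nodes: ["node-1"]
# Master-eligible nodes used to bootstrap the cluster; set this on every node
discovery.zen.ping.unicast.hosts: []
# Hosts contacted for node discovery (deprecated in 7.x in favor of discovery.seed_hosts)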

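# Start the service (run it as the regular user created earlier, and check the ownership and permissions of the config and data files)
./bin/elasticsearch -d 				# run in the background as a daemon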
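
Once the process is in the background, the cluster health API (GET _cluster/health) shows whether the node actually came up. A minimal sketch, assuming the node listens on localhost:9200 as configured above. `health_status` is a hypothetical helper that pulls the status field out of the JSON response with sed rather than a real JSON parser, which is enough for this one field.

```sh
# Hypothetical helper: extract the "status" value (green/yellow/red) from
# a _cluster/health JSON response read on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Example with a canned response. Against a live node, pipe in the output of
#   curl -s http://localhost:9200/_cluster/health
echo '{"cluster_name":"es-test","status":"yellow","number_of_nodes":1}' | health_status
```

A single node with replicas configured will typically report yellow, because replica shards cannot be allocated on the same node as their primaries.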

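ES cluster and parameter tuning

System level

# System-level tuning is mainly about sizing memory and avoiding swap.
swapoff -a   # Disable swapping; leaving swap enabled on the server can be devastating for ES performance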

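# JVM memory configuration
# jvm.options mainly holds the memory settings. The official guidance is to give ES no more than 50% of system memory and leave the other half to Lucene, which relies on the operating system's file cache to keep segment data in memory and speed up searches.
# Do not set the heap above 32g: past that threshold the JVM can no longer use compressed object pointers, so every object pointer doubles in size and memory and CPU overhead grow. Unless the server has far more than 64g of RAM, a 32g heap is not recommended.
# (On a 32G server that runs almost nothing besides ES, 16G is the suggested setting.)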
-Xms16g
-Xmx16g
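
The sizing rule above (at most half of RAM, and below the 32g pointer threshold) can be written as a small calculation. This is only a sketch: `recommend_heap_gb` is a hypothetical helper, and the 31g cap is an assumed safety margin under the 32g limit, not a value taken from this article.

```sh
# Hypothetical helper: suggest a heap size in GB for a machine with
# the given amount of RAM (GB). Optional second argument overrides the cap.
recommend_heap_gb() (
  total_gb=$1
  cap_gb=${2:-31}              # assumed margin below the 32g threshold
  heap=$(( total_gb / 2 ))     # leave the other half to the OS file cache
  [ "$heap" -gt "$cap_gb" ] && heap=$cap_gb
  echo "$heap"
)

for ram in 32 64 128; do
  heap=$(recommend_heap_gb "$ram")
  echo "RAM ${ram}g -> -Xms${heap}g -Xmx${heap}g"
done
```

Setting -Xms and -Xmx to the same value, as the article does, keeps the heap from being resized at runtime.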

Shards, replicas, and indices

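# Shards: an index is usually split into several parts, and the part of the data that lives on a given node is a shard. Since version 7, ES defaults to 1 primary shard per index. Because the number of primary shards cannot be changed once the index is created, plan it with future data growth in mind; the shard count has a large effect on performance.
(Recommendation: decide the shard count from the business data volume and the number of nodes.)
# Replicas: the replica count can be changed dynamically. Start with 1 replica for data safety, then adjust later based on load tests, actual traffic, storage space, and so on.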
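
Because the primary shard count is fixed at creation time, a rough estimate up front can help. The sketch below divides the expected data volume by a target shard size, rounds up, and uses at least one shard per node. Both the rule and the 30 GB default target are assumptions for illustration, not figures from this article, and `estimate_shards` is a hypothetical helper.

```sh
# Hypothetical helper: estimate a primary shard count.
#   $1 expected index size in GB, $2 number of data nodes,
#   $3 target shard size in GB (assumed default: 30)
estimate_shards() (
  data_gb=$1; nodes=$2; target_gb=${3:-30}
  shards=$(( (data_gb + target_gb - 1) / target_gb ))   # ceiling division
  [ "$shards" -lt "$nodes" ] && shards=$nodes           # at least one per node
  echo "$shards"
)

echo "600 GB on 3 nodes -> $(estimate_shards 600 3) primary shards"
echo "50 GB on 3 nodes  -> $(estimate_shards 50 3) primary shards"
```

Treat the output as a starting point for load testing rather than a final answer.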

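# Indices: by default ES 7 limits a cluster to 1000 shards per data node (cluster.max_shards_per_node), which in practice caps how many indices it can hold. Planning indices sensibly both protects performance and makes later management easier.
(Recommendation: if the daily data volume is large, create one index per day; if it only becomes large after a month of accumulation, create one index per month.)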
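
The per-day or per-month scheme comes down to putting the date in the index name. A minimal sketch, assuming a naming pattern of prefix-YYYY.MM.DD or prefix-YYYY.MM; the pattern, the `logs` prefix, and the `index_name` helper are all illustrative choices, not something this article prescribes.

```sh
# Hypothetical helper: build a time-based index name.
#   $1 prefix, $2 granularity (day|month), $3 date as YYYY-MM-DD
index_name() (
  prefix=$1; granularity=$2; day=$3
  y=$(printf '%s' "$day" | cut -d- -f1)
  m=$(printf '%s' "$day" | cut -d- -f2)
  d=$(printf '%s' "$day" | cut -d- -f3)
  case "$granularity" in
    day)   echo "${prefix}-${y}.${m}.${d}" ;;
    month) echo "${prefix}-${y}.${m}" ;;
    *)     echo "unknown granularity: $granularity" >&2; exit 1 ;;
  esac
)

index_name logs day 2019-06-15
index_name logs month 2019-06-15
# Creating the index on a live node would then look like:
#   curl -XPUT "http://localhost:9200/$(index_name logs day 2019-06-15)"
```

With names like these, old data can be dropped by deleting whole indices instead of deleting documents.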

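Parameter tuning

bootstrap.memory_lock: true
# Set to true to lock the process memory so it cannot be swapped out. Enable it especially when ES shares the host with other components and services, so that the ES heap stays in RAM.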

index.merge.scheduler.max_thread_count: 1 
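# Caps the number of merge threads per index at 1. This setting has a real effect on write performance: the more merge threads, the heavier the disk I/O (can be ignored on SSDs).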

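index.translog.durability: async # Flush the translog to disk asynchronously, which speeds up writes (the most recent operations can be lost if the node crashes)

discovery.zen.fd.ping_timeout: 120s # Ping timeout

discovery.zen.fd.ping_interval: 120s	 # Interval between node checks

index.refresh_interval: 300s # Index refresh interval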

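indices.requests.cache.size: 2%
# Size of the shard request cache, which stores the shard-level results of search requests so that an identical request can be answered from the cache instead of being executed again, improving search performance. The default is 1% of the heap.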
curl -H "Content-Type: application/json" -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -d '{
  "index.merge.scheduler.max_thread_count" : "1",
  "index.refresh_interval" : "300s",
  "index.translog.durability" : "async"
}'
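
When different indices need different values, templating the request body avoids hand-editing JSON for each call. A minimal sketch: `index_settings_body` is a hypothetical helper that only prints the body, with the article's values as defaults, and `my-index` in the usage comment is a placeholder name.

```sh
# Hypothetical helper: print a JSON body for the index _settings API.
#   $1 refresh interval, $2 translog durability, $3 merge thread count
index_settings_body() (
  refresh=${1:-300s}; durability=${2:-async}; merge_threads=${3:-1}
  printf '{"index.merge.scheduler.max_thread_count":"%s","index.refresh_interval":"%s","index.translog.durability":"%s"}\n' \
    "$merge_threads" "$refresh" "$durability"
)

index_settings_body                # the values used in this article
index_settings_body 30s request 1  # a more conservative variant
# Sending it to one index on a live node would look like:
#   curl -H "Content-Type: application/json" -XPUT \
#     'http://localhost:9200/my-index/_settings' -d "$(index_settings_body 30s request 1)"
```

Targeting a single index instead of _all keeps the change from touching indices that have their own requirements.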

Benchmark comparison

Single-node Elasticsearch, version 7.1.1

Node spec: 8 cores, 32 GB RAM

Benchmark tool: esrally 1.3.0

Deployment reference: https://www.jianshu.com/p/c89975b50447

Official documentation: https://esrally.readthedocs.io/en/latest/install.html

Tuning parameters:

1. swapoff -a
2. JVM heap 20g
3. bootstrap.memory_lock: true
4. index.merge.scheduler.max_thread_count: 1

Benchmark results
```
The official benchmark uses two servers, one running esrally and one running ES, with the following configuration:
CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
RAM: 32 GB
SSD: Crucial MX200
OS: Linux Kernel version 4.8.0-53
JVM: Oracle JDK 1.8.0_131-b11

Results:
Write performance: about 125,000 docs/s on average
Read performance: match-all 122.61 ms, term 23.07 ms, range 543.2 ms, aggs 2905.16 ms, scroll 1111.57 ms
```
- Write performance: about 90,000 docs/s on average
  
  Min Throughput docs/s | minimum throughput
  
  Median Throughput docs/s | median throughput
  
  Max Throughput docs/s | maximum throughput
  
  Min Throughput index-append 76453.2 docs/s
  
  Median Throughput index-append 90592.4 docs/s
  
  Max Throughput index-append 124138 docs/s
- Read performance
  
  Min Throughput ops/s | minimum number of term operations completed per second
  
  Median Throughput ops/s | median number of term operations completed per second
  
  Max Throughput ops/s | maximum number of term operations completed per second
  
  Min Throughput term 6.56 ops/s
  
  Median Throughput term 6.56 ops/s
  
  Max Throughput term 6.56 ops/s
  
  Min Throughput range 1 ops/s
  
  Median Throughput range 1 ops/s
  
  Max Throughput range 1 ops/s
- Error rate (the errors are mainly documents lost during indexing)
  
  error rate index-append 0.07 %
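Benchmark after adding tuning parameters (the settings below, on top of the previous run). From ES 6 onward, index-level settings can no longer be put in elasticsearch.yml and have to be applied through the API.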
```
curl -H "Content-Type: application/json" -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -d '{
  "index.merge.scheduler.max_thread_count" : "1",
  "index.refresh_interval" : "300s",
  "index.translog.durability" : "async"
}'

# Node-level settings, in elasticsearch.yml
discovery.zen.fd.ping_timeout: 120s
discovery.zen.fd.ping_interval: 120s
```
- Write performance: still about 90,000 docs/s on average, no noticeable change
  
  Min Throughput index-append 73367.9 docs/s
  
  Median Throughput index-append 89433.4 docs/s
  
  Max Throughput index-append 118648 docs/s
- Read performance: term throughput increased noticeably
  
  Min Throughput term 27 ops/s
  
  Median Throughput term 27 ops/s
  
  Max Throughput term 27 ops/s
  
  Min Throughput range 1 ops/s
  
  Median Throughput range 1 ops/s
  
  Max Throughput range 1 ops/s
- Error rate
  
  error rate index-append 0.08 %
Increase the number of replicas to 1
```
curl -H "Content-Type: application/json" -XPUT 'http://127.0.0.1:9200/_all/_settings' -d '{
    "index": {
       "number_of_replicas": "1"
    }
}'
```
- Write performance: slightly slower because of the added replica, about 85,000 docs/s
  
  Min Throughput index-append 73240.3 docs/s
  
  Median Throughput index-append 84554.1 docs/s
  
  Max Throughput index-append 121398 docs/s
- Read performance: term throughput dropped to less than half of the previous run
  
  Min Throughput term 10.68 ops/s
  
  Median Throughput term 10.68 ops/s
  
  Max Throughput term 10.68 ops/s
  
  Min Throughput range 1 ops/s
  
  Median Throughput range 1 ops/s
  
  Max Throughput range 1 ops/s
- Error rate: decreased
  
  error rate index-append 0.05 %
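
To put the three runs side by side, the relative change between median throughputs can be computed directly from the figures reported above. A minimal sketch using awk for the floating-point arithmetic; `pct_change` is a hypothetical helper and the inputs are the median values from each run.

```sh
# Hypothetical helper: percentage change from an old value to a new one.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new - old) / old * 100 }'
}

echo "index-append, tuned vs baseline: $(pct_change 90592.4 89433.4)%"
echo "index-append, replica vs tuned:  $(pct_change 89433.4 84554.1)%"
echo "term, tuned vs baseline:         $(pct_change 6.56 27)%"
echo "term, replica vs tuned:          $(pct_change 27 10.68)%"
```

On these numbers the write throughput moves by only a few percent between runs, while the term query throughput swings much more widely, so the read results are the ones worth re-running before drawing conclusions.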
