Index splitting in Elasticsearch

【Background】

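For a long time, ES could not split the shards of an index once it had been created. Put simply, if you create an index with number_of_shards set to 2, you cannot grow it to 4 or 8 shards later as the data volume increases. You can of course create a new index instead, but that only works if your data is loosely related and the business can live with the data being spread across several indices.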

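Starting with ES 6.1, splitting an index's shards is supported. Rather than splitting shards in place, it is more accurate to say that ES provides an API that quickly copies the existing data into a new index, keeping the mappings and data unchanged and only increasing the number of primary shards.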
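
Why can the existing data simply be copied? A minimal sketch, assuming the shard-selection formula given in the Elasticsearch `_routing` documentation: routing_factor = number_of_routing_shards / number_of_shards, and a document goes to shard (hash(_routing) mod number_of_routing_shards) / routing_factor. The murmur3 hash is not reimplemented here; the hash value is passed in as a plain integer, and the function name shard_for is hypothetical.

```shell
#!/usr/bin/env bash
# shard_for HASH NUM_SHARDS NUM_ROUTING_SHARDS
# HASH stands in for the murmur3 hash of the document's _routing value
# (assumption: supplied by the caller, not computed here).
shard_for() {
  local hash=$1 shards=$2 routing=$3
  local factor=$(( routing / shards ))                 # routing_factor
  local m=$(( (hash % routing + routing) % routing ))  # non-negative modulo
  echo $(( m / factor ))
}

# 2 shards / 10 routing shards (source) versus 10 shards / 10 routing shards (target):
# a hash that lands on source shard s always lands on target shard 5*s .. 5*s+4.
for h in 0 1 2 3 4 5 6 7 8 9 -3 12345; do
  echo "hash=$h source_shard=$(shard_for "$h" 2 10) target_shard=$(shard_for "$h" 10 10)"
done
```

Every target shard therefore holds a subset of exactly one source shard's documents. According to the 6.x documentation, the split hard-links (or, if the file system cannot, copies) the source segments into the target index and then deletes the documents that now belong to a different shard, which is why no reindexing is needed.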

【Prerequisites】

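The cluster must be running ES 6.1 or later.
The cluster health must be green, and the source index must be made read-only (index.blocks.write: true) before it is split.
There must be enough disk space to hold another copy of the index data.
The source index must have been created with number_of_routing_shards set in its settings.
The target index must not already exist.
The target shard count must be a divisor of number_of_routing_shards and a multiple of number_of_shards. For example, with number_of_routing_shards set to 10 and number_of_shards set to 2, the only possible split is 2 → 10 (split by a factor of 5).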
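
The last rule can be checked mechanically. A minimal sketch (the function name split_targets is hypothetical): it lists every shard count larger than number_of_shards that is a multiple of number_of_shards and divides number_of_routing_shards.

```shell
#!/usr/bin/env bash
# split_targets NUM_SHARDS NUM_ROUTING_SHARDS
# Prints every valid target shard count for a split, space separated.
split_targets() {
  local shards=$1 routing=$2 t out=()
  if (( routing % shards != 0 )); then
    echo "number_of_routing_shards must be a multiple of number_of_shards" >&2
    return 1
  fi
  # Candidates are multiples of the source shard count, up to the routing shard count.
  for (( t = shards * 2; t <= routing; t += shards )); do
    if (( routing % t == 0 )); then
      out+=("$t")
    fi
  done
  echo "${out[*]}"
}

split_targets 2 10   # the index used in this article
split_targets 5 30   # the 5-shard / 30-routing-shard example from the 6.x documentation
```

For the index in this article it prints 10, the only valid target. For 5 shards with 30 routing shards it prints 10 15 30, matching the documented splits 5 → 10, 5 → 15 and 5 → 30.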

【Verification】

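First, create the index test_split_index with number_of_shards set to 2 and number_of_routing_shards set to 10. Because this is a single-node cluster, number_of_replicas is set to 0 so that the cluster health is green (a replica cannot be allocated on the same node as its primary).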

curl -XPUT localhost:9200/test_split_index -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_routing_shards": 10,
    "index.number_of_replicas": 0
  }
}
'

Insert some test data:

curl -XPOST 'localhost:9200/test_split_index/split_index/_bulk?pretty' -H 'Content-Type: application/json' -d '
{ "index": {}}
{  "user":"zhangsan",  "age":"12"}
{ "index": {}}
{  "user":"lisi",  "age":"25"}
{ "index": {}}
{  "user":"wangwu",  "age":"21"}
{ "index": {}}
{  "user":"zhaoliu",  "age":"16"}
{ "index": {}}
{  "user":"sunjiu",  "age":"40"}
'

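No data may be written to the index while it is being split, so writes have to be blocked first. Here the index is closed, index.blocks.write is set, and the index is reopened; since index.blocks.write is a dynamic setting, the close/open steps are optional, and the official example sets it directly on the open index.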

Close the index:
curl -XPOST localhost:9200/test_split_index/_close

Block writes so that no data is written during the split:
curl -XPUT 'localhost:9200/test_split_index/_settings?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.blocks.write": true
  }
}

Reopen the index:
curl -XPOST localhost:9200/test_split_index/_open

Split the index into the new target index split_index_target:

curl -XPOST 'localhost:9200/test_split_index/_split/split_index_target?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 10
  }
}
'

A new index, split_index_target, now appears in the data directory, and querying it returns the same data as the source index.

References:

https://www.elastic.co/guide/en/elasticsearch/reference/6.x/indices-split-index.html