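es-04: creating mappings and settings

Mappings and settings are awkward to assemble with the Java client; Python or Scala can be used instead

Here they are created directly in Kibana with the DSL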

1, mapping

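When creating an index, you can define the data up front and tell ES how it should be indexed and searched

In practice ES guesses the types from the source data (dynamic mapping), but some special fields have to be specified explicitly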

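Types

String: text, keyword (the old string type is deprecated)

Numeric: long, integer, short, byte, double, float

Date: date

Boolean: boolean

Binary: binary

Complex: object (an inner JSON object, like a dict), nested (an array of objects, each queryable independently)

Geo: geo_point, geo_shape

Specialised: ip, completion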

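The data types changed in v5.x: the string type was deprecated and replaced by text (analyzed) and keyword (not analyzed). Per the official documentation, a field's type can be: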

- a simple type like text, keyword, date, long, double, boolean or ip.
- a type which supports the hierarchical nature of JSON such as object or nested.
- or a specialised type like geo_point, geo_shape, or completion.

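Field parameters:

store:            whether the field value is stored separately from _source; applies to all types
index:            whether the field is indexed and therefore searchable (before 5.x, on string fields, it also chose analyzed / not_analyzed)
null_value:       value substituted when the field is null, e.g. "NA", so that nulls can be searched; applies to most types (not text)
analyzer:         the analyzer; defaults to standard, usually set to ik for Chinese text; applies to text fields
include_in_all:   by default ES builds an _all field for every document so that every field is searchable;
                  set this to false to keep a field out of it (_all is deprecated from 6.0 on)
format:           the format, for date fields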

1, Create an index

PUT test
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "type1" : {
            "properties" : {
                "field1" : { "type" : "text" }
            }
        }
    }
}

The mappings key and the type name (type1 here) must be nested exactly as shown
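Since the notes mention that Python is a convenient way to assemble these bodies, here is a minimal sketch of that idea. The helper name build_create_index_body is made up for illustration; it only builds the JSON body and does not talk to a cluster.

```python
import json


def build_create_index_body(type_name, fields, shards=1):
    """Assemble the body of `PUT <index>` from a {field_name: es_type} dict.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    properties = {name: {"type": es_type} for name, es_type in fields.items()}
    return {
        "settings": {"number_of_shards": shards},
        # The type name sits directly under "mappings", as in the request above.
        "mappings": {type_name: {"properties": properties}},
    }


body = build_create_index_body("type1", {"field1": "text"})
print(json.dumps(body, indent=4))
```

The printed JSON can be pasted after `PUT test` in the Kibana console.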

2, Delete

DELETE /twitter

3, View

GET /twitter

4, exists

HEAD twitter

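5, Open / close an index

A closed index saves resources: only its metadata is maintained and it is blocked for reads and writes. Closing is also used when the mapping or analysis settings need to be updated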

POST /my_index/_close

POST /my_index/_open

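6, Shrink an index

Note: the target shard count must be a factor of the source shard count, so an index with a prime number of shards can only be shrunk to 1

1) Create a new target index with fewer primary shards

2) Hard-link the segments from the source index into the target index

3) Recover the target index as if it were a closed index being reopened

Before shrinking, make the source index read-only and relocate a copy of every shard onto one node: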

PUT /my_source_index/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "shrink_node_name", 
    "index.blocks.write": true 
  }
}

POST my_source_index/_shrink/my_target_index
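The factor rule for shrinking can be checked with a few lines of Python. This is only a sketch of the arithmetic; valid_shrink_targets is a made-up name.

```python
def valid_shrink_targets(source_shards):
    """Return the shard counts a source index could be shrunk to.

    The target must be a factor of the source count and smaller than it.
    """
    return [n for n in range(1, source_shards) if source_shards % n == 0]


print(valid_shrink_targets(8))   # [1, 2, 4]
print(valid_shrink_targets(15))  # [1, 3, 5]
print(valid_shrink_targets(7))   # [1]  (a prime count can only go to 1)
```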

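7, Split: increase the number of shards

The process is similar to shrinking. On an index that already holds data the number of replicas can be changed in place, but the number of primary shards cannot: the shard a document lives on is computed from the shard count, so changing it would invalidate the placement of everything stored before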
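The reason can be seen from the simplified routing formula, shard = hash(routing) % number_of_primary_shards. The sketch below uses zlib.crc32 as a stand-in hash (an assumption for illustration; ES uses a different hash function) and counts how many documents would resolve to a different shard if the count changed from 5 to 6.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Simplified routing: hash the routing value, take it modulo the shard count.

    zlib.crc32 is only a stand-in for the hash ES really uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = ["doc-%d" % i for i in range(1000)]
moved = [d for d in doc_ids if pick_shard(d, 5) != pick_shard(d, 6)]
print(len(moved), "of", len(doc_ids), "documents would resolve to a different shard")
```

A lookup by id would then go to a shard that does not hold the document, which is why the primary shard count is fixed once the index is created.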

a) Prepare the index that will be split

PUT my_source_index
{
    "settings": {
        "index.number_of_shards" : 1,
        "index.number_of_routing_shards" : 2 
    }
}

b) Split

POST my_source_index/_split/my_target_index
{
  "settings": {
    "index.number_of_shards": 2
  }
}

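c) Split with aliases on the target (segments are hard-linked from the source). The target shard count must be a multiple of the source's and a factor of index.number_of_routing_shards, so the 5 below assumes a source index created with a compatible value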

POST my_source_index/_split/my_target_index
{
  "settings": {
    "index.number_of_shards": 5 
  },
  "aliases": {
    "my_search_indices": {}
  }
}

8, Rollover index:

When the existing index becomes too large or too old, the rollover API rolls an alias over to a new index
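As a rough sketch, a rollover request names the alias and a set of conditions. The alias logs_write and the threshold values are placeholders chosen for illustration; max_age and max_docs are conditions the rollover API accepts. The helper only builds the request, it does not send it.

```python
import json


def build_rollover_request(alias, max_age=None, max_docs=None):
    """Return (request_line, body) for a rollover call; hypothetical helper."""
    conditions = {}
    if max_age is not None:
        conditions["max_age"] = max_age    # e.g. "7d": index is older than 7 days
    if max_docs is not None:
        conditions["max_docs"] = max_docs  # index holds at least this many docs
    return "POST /%s/_rollover" % alias, {"conditions": conditions}


request_line, body = build_rollover_request("logs_write", max_age="7d", max_docs=1000)
print(request_line)
print(json.dumps(body, indent=2))
```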

9, put mapping

Put mapping can add new fields (and some field parameters) to an existing index

PUT my_index 
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "properties": {
            "first": {
              "type": "text"
            }
          }
        },
        "user_id": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT my_index/_mapping/_doc
{
  "properties": {
    "name": {
      "properties": {
        "last": { 
          "type": "text"
        }
      }
    },
    "user_id": {
      "type": "keyword",
      "ignore_above": 100 
    }
  }
}

10, get mapping

GET /twitter/_mapping/_doc

11, Get the mapping of a single field

Prepare the mapping

PUT publications
{
    "mappings": {
        "_doc": {
            "properties": {
                "id": { "type": "text" },
                "title":  { "type": "text"},
                "abstract": { "type": "text"},
                "author": {
                    "properties": {
                        "id": { "type": "text" },
                        "name": { "type": "text" }
                    }
                }
            }
        }
    }
}

GET publications/_mapping/_doc/field/title

Or use a wildcard

GET publications/_mapping/_doc/field/a*

12 type exists

HEAD twitter/_mapping/tweet

13 Set an alias for an index

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "test1", "alias" : "alias1" } }
    ]
}

Remove an alias

POST /_aliases
{
    "actions" : [
        { "remove" : { "index" : "test1", "alias" : "alias1" } }
    ]
}

Give several indices the same alias (it then behaves like a combined view over them)

POST /_aliases
{
    "actions" : [
        { "add" : { "indices" : ["test1", "test2"], "alias" : "alias1" } }
    ]
}
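Because all actions in one _aliases request are applied atomically, a remove and an add can be combined to move an alias from one index to another. A small sketch that builds such a body (build_alias_swap is a made-up helper; the index names reuse test1 and test2 from above):

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Body for `POST /_aliases` that moves an alias in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = build_alias_swap("alias1", "test1", "test2")
print(json.dumps(body, indent=4))
```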

14 Update index settings (analysis settings can only be changed while the index is closed)

POST /twitter/_close

PUT /twitter/_settings
{
  "analysis" : {
    "analyzer":{
      "content":{
        "type":"custom",
        "tokenizer":"whitespace"
      }
    }
  }
}

POST /twitter/_open

15, Get index settings

GET /twitter,kimchy/_settings

16 analyze: preview how text will be tokenized

GET _analyze
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}

explain

GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"] 
}

A complete mapping example

PUT macsearch_fileds
{
  "settings": {
    "number_of_shards": "3",
    "number_of_replicas": "1"
  },
  "mappings": {
    "mac": {
      "dynamic": "true",
      "properties": {
        "app_name": {
          "type":     "keyword",
          "index_options": "freqs"
        },
        "content": {
          "type": "text",
          "index_options": "offsets"
        },
        "current_time": {
          "type": "long"
        },
        "mac": {
          "store": false,
          "type": "keyword", 
          "index_options": "freqs"
        },
        "server_time": {
          "type": "long"
        },
        "time": {
          "type": "text", 
          "index_options": "freqs"
        },
        "topic": {
          "store": true,
          "type": "text",
          "index_options": "freqs",
          "analyzer": "ik_max_word"
        }
      }
    }
  }
}

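Values of dynamic:

true: the default; new fields are added to the mapping dynamically.

false: new fields are ignored (kept in _source but not indexed).

strict: an unknown field raises an exception.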
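The three settings can be mimicked with a toy model. This is only an illustration of the behaviour described above, not how ES implements it; apply_dynamic is a made-up function.

```python
def apply_dynamic(mapped_fields, doc, dynamic="true"):
    """Toy model: return (indexed_fields, mapping_after) for one document."""
    mapped = set(mapped_fields)
    unknown = [f for f in doc if f not in mapped]
    if unknown and dynamic == "strict":
        # strict: reject the whole document
        raise ValueError("unknown fields rejected: %s" % unknown)
    if dynamic == "true":
        # true: new fields are added to the mapping
        mapped.update(unknown)
    # false: unknown fields are simply not indexed
    indexed = sorted(f for f in doc if f in mapped)
    return indexed, sorted(mapped)


doc = {"app_name": "demo", "extra": 1}
print(apply_dynamic(["app_name"], doc, "true"))   # (['app_name', 'extra'], ['app_name', 'extra'])
print(apply_dynamic(["app_name"], doc, "false"))  # (['app_name'], ['app_name'])
```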

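index_options:

The index_options parameter controls what information is added to the inverted index, for search and highlighting purposes. It accepts the following values:

docs: only the doc number is indexed. Can answer the question of whether a term exists in this field.
freqs: doc number and term frequencies are indexed. Terms that occur more often score higher.
positions: doc number, term frequencies and term positions are indexed. Positions can be used for proximity or phrase queries.
offsets: doc number, term frequencies, positions, and start and end character offsets (which map the term back to the original string) are indexed. Offsets are used by the postings highlighter.

Analyzed string fields use positions by default; all other fields use docs.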
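To make the four levels concrete, here is a toy inverted index in Python that records a little more per term at each level. The tokenizer is a simple regex split and lowercase, an assumption made for illustration only, and build_postings is a made-up name.

```python
import re
from collections import defaultdict

LEVELS = ["docs", "freqs", "positions", "offsets"]


def build_postings(docs, index_options="positions"):
    """Toy inverted index: {term: {doc_id: info}}, info grows with the level."""
    level = LEVELS.index(index_options)
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            entry = postings[match.group()].setdefault(doc_id, {})
            if level >= 1:  # freqs and above: count occurrences
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:  # positions and above: token order in the field
                entry.setdefault("positions", []).append(position)
            if level >= 3:  # offsets: character span in the original string
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return dict(postings)


docs = {1: "to be or not to be"}
for option in LEVELS:
    print(option, build_postings(docs, option)["be"])
```

With docs, the entry for a term is just the set of doc ids; a phrase query needs the positions level, and highlighting from the index needs offsets.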

If some fields should be indexed but not kept in the stored document, exclude them with _source excludes

PUT security_2
{
  "settings": {
    "number_of_shards": "5",
    "number_of_replicas": "1"
  },
  "mappings": {
    "push": {
      "dynamic": "true",
      "_source": {
        "excludes": ["AesPhoneNum", "AesEmail"]
      },
      "properties": {
        "AesPhoneNum": {
          "type": "keyword",
          "store": "false", 
          "index_options": "freqs"
        },
        "AesEmail": {
          "type": "keyword",
          "store": "false", 
          "index_options": "freqs"
        }
      }
    }
  }
}
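The effect of excludes on what gets stored can be pictured as a filter over the top-level fields. The sketch below is a simplification (top-level names and shell-style wildcards only; filter_source is a made-up name, and the sample values are invented): the excluded fields stay searchable through the index, they are just absent from the returned _source.

```python
from fnmatch import fnmatchcase


def filter_source(doc, excludes):
    """Drop top-level fields whose name matches any exclude pattern."""
    return {
        name: value
        for name, value in doc.items()
        if not any(fnmatchcase(name, pattern) for pattern in excludes)
    }


doc = {"AesPhoneNum": "ab12", "AesEmail": "cd34", "user": "kimchy"}
stored = filter_source(doc, ["AesPhoneNum", "AesEmail"])
print(stored)  # {'user': 'kimchy'}
```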

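To change the mapping of an index that already holds data:

close index
post mapping
open index

This keeps the existing data, but writes that arrive during the 1-2 s the index is closed are lost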

2, setting

1) A running cluster can be scaled out

Increase the replica count from 1 to 2

PUT /blogs/_settings
{
   "number_of_replicas" : 2
}

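The number of primary shards, however, cannot be changed in place: the shard a document lives on is derived from the shard count, so changing it would make previously stored data unreachable

That is why it is not allowed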

3, ik analyzer

Simply set the ik analyzer on the fields that need it (the ik plugin must be installed on the nodes)

PUT macsearch_fileds
{
  "settings": {
    "number_of_shards": "3",
    "number_of_replicas": "1"
  },
  "mappings": {
    "mac": {
      "dynamic": "true",
      "properties": {
        "app_name": {
          "type":     "keyword",
          "index_options": "freqs"
        },
        "topic": {
          "type": "text",
          "index_options": "freqs",
          "analyzer": "ik_max_word"
        }
      }
    }
  }
}

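Most analyzers (standard included) lowercase the tokens of an analyzed field at index time,

while a term query looks up its value exactly as given, so a term that contains upper-case letters may match nothing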
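The mismatch can be reproduced with a toy model. The analyze function below is only a rough stand-in for the standard analyzer (split on non-word characters, then lowercase), and term_query / match_query are simplified imitations of the real queries.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# Index time: the field value is analyzed, so only lowercase tokens are stored.
inverted_index = set(analyze("Quick Brown Fox"))


def term_query(value):
    # term: the value is looked up as-is, with no analysis
    return value in inverted_index


def match_query(value):
    # match: the query text goes through the same analyzer first
    return any(token in inverted_index for token in analyze(value))


print(term_query("Quick"))   # False
print(term_query("quick"))   # True
print(match_query("Quick"))  # True
```

So for analyzed fields, either lowercase the value before a term query or use a match query; for exact matching on the original casing, a keyword field is the usual choice.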