es

Elasticsearch built-in analyzers

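Standard Analyzer: the default analyzer; splits text on word boundaries and lowercases tokens
Simple Analyzer: splits on non-letter characters (digits and symbols are discarded) and lowercases tokens
Stop Analyzer: lowercases tokens and removes stop words (the, is, a)
Whitespace Analyzer: splits on whitespace; does not lowercase
Keyword Analyzer: does no tokenization; emits the entire input as a single token
Pattern Analyzer: splits on a regular expression; the default is \W+ (runs of non-word characters)
Language Analyzers: analyzers for more than 30 languages
Custom Analyzer: a user-defined analyzer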

Standard Analyzer (the default)

Splits text on word boundaries and lowercases tokens. Punctuation such as "!" is dropped.

GET /_analyze
{
  "analyzer": "standard",
  "text": "Trying Out Kibana! "
}

Result
{
  "tokens" : [
    {
      "token" : "trying",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "out",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "kibana",
      "start_offset" : 11,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}

Simple Analyzer

Splits on any non-letter character (digits and symbols are discarded) and lowercases tokens.

GET /_analyze
{
  "analyzer": "simple",
  "text": "Try78ing 12 Out 1212 Kib45ana! "
}

Result
{
  "tokens" : [
    {
      "token" : "try",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "ing",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "out",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "kib",
      "start_offset" : 21,
      "end_offset" : 24,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "ana",
      "start_offset" : 26,
      "end_offset" : 29,
      "type" : "word",
      "position" : 4
    }
  ]
}

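Stop Analyzer

Splits on non-letter characters and lowercases tokens, like the simple analyzer, and additionally removes stop words (the, is, a). The sample text below contains no stop words, so the output is identical to the simple analyzer's.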

GET /_analyze
{
  "analyzer": "stop",
  "text": "Try78ing 12 Out 1212 Kib45ana! "
}


Result

{
  "tokens" : [
    {
      "token" : "try",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "ing",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "out",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "kib",
      "start_offset" : 21,
      "end_offset" : 24,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "ana",
      "start_offset" : 26,
      "end_offset" : 29,
      "type" : "word",
      "position" : 4
    }
  ]
}
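
To see the stop word filter take effect, the request needs text that actually contains stop words. A minimal sketch, using a made-up sentence that is not part of the original notes:

GET /_analyze
{
  "analyzer": "stop",
  "text": "The fox is a legend"
}

With the default English stop word list, "the", "is", and "a" should be removed, leaving only fox and legend.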

Whitespace Analyzer

Splits on whitespace only; tokens are not lowercased, and digits and punctuation are kept.

GET /_analyze
{
  "analyzer": "whitespace",
  "text": "Try78ing 12 Out 1212 Kib45ana! "
}

Result
{
  "tokens" : [
    {
      "token" : "Try78ing",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "12",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "Out",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "1212",
      "start_offset" : 16,
      "end_offset" : 20,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "Kib45ana!",
      "start_offset" : 21,
      "end_offset" : 30,
      "type" : "word",
      "position" : 4
    }
  ]
}

Keyword Analyzer

Does no tokenization; the entire input is emitted as a single token.

GET /_analyze
{
  "analyzer": "keyword",
  "text": "Try78ing 12 Out 1212 Kib45ana! "
}

Result
{
  "tokens" : [
    {
      "token" : "Try78ing 12 Out 1212 Kib45ana! ",
      "start_offset" : 0,
      "end_offset" : 31,
      "type" : "word",
      "position" : 0
    }
  ]
}

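Pattern Analyzer

Splits on a regular expression. The default pattern is \W+ (one or more non-word characters), and tokens are lowercased by default.

GET /_analyze
{
  "analyzer": "pattern",
  "text": "Try78ing 12 Out 1212 Kib45ana! "
}

Result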
{
  "tokens" : [
    {
      "token" : "try78ing",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "12",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "out",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "1212",
      "start_offset" : 16,
      "end_offset" : 20,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "kib45ana",
      "start_offset" : 21,
      "end_offset" : 29,
      "type" : "word",
      "position" : 4
    }
  ]
}
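
To split on something other than the default pattern, the pattern analyzer can be configured in the index settings. The following is a minimal sketch; the index name my-pattern-index and the analyzer name comma_analyzer are hypothetical, chosen only for illustration:

PUT /my-pattern-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "comma_analyzer": {
          "type": "pattern",
          "pattern": ","
        }
      }
    }
  }
}

GET /my-pattern-index/_analyze
{
  "analyzer": "comma_analyzer",
  "text": "Red,Green,Blue"
}

Because the pattern analyzer lowercases by default, this should yield red, green, and blue.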

Language Analyzers

Elasticsearch provides analyzers for more than 30 languages.
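
A language analyzer is selected by name, in the same way as the analyzers above. This sketch assumes the built-in english analyzer and reuses the sample text from the standard analyzer example:

GET /_analyze
{
  "analyzer": "english",
  "text": "Trying Out Kibana! "
}

Unlike the standard analyzer, the english analyzer also applies English stop word removal and stemming, so some tokens may come back in a reduced form.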

Custom Analyzer

A user-defined analyzer.
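
A custom analyzer is typically assembled from optional character filters, one tokenizer, and optional token filters. A minimal sketch, assuming a hypothetical index my-custom-index and a hypothetical analyzer name my_custom_analyzer:

PUT /my-custom-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}

GET /my-custom-index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "<b>Trying</b> Out the Kibana!"
}

In this sketch, html_strip removes the markup, the standard tokenizer splits the text into words, and the lowercase and stop filters normalize case and drop "the", which should leave trying, out, and kibana.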