es

Elasticsearch built-in analyzers

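  • Standard Analyzer: the default analyzer; splits text on word boundaries and lowercases tokens
  • Simple Analyzer: splits on any non-letter character (digits and symbols are dropped) and lowercases tokens
  • Stop Analyzer: lowercases tokens and removes stop words (the, is, a)
  • Whitespace Analyzer: splits on whitespace only; does not lowercase
  • Keyword Analyzer: does no tokenization; emits the entire input as a single token
  • Pattern Analyzer: splits on a regular expression; the default is \W+ (split on non-word characters)
  • Language Analyzers: around 30 language-specific analyzers are provided
  • Custom Analyzer: a user-defined analyzer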

Standard Analyzer (default)

Splits text on word boundaries and lowercases each token.

GET /_analyze
{
  "analyzer": "standard",
  "text": "Trying Out Kibana! "
}

Result
{
  "tokens" : [
    {
      "token" : "trying",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "out",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "kibana",
      "start_offset" : 11,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}

Simple Analyzer

Splits on any non-letter character (digits and symbols are dropped) and lowercases each token.

GET /_analyze
{
  "analyzer": "simple",
  "text": "Try78ing 12 Out 1212 Kib45ana! "
}

Result
{
  "tokens" : [
    {
      "token" : "try",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "ing",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "out",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "kib",
      "start_offset" : 21,
      "end_offset" : 24,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "ana",
      "start_offset" : 26,
      "end_offset" : 29,
      "type" : "word",
      "position" : 4
    }
  ]
}
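
The splitting behavior seen in this result can be approximated in a few lines of Python. This is only a rough sketch, not how Elasticsearch implements it: `simple_analyze` is a hypothetical helper, and the regex `[^\W\d_]+` (runs of letters) is an assumed stand-in for the analyzer's letter-based tokenizer.

```python
import re

def simple_analyze(text):
    """Approximate the simple analyzer: keep runs of letters, lowercase them."""
    tokens = []
    # [^\W\d_]+ matches one or more letters (word characters minus digits and "_").
    for position, match in enumerate(re.finditer(r"[^\W\d_]+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

for tok in simple_analyze("Try78ing 12 Out 1212 Kib45ana! "):
    print(tok)
```

For the sample text, the sketch yields the same tokens and offsets as the response above: the digits inside "Try78ing" and "Kib45ana" act as separators, and the standalone numbers disappear.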

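Stop Analyzer

Splits on non-letter characters and lowercases each token, like the Simple Analyzer, then removes stop words (the, is, a). The sample text below contains no stop words, so the output is identical to the Simple Analyzer's.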

GET /_analyze
{
  "analyzer": "stop",
  "text": "Try78ing 12 Out 1212 Kib45ana! "
}


Result

{
  "tokens" : [
    {
      "token" : "try",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "ing",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "out",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "kib",
      "start_offset" : 21,
      "end_offset" : 24,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "ana",
      "start_offset" : 26,
      "end_offset" : 29,
      "type" : "word",
      "position" : 4
    }
  ]
}
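
Because the sample text has no stop words, the filtering step is easier to see on a different sentence. The following Python sketch approximates the stop analyzer under stated assumptions: `stop_analyze` is a hypothetical helper, `STOP_WORDS` is a small illustrative subset rather than the full default list, and token positions are not tracked.

```python
import re

# A small subset of English stop words, for illustration only.
STOP_WORDS = {"a", "an", "and", "is", "the", "to", "of"}

def stop_analyze(text, stop_words=STOP_WORDS):
    """Approximate the stop analyzer: letter runs, lowercased, stop words dropped."""
    tokens = [m.group().lower() for m in re.finditer(r"[^\W\d_]+", text)]
    return [t for t in tokens if t not in stop_words]

print(stop_analyze("The Kibana console is a handy tool"))
print(stop_analyze("Try78ing 12 Out 1212 Kib45ana! "))
```

The first call drops "the", "is", and "a"; the second returns every token unchanged because none of them is a stop word.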

Whitespace Analyzer

Splits on whitespace only. Tokens keep their original case, digits, and punctuation.

GET /_analyze
{
  "analyzer": "whitespace",
  "text": "Try78ing 12 Out 1212 Kib45ana! "
}

Result
{
  "tokens" : [
    {
      "token" : "Try78ing",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "12",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "Out",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "1212",
      "start_offset" : 16,
      "end_offset" : 20,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "Kib45ana!",
      "start_offset" : 21,
      "end_offset" : 30,
      "type" : "word",
      "position" : 4
    }
  ]
}
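
As a rough Python analogue (a sketch only, with `whitespace_analyze` as a hypothetical helper), whitespace analysis amounts to collecting runs of non-whitespace characters and leaving them untouched:

```python
import re

def whitespace_analyze(text):
    """Approximate the whitespace analyzer: split on whitespace, keep case and symbols."""
    # \S+ matches each maximal run of non-whitespace characters.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

for token, start, end in whitespace_analyze("Try78ing 12 Out 1212 Kib45ana! "):
    print(token, start, end)
```

Note that "Kib45ana!" keeps its trailing "!" and its capital "K", matching the response above.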


Keyword Analyzer

Does no tokenization; the entire input is emitted as a single token.

GET /_analyze
{
  "analyzer": "keyword",
  "text": "Try78ing 12 Out 1212 Kib45ana! "
}
Result
{
  "tokens" : [
    {
      "token" : "Try78ing 12 Out 1212 Kib45ana! ",
      "start_offset" : 0,
      "end_offset" : 31,
      "type" : "word",
      "position" : 0
    }
  ]
}

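Pattern Analyzer

Splits on a regular expression and lowercases each token. The default pattern is \W+, which splits on runs of non-word characters.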

GET /_analyze
{
  "analyzer": "pattern",
  "text": "Try78ing 12 Out 1212 Kib45ana! "
}
Result
{
  "tokens" : [
    {
      "token" : "try78ing",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "12",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "out",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "1212",
      "start_offset" : 16,
      "end_offset" : 20,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "kib45ana",
      "start_offset" : 21,
      "end_offset" : 29,
      "type" : "word",
      "position" : 4
    }
  ]
}
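
The effect of the default \W+ pattern can be sketched with Python's `re.split`. This is an approximation under assumptions: `pattern_analyze` is a hypothetical helper, and Python's `\W` is Unicode-aware while the analyzer uses Java regular expressions, so the two may differ on non-ASCII input.

```python
import re

def pattern_analyze(text, pattern=r"\W+", lowercase=True):
    """Approximate the pattern analyzer: split on the regex, drop empties, lowercase."""
    if lowercase:
        text = text.lower()
    # re.split leaves an empty string where the text starts or ends with a separator.
    return [tok for tok in re.split(pattern, text) if tok]

print(pattern_analyze("Try78ing 12 Out 1212 Kib45ana! "))
print(pattern_analyze("Red,Green,Blue", pattern=","))
```

Unlike the simple analyzer, digits count as word characters here, so "try78ing" and "1212" survive as tokens. The second call shows a different separator passed as the pattern.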


Language Analyzers

Elasticsearch provides around 30 language-specific analyzers.

Custom Analyzer

A user-defined analyzer.
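
A custom analyzer is declared in the index settings as a combination of optional character filters, one tokenizer, and optional token filters. The Python sketch below only builds such a settings body as a dictionary; it does not talk to a cluster. The helper `custom_analyzer_settings` and the analyzer name `my_custom_analyzer` are hypothetical, and the chosen components (`html_strip`, `standard`, `lowercase`, `stop`) are just one possible combination.

```python
import json

def custom_analyzer_settings(name, tokenizer, char_filters=(), token_filters=()):
    """Build an index-settings body that defines one analyzer of type "custom"."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    name: {
                        "type": "custom",
                        "char_filter": list(char_filters),  # applied to raw text first
                        "tokenizer": tokenizer,             # splits text into tokens
                        "filter": list(token_filters),      # applied to tokens, in order
                    }
                }
            }
        }
    }

body = custom_analyzer_settings(
    "my_custom_analyzer",  # hypothetical analyzer name
    tokenizer="standard",
    char_filters=["html_strip"],
    token_filters=["lowercase", "stop"],
)
print(json.dumps(body, indent=2))
```

The printed JSON could be used as the request body when creating an index, after which the analyzer name can be passed to the _analyze API of that index in the same way as the built-in names above.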

Original article: https://www.cnblogs.com/smallyi/p/13430614.html