ES 5.0 pinyin analyzer on Mac

The installation works the same way as for the IK Chinese analyzer.

First, download the source:

https://github.com/medcl/elasticsearch-analysis-pinyin

Then run:

mvn package;

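Once the build succeeds, a target folder is generated. In the elasticsearch-analysis-pinyin-master/target/releases directory you will find a zip named elasticsearch-analysis-pinyin-<version>.zip; this is the installation file we need. Unzip it to get the following contents: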

If mvn gives you trouble, you can import the project into Eclipse and run Maven clean and then Maven install there.

Either way, find this zip, copy it out and unzip it.

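Put the unzipped folder into the plugins directory of the ES installation, i.e. under <ES_HOME>/plugins/ (for example <ES_HOME>/plugins/pinyin):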
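A minimal shell sketch of this copy step. It assumes that ES is unpacked at `$ES_HOME` (the default path below is hypothetical, so adjust it) and that the commands run from the plugin's source folder after `mvn package`. The folder name `pinyin` under `plugins` is an arbitrary choice. Keep in mind that ES 5.x only loads a plugin built for its exact version.

```shell
#!/usr/bin/env bash
# Hypothetical default: point ES_HOME at your own Elasticsearch 5.x install.
ES_HOME="${ES_HOME:-$HOME/elasticsearch-5.0.0}"
# Each plugin lives in its own folder under plugins/; the name "pinyin" is arbitrary.
PLUGIN_DIR="$ES_HOME/plugins/pinyin"

# The zip that `mvn package` leaves in target/releases (version part left open).
ZIP=$(ls target/releases/elasticsearch-analysis-pinyin-*.zip 2>/dev/null | head -n 1)

if [ -n "$ZIP" ] && [ -d "$ES_HOME/plugins" ]; then
  mkdir -p "$PLUGIN_DIR"
  unzip -o "$ZIP" -d "$PLUGIN_DIR"
  # plugin-descriptor.properties should sit directly in $PLUGIN_DIR;
  # if the zip unpacked into a subfolder, move that subfolder's contents up.
  ls "$PLUGIN_DIR"
else
  echo "No plugin zip or no $ES_HOME/plugins directory found; run mvn package and set ES_HOME." >&2
fi
```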



Restart ES.

Test:
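One way to check the plugin after the restart, assuming ES listens on localhost:9200 as in the rest of this post. First, list the installed plugins with the `_cat/plugins` API. Then run the `_analyze` API on a short string. The analyzer name `pinyin` assumes that the plugin registers a built-in analyzer under that name, and `ES_URL` is a variable introduced here. The exact tokens depend on the plugin version and its defaults, so no output is shown.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"

# The pinyin plugin should appear in this list once ES has restarted.
curl -s "$ES_URL/_cat/plugins?v" || echo "Cannot reach Elasticsearch at $ES_URL" >&2

# Analyze a short Chinese string with the plugin's analyzer (name assumed to be "pinyin").
ANALYZE_BODY='{"analyzer": "pinyin", "text": "刘德华"}'
curl -s -XGET "$ES_URL/_analyze?pretty" \
  -H 'Content-Type: application/json' \
  -d "$ANALYZE_BODY" || echo "Cannot reach Elasticsearch at $ES_URL" >&2
```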

 

Combined Chinese and pinyin test:

IK + pinyin analyzer configuration

5.1 Create the index and analyzer settings

Create an index and configure its analysis settings. The ik_smart tokenizer comes from the IK plugin, so IK must be installed alongside pinyin:

curl -XPUT "http://localhost:9200/medcl/" -d'
{
  "index": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["my_pinyin", "word_delimiter"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type": "pinyin",
          "first_letter": "prefix",
          "padding_char": " "
        }
      }
    }
  }
}'
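The searches below query a `name.pinyin` field, but the mapping that defines it is not shown. Here is a minimal sketch of that step. It assumes that `name` is a keyword field with a `pinyin` sub-field analyzed by the `ik_pinyin_analyzer` from the index settings above. It also assumes the type `folks`, which matches the document requests. These field types are assumptions, not the author's exact mapping. Run it before indexing any documents, because documents already indexed are not re-analyzed when a mapping changes.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"

# Assumed mapping: "name" stays a keyword, and the multi-field "name.pinyin"
# is analyzed with the ik_pinyin_analyzer defined in the index settings.
MAPPING=$(cat <<'EOF'
{
  "folks": {
    "properties": {
      "name": {
        "type": "keyword",
        "fields": {
          "pinyin": {
            "type": "text",
            "analyzer": "ik_pinyin_analyzer"
          }
        }
      }
    }
  }
}
EOF
)

curl -s -XPOST "$ES_URL/medcl/folks/_mapping?pretty" \
  -H 'Content-Type: application/json' \
  -d "$MAPPING" || echo "Cannot reach Elasticsearch at $ES_URL" >&2
```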
5.2 Index a document

curl -XPOST http://localhost:9200/medcl/folks/tina -d'{"name":"中华人民共和国国歌"}'
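The results further down also return a second document, `刘德华` with id `andy`, which is never indexed in the steps shown. The sketch below indexes it the same way. The `refresh=true` parameter is an addition here: it makes this document visible to search immediately rather than after the next periodic refresh.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"

# Second document used by the searches below (id "andy", taken from the sample results).
DOC='{"name":"刘德华"}'

curl -s -XPOST "$ES_URL/medcl/folks/andy?refresh=true&pretty" \
  -H 'Content-Type: application/json' \
  -d "$DOC" || echo "Cannot reach Elasticsearch at $ES_URL" >&2
```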

5.3 Test (1): pinyin analysis

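Each of the following four commands matches "刘德华". The first three match single syllables (liu, de, hua), and the last one matches the initials (ldh):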

1. curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:liu"

2. curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:de"

3. curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:hua"

4. curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:ldh"
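To see why all four queries match, you can inspect the tokens the custom analyzer produces. ES 5.x supports this through the `_analyze` API with a JSON body. This is only a sketch. The actual token list depends on the IK and pinyin plugin versions, so no output is reproduced here.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"

# Run the index's ik_pinyin_analyzer over the same name the queries above match.
BODY='{"analyzer": "ik_pinyin_analyzer", "text": "刘德华"}'

curl -s -XGET "$ES_URL/medcl/_analyze?pretty" \
  -H 'Content-Type: application/json' \
  -d "$BODY" || echo "Cannot reach Elasticsearch at $ES_URL" >&2
```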

5.4 Test (2): IK analysis

curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
{
  "query": {
    "match": {
      "name.pinyin": "国歌"
    }
  },
  "highlight": {
    "fields": {
      "name.pinyin": {}
    }
  }
}'
Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 16.698704,
    "hits" : [
      {
        "_index" : "medcl",
        "_type" : "folks",
        "_id" : "tina",
        "_score" : 16.698704,
        "_source" : {
          "name" : "中华人民共和国国歌"
        },
        "highlight" : {
          "name.pinyin" : [
            "<em>中华人民共和国</em><em>国歌</em>"
          ]
        }
      }
    ]
  }
}

5.5 Test (3): pinyin + IK analysis

curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
{
  "query": {
    "match": {
      "name.pinyin": "zhonghua"
    }
  },
  "highlight": {
    "fields": {
      "name.pinyin": {}
    }
  }
}'
Result:


{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 5.9814634,
    "hits" : [
      {
        "_index" : "medcl",
        "_type" : "folks",
        "_id" : "tina",
        "_score" : 5.9814634,
        "_source" : {
          "name" : "中华人民共和国国歌"
        },
        "highlight" : {
          "name.pinyin" : [
            "<em>中华人民共和国</em>国歌"
          ]
        }
      },
      {
        "_index" : "medcl",
        "_type" : "folks",
        "_id" : "andy",
        "_score" : 2.2534127,
        "_source" : {
          "name" : "刘德华"
        },
        "highlight" : {
          "name.pinyin" : [
            "<em>刘德华</em>"
          ]
        }
      }
    ]
  }
}

References:

https://github.com/medcl/elasticsearch-analysis-pinyin

http://blog.csdn.net/napoay/article/details/53907921

Original article: https://www.cnblogs.com/wangchuanfu/p/7239269.html