OpenNLP Name Finder

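Overview: the Name Finder can detect named entities and numbers in text. To detect entities, the Name Finder needs a model, and that model depends on the language and the entity type it was trained for. The OpenNLP project provides a number of pre-trained name finder models, trained on various freely available corpora, which can be downloaded from the OpenNLP model download page. To find names in raw text, the text must first be segmented into sentences and tokens; this is described in detail in the sentence detector and tokenizer tutorials. It is important that the training data and the input text are tokenized in the same way. Depending on the model used, the name finder can find person names, location names, and other entity types.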
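The name finder works on one sentence at a time, passed in as a String[] of tokens. In OpenNLP that splitting is done by SentenceDetectorME and TokenizerME, each of which needs its own model. The sketch below uses only the JDK to show the shape of input that find() expects. It assumes java.text.BreakIterator for sentence boundaries and a simple regex (runs of letters/digits, or a single punctuation character) as a stand-in tokenizer. Neither is OpenNLP's algorithm, and, as noted above, a real pipeline should tokenize input the same way the model's training data was tokenized. The class name Preprocess is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: raw text -> one token array per sentence (JDK-only stand-in, not OpenNLP). */
public class Preprocess {

    // A token is a run of letters/digits, or a single non-space symbol such as "." or ","
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    public static String[][] toTokenArrays(String text) {
        List<String[]> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        // Walk consecutive sentence boundaries [start, end)
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (sentence.isEmpty()) {
                continue;
            }
            List<String> tokens = new ArrayList<>();
            Matcher m = TOKEN.matcher(sentence);
            while (m.find()) {
                tokens.add(m.group());
            }
            sentences.add(tokens.toArray(new String[0]));
        }
        return sentences.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        for (String[] tokens : toTokenArrays("Mike Smith is a good person. He lives in London.")) {
            System.out.println(String.join(" | ", tokens));
        }
    }
}
```

Each String[] returned here has the same form as the sentence array in the code below, so it could be passed to nameFinder.find(...) one sentence at a time.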

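API: to train a name finder from within an application, it is recommended to use the training API instead of the command line tool. Training takes three basic steps: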

Open a sample data stream
Call the NameFinderME.train method
Save the resulting TokenNameFinderModel to a file or database
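The samples read in the first step use OpenNLP's name finder training format: one tokenized sentence per line, each name wrapped in `<START:type>` ... `<END>` markers, and an empty line between documents. OpenNLP's NameSampleDataStream turns each such line into a NameSample (tokens plus typed spans). The JDK-only sketch below is a simplified, hypothetical re-implementation of that per-line parsing, meant only to show what the train method receives; it is not OpenNLP's code, and it assumes the markers are separated from tokens by whitespace and are well-formed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified parser for one line of name finder training data. */
public class SampleLineParser {

    /** Tokens [start, end) labelled with an entity type, printed like OpenNLP's Span. */
    public static final class NameSpan {
        public final int start;
        public final int end;
        public final String type;

        NameSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    /** One parsed line: the plain tokens plus the annotated name spans. */
    public static final class Sample {
        public final List<String> tokens = new ArrayList<>();
        public final List<NameSpan> names = new ArrayList<>();
    }

    public static Sample parse(String line) {
        Sample sample = new Sample();
        String openType = null; // entity type of the currently open <START:...> marker
        int openStart = -1;     // token index where that name begins
        for (String part : line.trim().split("\\s+")) {
            if (part.startsWith("<START:") && part.endsWith(">")) {
                openType = part.substring("<START:".length(), part.length() - 1);
                openStart = sample.tokens.size();
            } else if (part.equals("<END>")) {
                if (openType == null) {
                    throw new IllegalArgumentException("<END> without <START>: " + line);
                }
                sample.names.add(new NameSpan(openStart, sample.tokens.size(), openType));
                openType = null;
            } else if (!part.isEmpty()) {
                sample.tokens.add(part);
            }
        }
        if (openType != null) {
            throw new IllegalArgumentException("Unclosed <START:" + openType + ">: " + line);
        }
        return sample;
    }

    public static void main(String[] args) {
        Sample s = parse("<START:person> Mike <END> <START:person> Tom Smith <END> is a good person .");
        System.out.println(s.tokens); // [Mike, Tom, Smith, is, a, good, person, .]
        System.out.println(s.names);  // [[0..1) person, [1..3) person]
    }
}
```

In the real API, this step is handled by a NameSampleDataStream wrapping a PlainTextByLineStream; its samples are passed to NameFinderME.train, and the returned TokenNameFinderModel can be written out with its serialize(OutputStream) method.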

Implementation

package package01;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Test04 {

    public static void main(String[] args) {
        try {
            Test04.findName();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * 3. Name search: Name Finder
     * As its name suggests, the name finder finds names in the context.
     * It accepts a sentence as an array of tokens and returns the spans of the names found inside it.
     */
    public static void findName() throws IOException {
        // Forward slashes work on Windows and avoid invalid escapes such as \N in a Java string literal
        TokenNameFinderModel model;
        try (InputStream is = new FileInputStream("E:/NLP_Practics/models/en-ner-person.bin")) {
            model = new TokenNameFinderModel(is);
        }
        NameFinderME nameFinder = new NameFinderME(model);
        // One sentence, already split into tokens
        String[] sentence = new String[]{
                "Mike",
                "Tom",
                "Smith",
                "is",
                "a",
                "good",
                "person"
        };
        Span[] nameSpans = nameFinder.find(sentence);
        for (Span s : nameSpans) {
            System.out.println(s.toString());
        }
        // Clear the adaptive data collected from earlier sentences; call this at each document boundary
        nameFinder.clearAdaptiveData();
        System.out.println("--------------3-------------");
    }

}

Output

[0..1) person
[1..3) person
--------------3-------------
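Each Span printed above is a half-open range of token indices, [start..end), followed by the entity type: [0..1) covers token 0 ("Mike") and [1..3) covers tokens 1 and 2 ("Tom Smith"). OpenNLP can map spans back to text with Span.spansToStrings(spans, tokens); the JDK-only sketch below reproduces that mapping with plain int pairs to make the indexing explicit. The class name SpanText is hypothetical, and the spans are hard-coded from the output above rather than produced by the model.

```java
import java.util.Arrays;

/** Hypothetical helper mirroring Span.spansToStrings: token span [start, end) -> covered text. */
public class SpanText {

    /** Joins tokens[start] .. tokens[end - 1] with single spaces; end is exclusive. */
    public static String covered(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        String[] sentence = {"Mike", "Tom", "Smith", "is", "a", "good", "person"};
        // Start/end pairs copied from the name finder output: [0..1) and [1..3)
        int[][] spans = {{0, 1}, {1, 3}};
        for (int[] span : spans) {
            System.out.println("[" + span[0] + ".." + span[1] + ") -> " + covered(sentence, span[0], span[1]));
        }
        // prints:
        // [0..1) -> Mike
        // [1..3) -> Tom Smith
    }
}
```

Because end is exclusive, the number of tokens in a name is end - start, which is what Span.length() returns.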

https://github.com/godmaybelieve