5分钟学用Lucene

5分钟学用Lucene

Lucene很容易使你的应用程序添加文本搜索的功能,实际上,非常容易,我将在5分钟内向您展示!
( 译者注: 实际上,在此之前需要理解搜索引擎的工作原理,和Lucene的基本概念 )

1. 索引.
   为了这个简单的例子,我们将建立一个存储在内存中的一些字符串的索引。

Directory index = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);

IndexWriter w = new IndexWriter(index, config);
addDoc(w, "Lucene in Action");
addDoc(w, "Lucene for Dummies");
addDoc(w, "Managing Gigabytes");
addDoc(w, "The Art of Computer Science");
w.close();


2. 查询.
   读取从标准输入(stdin)输入的查询,解析,并从中建立lucence的查询.

String querystr = args.length > 0 ? args[0] : "lucene";
Query q = new QueryParser(Version.LUCENE_35, "title", analyzer).parse(querystr);


 

3. 搜索.
   通过使用Query来创建一个Searcher来搜索索引, 然后实例化一个 TopScoreDocCollector 来收集前10个Hits (译者注: 查询结果)

int hitsPerPage = 10;
IndexReader reader = IndexReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;


 

4. 显示.
   现在我们已经得到了搜索的结果,显示出来即可.

System.out.println("Found " + hits.length + " hits.");
for(int i=0;i<hits.length;++i) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
    System.out.println((i + 1) + ". " + d.get("title"));
}


 


完整的代码如下: 
(译者注: 在最新的Lucene 3.6版本上面做了修正 )

package hellolucene;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

import java.io.IOException;


public class HelloLucene {
  public static void main(String[] args) throws IOException, ParseException {
    // 0. Specify the analyzer for tokenizing text.
    //    The same analyzer should be used for indexing and searching
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);

    // 1. create the index
    Directory index = new RAMDirectory();

    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);

    IndexWriter w = new IndexWriter(index, config);
    addDoc(w, "Lucene in Action");
    addDoc(w, "Lucene for Dummies");
    addDoc(w, "Managing Gigabytes");
    addDoc(w, "The Art of Computer Science");
    w.close();

    // 2. query
    String querystr = args.length > 0 ? args[0] : "lucene";

    // the "title" arg specifies the default field to use
    // when no field is explicitly specified in the query.
    Query q = new QueryParser(Version.LUCENE_36, "title", analyzer).parse(querystr);

    // 3. search
    int hitsPerPage = 10;
    IndexReader reader = IndexReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    // 4. display results
    System.out.println("Found " + hits.length + " hits.");
    for(int i=0;i<hits.length;++i) {
      int docId = hits[i].doc;
      Document d = searcher.doc(docId);
      System.out.println((i + 1) + ". " + d.get("title"));
    }

    // searcher can only be closed when there
    // is no need to access the documents any more.
    searcher.close();
  }

  private static void addDoc(IndexWriter w, String value) throws IOException {
    Document doc = new Document();
    doc.add(new Field("title", value, Field.Store.YES, Field.Index.ANALYZED));
    w.addDocument(doc);
  }
}

输出结果:

Found 2 hits.
1. Lucene in Action
2. Lucene for Dummies

参考原文: http://www.lucenetutorial.com/lucene-in-5-minutes.html

原文地址:https://www.cnblogs.com/fdyang/p/2858725.html