Lucene 2.9.2 + PanGu Segment 2.3.1 (Part 1) Getting Started: Building a Simple Index and Searching (Original)

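Pics or it didn't happen, so here is a screenshot:

PS: The screenshot above shows that the Chinese text was segmented correctly and that the search got a hit.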

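Note: if you want to learn Lucene properly, I recommend Lucene in Action, 2nd edition. Also, Lucene 2.9.2 deprecates many of the older methods, so don't bother with old sample code.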

Here is the code:

Building the index
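// Extension methods must live in a static class. IO is an alias (using IO = System.IO;)
// so that FileInfo/TextReader do not clash with Lucene.Net type names.
// Also needs: using System; using System.Diagnostics;
//             using Lucene.Net.Documents; using Lucene.Net.Index;
public static class IndexExtensions
{
    public static void IndexFile(this IndexWriter writer, IO.FileInfo file)
    {
        // One Stopwatch is enough; DateTime.Now is coarser and was redundant here
        var watch = Stopwatch.StartNew();
        Console.WriteLine("Indexing {0}", file.Name);
        // The reader is consumed during AddDocument, so close it right afterwards.
        // StreamReader defaults to UTF-8; for GB2312 files pass
        // System.Text.Encoding.GetEncoding("gb2312") as the second argument.
        using (var contents = new IO.StreamReader(file.FullName))
        {
            writer.AddDocument(file.GetDocument(contents));
        }
        watch.Stop();
        Console.WriteLine("Indexing Completed! Cost time {0}[{1}]",
            watch.Elapsed.ToString("c"), watch.ElapsedMilliseconds);
    }

    public static Document GetDocument(this IO.FileInfo file, IO.TextReader contents)
    {
        var doc = new Document();
        // Tokenized and indexed but not stored (a reader-valued field cannot be stored)
        doc.Add(new Field("contents", contents));
        // Stored and tokenized: searchable by the words in the file name
        doc.Add(new Field("filename", file.Name,
            Field.Store.YES, Field.Index.ANALYZED));
        // Stored, indexed as one untokenized term: only the exact full path matches
        doc.Add(new Field("fullpath", file.FullName,
            Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }
}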
 

Output

Indexing Scott.txt
Indexing Completed! Cost time 00:00:02.4231386[2423]
Indexing 黄金瞳.txt
Indexing Completed! Cost time 00:00:00.0860049[85]
There are 2 doc Indexed!
Index Exit!

Code explanation:

`IndexFile` calls `GetDocument` to build the Document for each file. Document is one of Lucene's core objects; here is its definition:

The Document class represents a collection of fields. Think of it as a virtual document—
a chunk of data, such as a web page, an email message, or a text file—that you
want to make retrievable at a later time. Fields of a document represent the document
or metadata associated with that document. The original source (such as a database
record, a Microsoft Word document, a chapter from a book, and so on) of
document data is irrelevant to Lucene. It’s the text that you extract from such binary
documents, and add as a Field instance, that Lucene processes. The metadata (such
as author, title, subject and date modified) is indexed and stored separately as fields
of a document.

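If you don't care about the details, think of a Document as one record (row) in a database table: what a query finally returns is also Document objects, that is, records.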

Building an index, then, means adding many such records to Lucene.
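To make the "one Document is one table row" analogy concrete, here is a toy in-memory model in C#. `ToyDocument` and `ToyIndex` are hypothetical classes written only for illustration, not Lucene types: there is no analysis, scoring or inverted index, just "append records, get ids back, read fields by name", which loosely mirrors the `AddDocument`, `search.Doc(hit.doc)` and `doc.Get("filename")` calls used in this post.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy model only, not Lucene's API: a "document" is a set of named field values,
// much like one row of a database table.
public sealed class ToyDocument
{
    private readonly Dictionary<string, string> _fields = new Dictionary<string, string>();

    public ToyDocument Add(string name, string value)
    {
        _fields[name] = value;
        return this; // allows chaining several Add calls
    }

    // Returns null when the field is absent (Lucene's Document.Get behaves the same way).
    public string Get(string name) =>
        _fields.TryGetValue(name, out var value) ? value : null;
}

// Building an index = appending documents; each one gets a sequential doc id,
// and a lookup returns ids that are turned back into documents.
public sealed class ToyIndex
{
    private readonly List<ToyDocument> _docs = new List<ToyDocument>();

    public int AddDocument(ToyDocument doc)
    {
        _docs.Add(doc);
        return _docs.Count - 1; // id of the document just added
    }

    public int NumDocs => _docs.Count;

    public ToyDocument Doc(int id) => _docs[id];

    // Exact-value lookup, comparable to "SELECT ... WHERE field = value".
    public IEnumerable<int> Find(string field, string value) =>
        Enumerable.Range(0, _docs.Count).Where(id => _docs[id].Get(field) == value);
}
```

For example, after adding documents for `Scott.txt` and `黄金瞳.txt`, `Find("filename", "黄金瞳.txt")` yields id 1, and `Doc(1).Get("filename")` reads the stored value back, just as the search code reads `filename` from each hit.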

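In the `fullpath` field, the first option, `Field.Store.YES`, needs little explanation: the value is stored so it can be read back from search results. The second, `Field.Index.NOT_ANALYZED`, does not make the field unsearchable; the value is indexed as one single term without being tokenized, so only a query for the exact whole value matches it.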
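A toy illustration of the ANALYZED / NOT_ANALYZED difference, in C#. `ToyFieldIndexing` is hypothetical, and its `Analyze` method (every run of letters or digits becomes a lower-cased term) is a crude stand-in for an analyzer, not how PanGuAnalyzer works. It only shows that an analyzed value becomes several terms, while a not-analyzed value stays one term that has to be matched in full.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Lucene.Net or PanGu.
public static class ToyFieldIndexing
{
    // Crude stand-in for an analyzer: each run of letters/digits becomes one
    // lower-cased term. Real analyzers (PanGuAnalyzer included) do far more.
    public static List<string> Analyze(string value) =>
        Regex.Matches(value, @"[\p{L}\p{Nd}]+")
             .Cast<Match>()
             .Select(m => m.Value.ToLowerInvariant())
             .ToList();

    // ANALYZED: the value is split into several terms.
    // NOT_ANALYZED: the whole value is indexed as one term, exactly as given.
    public static List<string> TermsFor(string value, bool analyzed) =>
        analyzed ? Analyze(value) : new List<string> { value };

    // A single-term query matches when it equals one of the field's indexed terms.
    public static bool Matches(string fieldValue, bool analyzed, string queryTerm) =>
        TermsFor(fieldValue, analyzed).Contains(queryTerm);
}
```

Here `Matches("Scott.txt", true, "scott")` is true, while a not-analyzed value such as the hypothetical path `D:\docs\Scott.txt` is only hit by the full path string. Note that this toy analyzer turns `黄金瞳.txt` into `黄金瞳` and `txt`: Chinese has no spaces, so the whole run stays one token, which is exactly the gap a Chinese segmenter like PanGu fills. Also keep in mind that in real Lucene the QueryParser runs its analyzer over the query text for every field, so exact lookups on a NOT_ANALYZED field are usually done with a TermQuery instead.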

Searching
// Needs: using System.Collections.Generic; using System.Text; using Lucene.Net.Search;
//        using Lucene.Net.QueryParsers; plus the PanGu analyzer's namespace.
// _indexDir (not shown here) is the Lucene Directory that holds the index.
public ActionResult Index(string keyWord)
{
    var originalKeyWords = keyWord; // used by the highlighter lines commented out below
    var results = new List<KeyValuePair<string, string>>();
    ViewBag.TotalResult = 0;
    ViewBag.Results = results;
    if (string.IsNullOrEmpty(keyWord))
    {
        ViewBag.Message = "Welcome Today!";
        return View("Index");
    }

    var q = keyWord;
    // q = GetKeyWordsSplitBySpace(q, new PanGuTokenizer());

    // Open the index read-only
    var search = new IndexSearcher(_indexDir, true);
    try
    {
        // Parse the keywords with the same analyzer used at index time (PanGu);
        // terms without an explicit field are searched in "contents"
        var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "contents", new PanGuAnalyzer(false));
        var query = queryParser.Parse(q);

        // Keep the top 100 hits; totalHits still counts every matching document
        var hits = search.Search(query, 100);
        ViewBag.TotalResult = hits.totalHits;

        // Show explain: the score explanation for every document in the index...
        var explain = new StringBuilder();
        for (int d = 0; d < search.MaxDoc(); d++)
        {
            explain.Append(search.Explain(query, d).ToHtml());
        }

        // ...then every term in the index, listed once instead of once per document
        var termEnum = search.GetIndexReader().Terms();
        explain.Append("<ul>");
        while (termEnum.Next())
        {
            explain.AppendFormat("<li>{0}</li>", termEnum.Term().Text());
        }
        explain.Append("</ul>");
        termEnum.Close();
        ViewBag.Explain = explain.ToString();

        foreach (var hit in hits.scoreDocs)
        {
            var doc = search.Doc(hit.doc);
            var fileName = doc.Get("filename");
            // fileName = highlighter.GetBestFragment(originalKeyWords, fileName);
            //var contents = GetBestFragment(originalKeyWords, new StreamReader(doc.Get("fullpath"), Encoding.GetEncoding("gb2312")));
            results.Add(new KeyValuePair<string, string>(fileName, string.Empty));
        }
    }
    finally
    {
        // Release the searcher even if parsing or searching throws
        search.Close();
    }

    ViewBag.Message = string.Format("Search results for: {0}", keyWord);
    return View("Index");
}
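`search.Search(query, 100)` returns at most 100 ranked hits, while `hits.totalHits` counts every matching document, so `TotalResult` can be larger than the number of rows the page actually lists. The toy C# model below (a hypothetical `ToyTopDocs`, not Lucene's `TopDocs`) sketches that split under simple assumptions: count all matches, keep only the best n by descending score, and on equal scores prefer the smaller doc id, which is also how Lucene's default ranking breaks ties.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy model, not Lucene's classes: TotalHits counts every match,
// ScoreDocs keeps only the best n of them.
public sealed class ToyTopDocs
{
    public int TotalHits { get; }
    public IReadOnlyList<(int Doc, float Score)> ScoreDocs { get; }

    private ToyTopDocs(int totalHits, IReadOnlyList<(int Doc, float Score)> scoreDocs)
    {
        TotalHits = totalHits;
        ScoreDocs = scoreDocs;
    }

    // Rank by descending score; ties go to the smaller doc id; keep at most n.
    public static ToyTopDocs Collect(IEnumerable<(int Doc, float Score)> matches, int n)
    {
        var all = matches.ToList();
        var top = all.OrderByDescending(m => m.Score)
                     .ThenBy(m => m.Doc)
                     .Take(n)
                     .ToList();
        return new ToyTopDocs(all.Count, top);
    }
}
```

With three matches and n = 2, `TotalHits` is 3 but only two entries come back, which is the same relationship as `hits.totalHits` versus `hits.scoreDocs` in the action above.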
 

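Later posts will continue with this code and put the comments inside it; explaining it separately, away from the code, is hard to follow and tiring to write.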
