[ Getting started with Solr ] Adding Chinese word segmentation (IKAnalyzer) to schema.xml

http://www.cnblogs.com/huangfox/archive/2012/02/08/2342881.html

The post linked above explained how to deploy Solr inside Eclipse. This post builds on that setup and adds IKAnalyzer to it.

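1. Download the IKAnalyzer source code and copy it into the solr3.5 project, as shown in the figure below:

[Figure: IKAnalyzer source copied into the solr3.5 project]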

2. Configure IKAnalyzer in schema.xml

<!-- IKAnalyzer 3.2.8 Chinese word segmentation -->
<fieldType name="text" class="solr.TextField">
	<!-- Index time: isMaxWordLength="false" selects finest-grained segmentation -->
	<analyzer type="index">
		<tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
	<!-- Query time: isMaxWordLength="true" selects maximum-word-length segmentation; synonyms are applied only here -->
	<analyzer type="query">
		<tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="true"/>
		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
		<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
</fieldType>
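
A field type only takes effect once a field references it by name. As a minimal sketch, assuming a hypothetical field called content (the name is not from the original setup), the declaration in the fields section of schema.xml could look like this:

```xml
<!-- Hypothetical field name "content"; type="text" refers to the IKAnalyzer fieldType defined above -->
<field name="content" type="text" indexed="true" stored="true"/>
```

Any field declared with type="text" is then tokenized by IKAnalyzer at both index and query time.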

3. Start Solr and verify

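On the Solr admin analysis page, select type in the Field box and enter text (the name of the fieldType configured in step 2). Then enter some Chinese text in Field value and click Analyze to see the segmentation result.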

The verbose output option shows detailed information about each token.

For the full schema.xml configuration reference, see the Solr wiki:

http://wiki.apache.org/solr/SchemaXml

Data Types

The <types> section allows you to define a list of <fieldtype> declarations you wish to use in your schema, along with the underlying Solr class that should be used for that type, as well as the default options you want for fields that use that type.

Any subclass of FieldType may be used as a field type class, using either its full package name or the "solr" alias if it is in the default Solr package. For common numeric types (integer, float, etc.) multiple implementations are provided, depending on your needs. See SolrPlugins for information on how to ensure that your own custom field types can be loaded into Solr.
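
To illustrate the two ways of naming a class, here is a small sketch with hypothetical type names (text_short and text_full are made up for this example). Both declarations should resolve to the same class, since the built-in field types live in the org.apache.solr.schema package:

```xml
<!-- Short form: the "solr." prefix is resolved against Solr's default packages -->
<fieldType name="text_short" class="solr.TextField"/>

<!-- Equivalent form using the full package name -->
<fieldType name="text_full" class="org.apache.solr.schema.TextField"/>
```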

Common options that field types can have are...
sortMissingLast=true|false
sortMissingFirst=true|false
indexed=true|false
stored=true|false
multiValued=true|false
omitNorms=true|false
omitTermFreqAndPositions=true|false (Solr 1.4)
omitPositions=true|false (Solr 3.4)
positionIncrementGap=N

TextFields can also support Analyzers with highly configurable Tokenizers and Token Filters.
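
As a rough illustration of how these options are written, the sketch below sets a few of them on two field types and then overrides some on an individual field. All names (string_sorted, text_gap, tags) are hypothetical and chosen only for this example:

```xml
<!-- Hypothetical type: documents missing the field sort last; norms are not stored -->
<fieldType name="string_sorted" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Hypothetical type: a gap of 100 positions between the values of a multiValued field -->
<fieldType name="text_gap" class="solr.TextField" positionIncrementGap="100">
	<analyzer>
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
	</analyzer>
</fieldType>

<!-- A field can override the defaults it inherits from its type -->
<field name="tags" type="text_gap" indexed="true" stored="false" multiValued="true"/>
```

Options set on the fieldType act as defaults, so the field declaration only needs to list what differs.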

Field types that store text (TextField, StrField) support compression of stored contents:

compressed=true|false
compressThreshold=<integer>
compressThreshold is the minimum length required for text compression to be invoked. This applies only if compressed=true; a common pattern is to set compressThreshold on the field type definition, and turn compression on and off in the individual field definitions.
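
The pattern described above could be sketched as follows. The names (text_compressible, body, title) and the threshold value 1024 are assumptions made for illustration, and the attributes only have an effect on Solr versions that support stored-field compression:

```xml
<!-- Hypothetical type: the threshold is set once, on the type -->
<fieldType name="text_compressible" class="solr.TextField" compressThreshold="1024">
	<analyzer>
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
	</analyzer>
</fieldType>

<!-- Compression is switched on or off per field -->
<field name="body" type="text_compressible" indexed="true" stored="true" compressed="true"/>
<field name="title" type="text_compressible" indexed="true" stored="true" compressed="false"/>
```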

  

Original article: https://www.cnblogs.com/huangfox/p/2342915.html