Re: Which Tokeniser (and/or filter)

Ahmet Arslan Mon, 06 Feb 2012 03:33:40 -0800

> I need to tokenise on whitespace, full-stop, and comma
> ONLY.
> 
> Currently using solr.WhitespaceTokenizerFactory with
> WordDelimiterFilterFactory but this is also splitting on
> &, /, new-line, etc.


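WordDelimiterFilterFactory (WDF) is customizable via its types="wdftypes.txt" parameter, which lets you reclassify characters such as & and / so they are no longer treated as delimiters.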

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/conf/wdftypes.txt
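
A rough sketch of how this could look in schema.xml. The field type name text_ws_punct is made up for illustration, and mapping & and / to ALPHA is only one possible choice; every other character you do not want WDF to split on would need its own line in wdftypes.txt. The split/stem flags are set on the assumption that you also want to avoid WDF's case-change, letter/digit, and possessive handling.

```xml
<!--
  conf/wdftypes.txt (contents shown here for reference):

    & => ALPHA
    / => ALPHA
-->
<fieldType name="text_ws_punct" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on whitespace only (this includes new-lines). -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits on remaining delimiters such as . and , but treats
         the characters listed in wdftypes.txt as ordinary letters. -->
    <filter class="solr.WordDelimiterFilterFactory"
            types="wdftypes.txt"
            generateWordParts="1"
            generateNumberParts="1"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            stemEnglishPossessive="0"/>
  </analyzer>
</fieldType>
```

Note that the new-line split comes from the whitespace tokenizer rather than from WDF, so the types file will not change it.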

Alternatively, you can convert . and , to whitespace (before the tokenizer) with
MappingCharFilterFactory, so that WhitespaceTokenizerFactory alone does the splitting.

http://lucene.apache.org/solr/api/org/apache/solr/analysis/MappingCharFilterFactory.html
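
Something along these lines should work for the char filter approach. The mapping file name mapping-punct.txt and the field type name text_ws_dot_comma are hypothetical; the mapping file uses the usual "source" => "target" rule format.

```xml
<!--
  conf/mapping-punct.txt (hypothetical file name), one rule per line:

    "." => " "
    "," => " "
-->
<fieldType name="text_ws_dot_comma" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Runs on the raw character stream, before tokenization. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-punct.txt"/>
    <!-- Now the only split points are whitespace plus the mapped . and , -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With no WDF in the chain, & and / are left alone. The Analysis page in the Solr admin UI is a quick way to check the resulting tokens for a sample input.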
