> I need to tokenise on whitespace, full-stop, and comma ONLY.
>
> Currently using solr.WhitespaceTokenizerFactory with
> WordDelimiterFilterFactory but this is also splitting on
> &, /, new-line, etc.

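WordDelimiterFilterFactory (WDF) can be customized with the types="wdftypes.txt" parameter, which lets you override the character type of individual characters (for example, mapping & and / to ALPHA so they no longer act as delimiters). A sample types file:
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/conf/wdftypes.txt

Alternatively, you can convert . and , to whitespace before the tokenizer runs by using MappingCharFilterFactory, and let the whitespace tokenizer do all the splitting:
http://lucene.apache.org/solr/api/org/apache/solr/analysis/MappingCharFilterFactory.html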
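
A minimal sketch of the MappingCharFilterFactory approach, in case it helps. The field type name "text_ws_punct" and the mapping file name "mapping-punct.txt" are made up for illustration; adjust them to your schema. It assumes WDF is dropped from the chain entirely, since the char filter plus the whitespace tokenizer already cover the three delimiters you asked for.

```xml
<!-- schema.xml: hypothetical field type; the names "text_ws_punct" and
     "mapping-punct.txt" are assumptions made for this example -->
<fieldType name="text_ws_punct" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Runs before the tokenizer and rewrites characters in the raw input -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-punct.txt"/>
    <!-- Splits on whitespace only, so & and / stay inside their tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- conf/mapping-punct.txt would contain these two lines:
     "." => " "
     "," => " "
-->
```

One caveat: a new-line is whitespace as far as WhitespaceTokenizerFactory is concerned, so the tokenizer will still split there. You can check the resulting tokens on the Analysis page of the Solr admin UI.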