Re: Question about solr.WordDelimiterFilterFactory

2012-04-12 Thread Erick Erickson
WordDelimiterFilterFactory will _almost_ do what you want by setting things like catenateWords=0 and catenateNumbers=1, _except_ that the punctuation will be removed. So

12.34 -> 1234
ab,cd -> ab cd

Is that close enough? Otherwise, writing a simple Filter is probably the way to go.

Best
Erick
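The effect described in this reply can be approximated in plain Java to make the trade-off concrete. This is only a rough sketch and not Lucene's WordDelimiterFilter: the class name `CatenateSketch` and its `apply` method are hypothetical, it ignores the real filter's other behaviours (such as splitting on case changes or letter/digit transitions), and it assumes the separate number parts are not emitted alongside the catenated form.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java approximation of the effect described above.
// NOT Lucene's WordDelimiterFilter; names here are hypothetical.
final class CatenateSketch {

    static List<String> apply(String token) {
        // Split on anything that is not an ASCII letter or digit.
        // The punctuation itself is dropped, as the reply warns.
        List<String> parts = new ArrayList<>();
        for (String p : token.split("[^A-Za-z0-9]+")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }

        // Check whether every remaining part consists only of digits.
        boolean allNumeric = !parts.isEmpty();
        for (String p : parts) {
            if (!p.chars().allMatch(Character::isDigit)) {
                allNumeric = false;
            }
        }

        List<String> out = new ArrayList<>();
        if (allNumeric) {
            // Analogue of catenateNumbers=1: numeric parts are joined.
            out.add(String.join("", parts));
        } else {
            // Analogue of catenateWords=0: word parts stay separate.
            out.addAll(parts);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(apply("12.34"));
        System.out.println(apply("ab,cd"));
    }
}
```

Under these assumptions the decimal point is lost in the numeric case, which is the reason a custom filter is suggested if the punctuation has to survive.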

Re: Question about solr.WordDelimiterFilterFactory

2012-04-12 Thread Jian Xu
: Thursday, April 12, 2012 8:01 AM
Subject: Re: Question about solr.WordDelimiterFilterFactory

> WordDelimiterFilterFactory will _almost_ do what you want by setting things like catenateWords=0 and catenateNumbers=1, _except_ that the punctuation will be removed. So
> 12.34 -> 1234
> ab,cd -> ab cd

Question about solr.WordDelimiterFilterFactory

2012-04-11 Thread Jian Xu
Hello, I am new to Solr/Lucene. I have been tasked with indexing a large number of documents. Some of these documents contain numbers with decimal points. I am looking for a way to index these documents so that runs of adjacent numeric characters (such as [0-9.,]) are treated as a single token. For example, 12.34 => 12.34
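The tokenization rule asked for here can be sketched outside Solr with `java.util.regex`. This is a minimal illustration of the rule only, not a Lucene Tokenizer or Filter: the class `NumericRunTokenizer` is hypothetical, and two details are assumptions not stated in the question, namely that a numeric token must start and end with a digit (so sentence punctuation is not swallowed) and that all other tokens are runs of ASCII letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or Lucene.
final class NumericRunTokenizer {

    // Either a run of [0-9.,] that starts and ends with a digit
    // (assumption: trailing punctuation is not part of the number),
    // or a run of ASCII letters.
    private static final Pattern TOKEN =
        Pattern.compile("[0-9](?:[0-9.,]*[0-9])?|[A-Za-z]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print one token per line so commas inside tokens stay readable.
        for (String t : tokenize("price 12.34, total 1,234.50.")) {
            System.out.println(t);
        }
    }
}
```

With this pattern, 12.34 stays a single token while ab,cd still splits into ab and cd. The same regular expression could in principle be carried into a custom analysis component, but how it is wired into a Solr schema is outside what this sketch shows.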