Erick,

Thank you for your response!
The problem with this approach is that searching for "12:34" will also match "12.34", which is not what I want.

________________________________
From: Erick Erickson <erickerick...@gmail.com>
To: solr-user@lucene.apache.org; Jian Xu <joseph...@yahoo.com>
Sent: Thursday, April 12, 2012 8:01 AM
Subject: Re: Question about solr.WordDelimiterFilterFactory

WordDelimiterFilterFactory will _almost_ do what you want by setting
things like catenateWords=0 and catenateNumbers=1, _except_ that
the punctuation will be removed. So

12.34 -> 1234
ab,cd -> ab cd

is that "close enough"?

Otherwise, writing a simple Filter is probably the way to go.

Best
Erick

On Wed, Apr 11, 2012 at 1:59 PM, Jian Xu <joseph...@yahoo.com> wrote:
> Hello,
>
> I am new to solr/lucene. I am tasked to index a large number of documents.
> Some of these documents contain decimal points. I am looking for a way to
> index these documents so that adjacent numeric characters (such as [0-9.,])
> are treated as a single token. For example,
>
> 12.34 => "12.34"
> 12,345 => "12,345"
>
> However, "," and "." should be treated as usual when around non-digit
> characters. For example,
>
> ab,cd => "ab" "cd".
>
> It is so that searching for "12.34" will match "12.34" not "12 34". Searching
> for "ab.cd" should match both "ab.cd" and "ab cd".
>
> After doing some research on solr, it seems that there is a built-in analyzer
> called solr.WordDelimiterFilter that supports a "types" attribute which maps
> special characters to different delimiter types. However, it isn't exactly
> what I want. It doesn't provide a context check such as "," or "." must be
> surrounded by digit characters, etc.
>
> Does anyone have any experience configuring solr to meet these requirements?
> Is writing my own plugin necessary for this simple thing?
>
> Thanks in advance!
>
> -Jian
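
P.S. To make the rule I am after concrete, here is a rough sketch of the tokenization outside of Solr. It is plain Python with a regular expression, not a Solr filter or any Solr API; the names `TOKEN_RE` and `tokenize` are made up for illustration, and it assumes ASCII letters only. A "." or "," is kept only when it sits between digits; everything else splits tokens.

```python
import re

# Hypothetical illustration of the desired rule, not Solr code.
# First alternative: a run of digits, optionally followed by groups of
# "." or "," plus more digits, so the punctuation survives only when it
# has a digit on both sides.
# Second alternative: a run of ASCII letters (assumption for this sketch).
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|[A-Za-z]+")

def tokenize(text):
    """Return the tokens found in text, scanning left to right."""
    return TOKEN_RE.findall(text)

for sample in ["12.34", "12,345", "ab,cd", "12:34", "price 12.34, ok."]:
    print(sample, "=>", tokenize(sample))
```

With this rule "12.34" stays one token while "12:34" splits into "12" and "34", so a search for one would not match the other, and "ab,cd" still splits into "ab" and "cd".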