
Thank you for your response! 

The problem with this approach is that searching for "12:34" will also match 
"12.34" which is not what I want.
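
To make the collision concrete, here is a rough Python sketch of what the catenated-number output looks like once punctuation is dropped. It is only a simplified simulation, not Solr code, and the helper name `catenate_numbers` is made up for this example.

```python
import re

def catenate_numbers(text):
    """Simplified stand-in for the catenated-number token:
    split on anything that is not a letter or digit, then join the digit parts."""
    parts = re.split(r"[^0-9A-Za-z]+", text)
    return "".join(p for p in parts if p.isdigit())

# Both inputs collapse to the same token, so a query for one matches the other.
print(catenate_numbers("12.34"))
print(catenate_numbers("12:34"))
```

Because the delimiter itself is discarded, the index cannot tell "." from ":" afterwards.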

 From: Erick Erickson <>
To:; Jian Xu <> 
Sent: Thursday, April 12, 2012 8:01 AM
Subject: Re: Question about solr.WordDelimiterFilterFactory
WordDelimiterFilterFactory will _almost_ do what you want
by setting things like catenateWords=0 and catenateNumbers=1,
_except_ that the punctuation will be removed. So
12.34 -> 1234
ab,cd -> ab cd

is that "close enough"?

Otherwise, writing a simple Filter is probably the way to go.
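
As a rough illustration of the rule such a filter would have to implement, here is a plain Python regex sketch. It is not Lucene/Solr code, and the names `TOKEN` and `tokenize` are hypothetical; it only shows the context check ("." or "," kept only when digits sit on both sides).

```python
import re

# A numeric token: digits, optionally followed by groups of "." or ","
# that are immediately followed by more digits.
# A word token: a run of letters. Everything else acts as a delimiter.
TOKEN = re.compile(r"\d+(?:[.,]\d+)*|[^\W\d_]+")

def tokenize(text):
    """Return tokens, keeping '.' and ',' only when digits surround them."""
    return TOKEN.findall(text)

print(tokenize("12.34"))   # ['12.34']
print(tokenize("12,345"))  # ['12,345']
print(tokenize("ab,cd"))   # ['ab', 'cd']
print(tokenize("12:34"))   # ['12', '34']
```

A real implementation would apply the same rule inside a custom tokenizer or token filter, so that it runs at both index and query time.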


On Wed, Apr 11, 2012 at 1:59 PM, Jian Xu <> wrote:
> Hello,
> I am new to Solr/Lucene. I have been tasked with indexing a large number of 
> documents. Some of these documents contain decimal points. I am looking for a 
> way to index these documents so that adjacent numeric characters (such as 
> [0-9.,]) are treated as a single token. For example,
> 12.34 => "12.34"
> 12,345 => "12,345"
> However, "," and "." should be treated as usual when next to non-digit 
> characters. For example,
> ab,cd => "ab" "cd".
> It is so that searching for "12.34" will match "12.34" not "12 34". Searching 
> for "" should match both "" and "ab cd".
> After doing some research on Solr, it seems that there is a built-in analyzer 
> called solr.WordDelimiterFilter that supports a "types" attribute, which maps 
> special characters as different delimiters.  However, it isn't exactly what I 
> want. It doesn't provide a context check, such as requiring that "," or "." 
> be surrounded by digit characters, etc.
> Does anyone have any experience configuring Solr to meet these requirements?  
> Is writing my own plugin necessary for this simple thing?
> Thanks in advance!
> -Jian
