Re: wild card search and lower-casing

Erick Erickson Sun, 20 Nov 2011 19:07:21 -0800

As it happens, I'm working on SOLR-2438, which should address this. The patch
will provide two things:

The ability to define a new analysis chain in your schema.xml, currently called
"multiterm", that will be applied to queries of various sorts, including
wildcard, prefix, and range queries. Defining one yourself will be somewhat of
an "expert" thing...

In the absence of an explicit definition, it'll synthesize a multiterm analyzer
out of the query analyzer, taking any char filters, LowerCaseFilter (if
present), and ASCIIFoldingFilter (if present), and putting them in the
multiterm analyzer along with a (hardcoded) WhitespaceTokenizer.
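
To make the synthesis rule concrete, here is a rough sketch in plain Java. It
is not the SOLR-2438 code and uses no Lucene or Solr classes: the Component
class, the Kind enum, and the method name are made up for illustration, and
the ordering (char filters, then the tokenizer, then the kept token filters in
their original order) is my assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultitermSynthesisSketch {

    /** Kinds of analysis component; a simplification for this sketch. */
    enum Kind { CHAR_FILTER, TOKENIZER, TOKEN_FILTER }

    /** Hypothetical stand-in for one element of an analyzer chain. */
    static final class Component {
        final Kind kind;
        final String name;

        Component(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        @Override
        public String toString() {
            return kind + ":" + name;
        }
    }

    /** Builds a multiterm chain from a query chain, following the rule described above. */
    static List<Component> synthesizeMultiterm(List<Component> queryChain) {
        List<Component> result = new ArrayList<>();
        // Keep every char filter from the query analyzer.
        for (Component c : queryChain) {
            if (c.kind == Kind.CHAR_FILTER) {
                result.add(c);
            }
        }
        // The tokenizer is always replaced by a whitespace tokenizer.
        result.add(new Component(Kind.TOKENIZER, "WhitespaceTokenizer"));
        // Of the token filters, keep only lowercasing and ASCII folding.
        for (Component c : queryChain) {
            boolean kept = c.name.equals("LowerCaseFilter")
                    || c.name.equals("ASCIIFoldingFilter");
            if (c.kind == Kind.TOKEN_FILTER && kept) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Component> queryChain = Arrays.asList(
                new Component(Kind.CHAR_FILTER, "HTMLStripCharFilter"),
                new Component(Kind.TOKENIZER, "StandardTokenizer"),
                new Component(Kind.TOKEN_FILTER, "StopFilter"),
                new Component(Kind.TOKEN_FILTER, "LowerCaseFilter"),
                new Component(Kind.TOKEN_FILTER, "ASCIIFoldingFilter"));
        // The stop filter and the original tokenizer are dropped.
        System.out.println(synthesizeMultiterm(queryChain));
    }
}
```

The point of dropping everything else is that filters such as stop words or
stemming generally don't make sense on a partial term like "Foo*".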

As of 3.6 and 4.0, this will be the default behavior, although you can
explicitly define a field type parameter to keep the current behavior.

The reason it is targeted at 3.6 is that I want it to bake for a while before
getting into the wild, so I have no intention of trying to get it into the 3.5
release.

The patch is up for review now; I'd like another set of eyeballs or two on it
before committing.

The patch that's up there now is against trunk, but I hope to have a 3x patch
that I'll apply to the 3x code line after 3.5 RC1 is cut.

Best
Erick

On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
>
>> You're right:
>>
>> public SolrQueryParser(IndexSchema schema, String defaultField) {
>>   ...
>>   setLowercaseExpandedTerms(false);
>>   ...
>> }
>
> Please note that lowercaseExpandedTerms uses String.toLowerCase() with the
> default Locale, which is a Locale-sensitive operation.
>
> In Lucene, AnalyzingQueryParser exists for this purpose, but I am not sure
> if it has been ported to Solr.
>
>  http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
>
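
For anyone wondering why the Locale point above matters, here is a small
standalone illustration using only the JDK. The class and method names are
mine, chosen for the example; this is not Solr or Lucene code. In a Turkish
locale an uppercase "I" lowercases to a dotless "ı" (U+0131), so a lowercased
wildcard term may no longer match terms that were indexed with a
locale-independent lowercase filter.

```java
import java.util.Locale;

public class LocaleLowercaseSketch {

    /** Lowercases a term under an explicitly chosen locale. */
    static String lowerWith(String term, Locale locale) {
        return term.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        String root = lowerWith("TITLE", Locale.ROOT); // locale-independent rules
        String tr = lowerWith("TITLE", turkish);       // Turkish casing rules

        // Compare against escaped literals so console encoding doesn't matter.
        System.out.println(root.equals("title"));      // prints "true"
        System.out.println(tr.equals("t\u0131tle"));   // prints "true"
        System.out.println(root.equals(tr));           // prints "false"
    }
}
```

Passing an explicit Locale (for example Locale.ROOT) avoids depending on
whatever default the JVM happens to be started with.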
