Re: case sensitivity

Michael Kimsal Thu, 26 Apr 2007 15:04:20 -0700

My colleague, after some digging, found this in SolrQueryParser
(around line 62):

setLowercaseExpandedTerms(false);

The default for Lucene is true.  Was this intentional?  Or an oversight?

Perhaps it's not related to my problem, but it seems that it might be.
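
To illustrate why that flag could matter, here is a minimal Python sketch. It is a toy model, not Lucene or Solr code. It assumes the indexed terms were already lowercased by the analyzer and that wildcard/prefix terms skip the analyzer, so the only lowercasing they can get is from the parser flag. The names analyze, build_index, term_search, prefix_search and lowercase_expanded_terms are made up for the example, and the real "text" analyzer also does stemming and word splitting that are left out here.

```python
def analyze(text):
    """Toy stand-in for the analyzer: split on whitespace, then lowercase."""
    return [token.lower() for token in text.split()]


def build_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_search(index, word):
    """Plain term query: the query text goes through the analyzer too."""
    hits = set()
    for term in analyze(word):
        hits |= index.get(term, set())
    return hits


def prefix_search(index, prefix, lowercase_expanded_terms):
    """Prefix query (like fox*): the analyzer is skipped, so the prefix is
    lowercased only when the flag is set."""
    if lowercase_expanded_terms:
        prefix = prefix.lower()
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return hits


docs = {1: "Fox Lake cabin", 2: "Foxglove cottage", 3: "Bear Creek lodge"}
index = build_index(docs)

print(term_search(index, "Fox"))           # analyzed, so case does not matter
print(prefix_search(index, "Fox", False))  # flag off: raw "Fox" is compared as-is
print(prefix_search(index, "Fox", True))   # flag on: behaves like fox*
```

If the model holds, this would match what I'm seeing: plain terms ignore case, while the wildcard forms only match when typed in lowercase.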

Thanks in advance!

On 4/26/07, Michael Kimsal <[EMAIL PROTECTED]> wrote:


type:changelog AND ( ( (listing:Fox) or (listing:Fox*) or (listing:*Fox) ) )
and
type:changelog AND ( ( (listing:fox) or (listing:fox*) or (listing:*fox) ) )

Is this to do with the wildcards?

Actually, I've just answered my own question.

type:changelog AND ( ( (listing:fox) ) )
and
type:changelog AND ( ( (listing:Fox) ) )

give the same results.

But adding the wildcard clauses (or listing:fox* or listing:*fox) always
makes the query case-sensitive. However,
http://wiki.apache.org/lucene-java/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a
seems to say that wildcard searches are not case-sensitive.

Unless someone can point out a way around this, it seems I'll need to
manually reindex and lower-case everything on the way in, then reformat my
search queries to be lower-case as well.
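
For the query side of that workaround, a rough Python sketch of lowercasing the search terms before the query is sent. The function name lowercase_query is made up, and the sketch assumes simple field:value clauses like the ones above; it does not handle escaped characters or colons inside quoted phrases. Uppercase operators (AND, OR, NOT, TO) and field names are left alone.

```python
import re

# Uppercase keywords in Lucene query syntax that must not be lowercased.
OPERATORS = {"AND", "OR", "NOT", "TO"}

# Split on runs of whitespace and on single parentheses, keeping the
# separators so the query can be reassembled unchanged.
TOKEN_RE = re.compile(r"(\s+|[()])")


def lowercase_query(query):
    """Lowercase search terms, leaving operators and field names untouched."""
    parts = []
    for token in TOKEN_RE.split(query):
        if token in OPERATORS or token in ("(", ")") or not token.strip():
            # Operator, parenthesis, whitespace or empty split artifact.
            parts.append(token)
        elif ":" in token:
            # field:value clause: keep the field name, lowercase the value.
            field, _, value = token.partition(":")
            parts.append(field + ":" + value.lower())
        else:
            parts.append(token.lower())
    return "".join(parts)


query = "type:changelog AND ( ( (listing:Fox) or (listing:Fox*) or (listing:*Fox) ) )"
print(lowercase_query(query))
```

Since the index analyzer quoted below already includes LowerCaseFilterFactory, the indexed terms may already be lowercase, in which case only the query side would need this.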



On 4/26/07, Michael Kimsal <[EMAIL PROTECTED]> wrote:
>
> I was just writing a followup.
>
> I'm using the default text field type
>
>     <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>     </fieldtype>
>
>
> That looks to me like it's got LowerCaseFilterFactory in the query
> analyzer and the index analyzer.
>
> I'm still digging into this, but are there any other things to look for
> that anyone can point me to?  (Thanks Erik!)
>
>
>
>
> On 4/26/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> >
> >
> > On Apr 26, 2007, at 5:43 PM, Michael Kimsal wrote:
> > > I've looked through the mailing lists and can't find much of anything
> > > regarding case sensitivity.  It seems SOLR is case sensitive by
> > > default - I'm using the default settings with a very basic schema -
> > > just text fields.
> >
> > All depends on the analysis you have set up for the fields.  If
> > you're indexing "string"-type fields in the default example schema,
> > there is effectively no analysis so searches must be exact matches
> > case and all.
> >
> > > Is there any way to tell the query parser to be case insensitive
> > > during a query?  Or do I have to reindex all my data again with
> > > lowercase values?
> >
> > Terms are indexed in a case-sensitive manner, so if you need case
> > insensitivity you need to lowercase on the way in and on querying.
> >
> >         Erik
> >
> >
> >
>
>
> --
> Michael Kimsal
> http://webdevradio.com
>



--
Michael Kimsal
http://webdevradio.com



