Re: case sensitivity

Michael Kimsal Thu, 26 Apr 2007 15:00:53 -0700

type:changelog AND ( ( (listing:Fox) or (listing:Fox*) or (listing:*Fox) ) )
and
type:changelog AND ( ( (listing:fox) or (listing:fox*) or (listing:*fox) ) )


Is this to do with the wildcards?

Actually, I've just answered my own question.

type:changelog AND ( ( (listing:fox) ) )
and
type:changelog AND ( ( (listing:Fox) ) )

give the same results.

But once I add the wildcard clauses (listing:fox* or listing:*fox), the search is always case-sensitive.
However,
http://wiki.apache.org/lucene-java/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a
seems to say that wildcard searches are not case-sensitive.

Unless someone can point out a way around this, it seems I'll need to
manually reindex and lower-case everything on the way in, then reformat my
search queries to be lower-case as well.
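
A minimal sketch of the query-side half of that workaround, assuming the client builds the query string itself before sending it to Solr. The helper name lowercase_query_terms is hypothetical, not part of Solr or Lucene. It lower-cases search terms (including wildcard terms such as Fox*) but leaves field names and the upper-case operators AND, OR, NOT and TO alone, since the query syntax only recognises those operators in upper case.

```python
import re

# Operators that the Lucene query syntax only recognises in upper case.
OPERATORS = {"AND", "OR", "NOT", "TO"}


def lowercase_query_terms(query: str) -> str:
    """Lower-case search terms; keep field names and operators unchanged.

    Hypothetical client-side helper, not part of Solr or Lucene.
    """
    def fix(match: re.Match) -> str:
        word, colon = match.group(1), match.group(2)
        if colon:
            # A word followed by ':' is a field name; field names are
            # case-sensitive, so leave it exactly as written.
            return word + colon
        if word in OPERATORS:
            return word
        return word.lower()

    # Each match is a run of word characters, optionally followed by ':'.
    # Wildcards, parentheses and spaces are not matched and pass through.
    return re.sub(r"(\w+)(:?)", fix, query)


print(lowercase_query_terms("type:changelog AND (listing:Fox* OR listing:*Fox)"))
# prints: type:changelog AND (listing:fox* OR listing:*fox)
```

This only touches the query side. It assumes the indexed terms are already lower-cased, which the LowerCaseFilterFactory in the index analyzer quoted in this thread should be doing, and it does nothing about stemming, so a wildcard term may still miss tokens that the stemmer changed at index time.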



On 4/26/07, Michael Kimsal <[EMAIL PROTECTED]> wrote:


I was just writing a followup.

I'm using the default text field type

    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>


That looks to me like it's got LowerCaseFilterFactory in the query
analyzer and the index analyzer.

I'm still digging into this, but is there anything else I should look for
that anyone can point me to?  (Thanks Erik!)




On 4/26/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
>
> On Apr 26, 2007, at 5:43 PM, Michael Kimsal wrote:
> > I've looked through the mailing lists and can't find much of anything
> > regarding case sensitivity.  It
> > seems SOLR is case sensitive by default - I'm using the default
> > settings
> > with a very basic schema - just text fields.
>
> All depends on the analysis you have set up for the fields.  If
> you're indexing "string"-type fields in the default example schema,
> there is effectively no analysis so searches must be exact matches
> case and all.
>
> > Is there any way to tell the query parser to be case insensitive
> > during a
> > query?  Or do I have to reindex
> > all my data again with lowercase values?
>
> Terms are indexed in a case-sensitive manner, so if you need case
> insensitivity you need to lowercase on the way in and on querying.
>
>         Erik
>
>
>


--
Michael Kimsal
http://webdevradio.com




