Re: Lower/UpperCase Issue

Jack Krupansky Wed, 09 Jul 2014 13:16:31 -0700

Ahmet is correct: the Porter stemmer assumes that your input is lowercase, so be sure to place the lowercase filter before stemming.


BTW, this is the kind of detail that I have in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

You could also find this detail down at the level of the Lucene Javadoc, but IMHO it's inappropriate to expect Solr users to have to dive down into Lucene Javadoc.


See:
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/en/PorterStemFilter.html
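
As a rough sketch of that ordering, here is how the text_general field type quoted further down in this thread might look with the lowercase filter moved ahead of the stemmer in both analyzers. Only the position of the lowercase filter is changed and the duplicate lowercase filter is dropped; every class name and attribute is taken from the original post. Whether WordDelimiterFilterFactory should stay after the stemmer is a separate question that this sketch does not try to answer.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase first, so the stemmer only ever sees lowercase tokens -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Same rule on the query side: lowercase before stemming -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  </analyzer>
</fieldType>
```

With this order, both "BALANCER" and "Balancer" should reach the stemmer as "balancer" and so be reduced to the same term. Existing documents need to be reindexed after the index analyzer is changed.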

-- Jack Krupansky

-----Original Message-----
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Sent: Wednesday, July 9, 2014 4:03 PM
To: solr-user@lucene.apache.org ; Ahmet Arslan
Subject: RE: Lower/UpperCase Issue

Do I need to use a different algorithm instead of Porter stemming? Can you suggest anything you have in mind?


-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: Wednesday, July 09, 2014 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Lower/UpperCase Issue

Hi,

The Analysis admin page will tell you the truth. Just a guess: the Porter stem filter could be case sensitive, and that may cause the difference. I am pretty sure Porter stemming algorithms are designed to work on lowercase input.
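
One way to run the same check without the admin UI is Solr's field analysis request handler, which reports the token stream after each analysis stage. A minimal sketch, assuming a Solr 4.x solrconfig.xml in which the handler is registered under /analysis/field, as in the stock example configuration:

```xml
<!-- solrconfig.xml: field analysis handler; the /analysis/field path is an assumption -->
<requestHandler name="/analysis/field"
                startup="lazy"
                class="solr.FieldAnalysisRequestHandler"/>
```

With that in place, a request such as http://localhost:8983/solr/Test/analysis/field?analysis.fieldtype=text_general&analysis.fieldvalue=BALANCER&analysis.query=Balancer&wt=json&indent=true should list the tokens produced by each filter, using the index analyzer for analysis.fieldvalue and the query analyzer for analysis.query. The core name Test and the field type text_general are taken from the original post.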


By the way, you have two lowercase filters defined in the index analyzer.

Ahmet

On Wednesday, July 9, 2014 7:18 PM, "EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)" <external.ravi.tamin...@us.bosch.com> wrote:

I have a situation here: when I search with "BALANCER", the results are different compared to "Balancer", and the order is different. When I search "BALANCER", the documents with upper case are first in the list, and for "Balancer" they are in a different order.

I am confused about this behavior. Has anyone had the same issue, or am I missing something?

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  </analyzer>
</fieldType>

Example queries:

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABALANCER&wt=json&indent=true

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABalancer&wt=json&indent=true

Thanks

Ravi
