Ahmet is correct: the Porter stemmer assumes that its input is lowercase, so be sure to place the lowercase filter before the stemming filter.
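
Applied to the `text_general` field type quoted below, that advice might look something like the following. This is a sketch, not a tested config. Each analyzer gets `LowerCaseFilterFactory` directly after the tokenizer, and the duplicate lowercase filter in the index analyzer is dropped. The remaining filters are put in the same order in both analyzers, with stemming last. That ordering (stop words, then word delimiter, then stemming) is an assumption of this sketch, not something stated in the thread. Existing documents have to be reindexed after the index analyzer changes.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase right after tokenizing, so every later filter (including the stemmer) sees lowercase tokens -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Filter order from here on is an assumption of this sketch -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <!-- Stem last; PorterStemFilter expects lowercase input -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Same lowercase-first placement as the index analyzer, so "BALANCER" and "Balancer" produce the same terms -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```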

BTW, this is the kind of detail that I have in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

You can also find this detail in the Lucene Javadoc, but IMHO it's inappropriate to expect Solr users to have to dig down into Lucene Javadoc.

See:
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/en/PorterStemFilter.html

-- Jack Krupansky

-----Original Message-----
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Sent: Wednesday, July 9, 2014 4:03 PM
To: solr-user@lucene.apache.org ; Ahmet Arslan
Subject: RE: Lower/UpperCase Issue

Do I need to use a different algorithm instead of Porter stemming? Can you suggest anything?

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: Wednesday, July 09, 2014 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Lower/UpperCase Issue

Hi,

The Analysis admin page will tell you the truth. Just a guess: the Porter stem filter could be "case sensitive", and that may cause the difference. I am pretty sure the Porter stemming algorithm is designed to work on lowercase input.
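
The Analysis screen in the admin UI is backed by Solr's field analysis request handler, which can also be called directly. A minimal sketch, assuming your core's solrconfig.xml does not already register it (the stock Solr 4.x example config registers it under `/analysis/field`):

```xml
<!-- Registers Solr's field analysis handler; startup="lazy" defers loading until first use -->
<requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler"/>
```

With that in place, a request such as `http://localhost:8983/solr/Test/analysis/field?analysis.fieldtype=text_general&analysis.fieldvalue=BALANCER&analysis.query=Balancer&wt=json&indent=true` returns the token stream after each stage of the index and query analyzers, so you can compare the terms each input produces once it has passed through the stemming filter.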

By the way, you have two lowercase filters defined in the index analyzer.

Ahmet



On Wednesday, July 9, 2014 7:18 PM, "EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)" <external.ravi.tamin...@us.bosch.com> wrote:

I have a situation here: when I search for "BALANCER", the results are different compared to "Balancer", and the order is different. When I search for "BALANCER", the documents with upper case are first in the list, but for "Balancer" they are in a different order.

I am confused about why this happens. Has anyone had the same issue, or am I missing something?

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  </analyzer>
</fieldType>

Example queries:

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABALANCER&wt=json&indent=true

http://localhost:8983/solr/Test/select?q=*%3A*&fq=Name%3ABalancer&wt=json&indent=true

Thanks

Ravi
