Ahmet is correct: the Porter stemmer assumes that your input is lowercase, so be sure to place the lowercase filter before the stemming filter.
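
As a rough sketch of that ordering, here is how the field type from the original post further down this thread might be rearranged. It uses only the factories and attributes that already appear in that post. It assumes the rest of the chain is what you want; it simply moves the stemmer to the end and drops the duplicate lowercase filter from the index analyzer. Treat it as a starting point to check on the Analysis admin page, not a tested configuration.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Lowercase before stemming, and only once -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <!-- The stemmer now sees lowercase tokens only -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <!-- Same position as in the index analyzer, so both sides stem the same tokens -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

After changing the index analyzer, existing documents need to be reindexed before the new chain takes effect on them.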

BTW, this is the kind of detail that I have in my e-book:

You could also find this detail down at the level of the Lucene Javadoc, but IMHO it's inappropriate to expect Solr users to have to dive down into Lucene Javadoc.


-- Jack Krupansky

-----Original Message-----
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Sent: Wednesday, July 9, 2014 4:03 PM
To: solr-user@lucene.apache.org ; Ahmet Arslan
Subject: RE: Lower/UpperCase Issue

Do I need to use a different algorithm instead of Porter stemming? Can you suggest anything you have in mind?

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: Wednesday, July 09, 2014 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Lower/UpperCase Issue


The Analysis admin page will tell you the truth. Just a guess: the Porter stem filter could be "case sensitive", and that may cause the difference. I am pretty sure Porter stemming algorithms are designed to work on lowercase input.

By the way, you have two lowercase filters defined in the index analyzer.


On Wednesday, July 9, 2014 7:18 PM, "EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)" <external.ravi.tamin...@us.bosch.com> wrote:

I have a situation here: when I search with "BALANCER", the results are different compared to "Balancer", and the order is different. When I search for "BALANCER", the documents with upper case come first in the list, and for "Balancer" they are in a different order.

I am confused by this behavior. Has anyone had the same issue, or am I missing something?

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
  </analyzer>
</fieldType>


e.g query




