Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

Markus Jelsma Wed, 12 Jan 2011 14:26:19 -0800

I haven't used edismax, but I imagine it's a feature: inconsistent use of
stopwords across the analyzers of the fields listed in qf can yield very
unexpected results because of the mm parameter.

In dismax, if one analyzer removes stopwords and the other doesn't, the mm
parameter goes crazy. The stopword is dropped for the field that removes it
but still produces a query clause for the field that keeps it, so mm counts
it as a clause that has to match.
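
A toy Python model of this, offered as a sketch rather than Solr's actual
implementation. It assumes each qf field's analyzer only splits on whitespace,
optionally drops stopwords and lowercases (no stemming, word delimiting or
synonyms), that the query becomes one dismax-style clause per word, and that
mm is a percentage rounded down. The functions `analyze`, `build_clauses` and
`matches`, the `STOPWORDS` set and the sample document are all hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for stopwords.txt; only "the" matters here.
STOPWORDS = {"the"}


def analyze(text: str, removes_stopwords: bool) -> List[str]:
    """Rough stand-in for a field analyzer: whitespace split, optional
    case-insensitive stopword removal, then lowercasing."""
    tokens = text.split()
    if removes_stopwords:
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return [t.lower() for t in tokens]


def build_clauses(query: str, qf: Dict[str, bool]) -> List[Dict[str, str]]:
    """One clause per query word, mapping each qf field to its analyzed term.

    A field whose analyzer removes the word contributes nothing to that
    clause; a clause left with no fields at all is dropped.
    """
    clauses = []
    for word in query.split():
        clause = {}
        for field, removes_stopwords in qf.items():
            terms = analyze(word, removes_stopwords)
            if terms:
                clause[field] = terms[0]
        if clause:
            clauses.append(clause)
    return clauses


def matches(doc: Dict[str, str], query: str, qf: Dict[str, bool],
            mm_percent: int = 100) -> bool:
    """True if enough clauses match; fields are 'indexed' with the same analyzers."""
    clauses = build_clauses(query, qf)
    if not clauses:
        return False
    hits = 0
    for clause in clauses:
        # A clause matches if any one of its fields contains the term.
        if any(term in analyze(doc.get(field, ""), qf[field])
               for field, term in clause.items()):
            hits += 1
    # Percentage rounded down, but at least one clause must match.
    required = max(1, len(clauses) * mm_percent // 100)
    return hits >= required


# Hypothetical document; True means the field's analyzer removes stopwords.
doc = {"Title": "The Meaning of Life", "Contributor": "Monty Python"}
mixed_qf = {"Title": True, "Contributor": False}      # textStemmed + textSimple
consistent_qf = {"Title": True, "Contributor": True}  # both remove stopwords

print(build_clauses("the life", mixed_qf))
# [{'Contributor': 'the'}, {'Title': 'life', 'Contributor': 'life'}]
print(matches(doc, "the life", {"Title": True}))          # True
print(matches(doc, "the life", mixed_qf))                 # False
print(matches(doc, "the life", consistent_qf))            # True
print(matches(doc, "the life", mixed_qf, mm_percent=50))  # True
```

Under these assumptions, the clause for "the" disappears only when every qf
field removes it, and a looser mm lets the document match despite that
unmatched clause.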

> I'm running into a problem with StopFilterFactory in conjunction with
> (e)dismax queries that have a mix of fields, only some of which use
> StopFilterFactory.  It seems that if even 1 field on the "qf" parameter
> does not use StopFilterFactory, then stop words are not removed when
> searching any fields.  Here's an example of what I mean:
> 
> - I have 2 fields indexed:
>   > Title is "textStemmed", which includes StopFilterFactory (see below).
>   > Contributor is "textSimple", which does not include StopFilterFactory
>   > (see below).
> 
> - "The" is a stop word in stopwords.txt
> - q=life&defType=edismax&qf=Title  ... returns 277,635 results
> - q=the life&defType=edismax&qf=Title ... returns 277,635 results
> - q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
> - q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
> 
> It seems as if the stop words are not being stripped from the query because
> "qf" contains a field that doesn't use StopFilterFactory.  I tested
> combining stemmed and non-stemmed fields in "qf", and stemming seems to
> be applied regardless.  Stop words, however, are not removed.
> 
> Does anyone have ideas on what is going on?  Is this a feature or possibly
> a bug?  Any known workarounds?  Any advice is appreciated.
> 
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
> ________________________________
> <fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> <fieldType name="textStemmed" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
>             stemEnglishPossessive="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
>             stemEnglishPossessive="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
