I appreciate the reply and blog posting.  For now, I just enabled stopwords for 
all the fields in "qf".  We have a very short list anyhow and our legacy search 
engine didn't even allow field-by-field configuration (stopwords are global on 
that system).
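
A minimal sketch of that change, assuming the stop filter definition from 
"textStemmed" is simply copied into "textSimple" (placing it right after the 
tokenizer is an assumption, not something confirmed in this thread):

```xml
<!-- Sketch only: "textSimple" with the stop filter copied from "textStemmed". -->
<fieldType name="textSimple" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumed placement: directly after the tokenizer, as in "textStemmed" -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same stop list at query time, so "the" is dropped for every "qf" field -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Changing the index-time analyzer only affects documents indexed afterwards, so 
existing documents would need to be reindexed for it to take full effect.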

I do wonder...what if (e)dismax had a flag you could set that would tell it 
that if any analyzers removed a term, then that term would become optional for 
any fields for which it remained?  I'm not sure what the development effort 
would be, but perhaps it would be a nice way to circumvent this problem in a 
future release...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Thursday, January 13, 2011 9:54 AM
To: solr-user@lucene.apache.org; markus.jel...@openindex.io
Cc: Dyer, James
Subject: Re: StopFilterFactory and "qf" containing some fields that use it and 
some that do not

It's a known 'issue' in dismax, (really an inherent part of dismax's 
design with no clear way to do anything about it), that qf over fields 
with different stop word definitions will produce odd results for a 
query with a stopword.

Here's my understanding of what's going on: 
http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/

On 1/12/2011 6:48 PM, Markus Jelsma wrote:
> Here's another thread on the subject:
> http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
>
> And slightly off topic: you also might want to look at using common grams,
> they are really useful for phrase queries that contain stopwords.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory
>
>
>> Here is what debug says each of these queries parse to:
>>
>> 1. q=life&defType=edismax&qf=Title  ... returns 277,635 results
>> 2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
>> 3. q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
>> 4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
>>
>> 1. +DisjunctionMaxQuery((Title:life))
>> 2. +((DisjunctionMaxQuery((Title:life)))~1)
>> 3. +DisjunctionMaxQuery((CTBR_SEARCH:life | Title:life))
>> 4. +((DisjunctionMaxQuery((Contributor:the))
>> DisjunctionMaxQuery((Contributor:life | Title:life)))~2)
>>
>> I see what's going on here.  Because "the" is a stop word for Title, it
>> gets removed from first part of the expression.  This means that
>> "Contributor" is required to contain "the".  dismax does the same thing
>> too.  I guess I should have run debug before asking the mail list!
>>
>> It looks like the only workarounds I have are to either filter out the
>> stopwords in the client when this happens, or enable stop words for all
>> the fields that are used in "qf" with stopword-enabled fields.
>> Unless...someone has a better idea??
>>
>> James Dyer
>> E-Commerce Systems
>> Ingram Content Group
>> (615) 213-4311
>>
>> -----Original Message-----
>> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
>> Sent: Wednesday, January 12, 2011 4:44 PM
>> To: solr-user@lucene.apache.org
>> Cc: Jayendra Patil
>> Subject: Re: StopFilterFactory and "qf" containing some fields that use it
>> and some that do not
>>
>>> Have used edismax and Stopword filters as well. But usually use the fq
>>> parameter e.g. fq=title:the life and never had any issues.
>> That is because filter queries are not relevant for the mm parameter which
>> is being used for the main query.
>>
>>> Can you turn on debugQuery and check what query is formed for all
>>> the combinations you mentioned?
>>>
>>> Regards,
>>> Jayendra
>>>
>>> On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James
>> <james.d...@ingrambook.com>wrote:
>>>> I'm running into a problem with StopFilterFactory in conjunction with
>>>> (e)dismax queries that have a mix of fields, only some of which use
>>>> StopFilterFactory.  It seems that if even 1 field on the "qf" parameter
>>>> does not use StopFilterFactory, then stop words are not removed when
>>>> searching any fields.  Here's an example of what I mean:
>>>>
>>>> - I have 2 fields indexed:
>>>>   - Title is "textStemmed", which includes StopFilterFactory (see below).
>>>>   - Contributor is "textSimple", which does not include StopFilterFactory
>>>>     (see below).
>>>> - "The" is a stop word in stopwords.txt
>>>> - q=life&defType=edismax&qf=Title  ... returns 277,635 results
>>>> - q=the life&defType=edismax&qf=Title ... returns 277,635 results
>>>> - q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
>>>> - q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
>>>>
>>>> It seems as if the stop words are not being stripped from the query
>>>> because "qf" contains a field that doesn't use StopFilterFactory.  I
>>>> did testing with combining Stemmed fields with not Stemmed fields in
>>>> "qf" and it seems as if stemming gets applied regardless.  But stop
>>>> words do not.
>>>>
>>>> Does anyone have ideas on what is going on?  Is this a feature or
>>>> possibly a bug?  Any known workarounds?  Any advice is appreciated.
>>>>
>>>> James Dyer
>>>> E-Commerce Systems
>>>> Ingram Content Group
>>>> (615) 213-4311
>>>> ________________________________
>>>> <fieldType name="textSimple" class="solr.TextField"
>>>> positionIncrementGap="100">
>>>> <analyzer type="index">
>>>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>> <filter class="solr.LowerCaseFilterFactory"/>
>>>> </analyzer>
>>>> <analyzer type="query">
>>>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>> <filter class="solr.LowerCaseFilterFactory"/>
>>>> </analyzer>
>>>> </fieldType>
>>>>
>>>> <fieldType name="textStemmed" class="solr.TextField"
>>>> positionIncrementGap="100">
>>>> <analyzer type="index">
>>>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>> words="stopwords.txt" enablePositionIncrements="true" />
>>>> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>> generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>>>> catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
>>>> stemEnglishPossessive="1" />
>>>> <filter class="solr.LowerCaseFilterFactory"/>
>>>> <filter class="solr.PorterStemFilterFactory"/>
>>>> </analyzer>
>>>> <analyzer type="query">
>>>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>> ignoreCase="true" expand="true"/>
>>>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>> words="stopwords.txt" enablePositionIncrements="true" />
>>>> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>> generateNumberParts="0" catenateWords="0" catenateNumbers="0"
>>>> catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
>>>> stemEnglishPossessive="1" />
>>>> <filter class="solr.LowerCaseFilterFactory"/>
>>>> <filter class="solr.PorterStemFilterFactory"/>
>>>> </analyzer>
>>>> </fieldType>
