It's a known 'issue' in dismax (really an inherent part of dismax's design, with no clear way around it) that a qf spanning fields with different stop word definitions will produce odd results for a query containing a stopword.

Here's my understanding of what's going on: http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/

On 1/12/2011 6:48 PM, Markus Jelsma wrote:
Here's another thread on the subject:
http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html

And, slightly off topic: you might also want to look at using common grams;
they are really useful for phrase queries that contain stopwords.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory
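A rough sketch of what a common-grams field type might look like in schema.xml, assuming a hypothetical type named textCommonGrams that reuses the thread's stopwords.txt as its list of common words. At index time CommonGramsFilterFactory emits each single term plus bigrams that join a common word to its neighbour (e.g. "the_life"). At query time CommonGramsQueryFilterFactory prefers those bigrams over the standalone common words, so a phrase like "the life" can match without the stop word being thrown away.

```xml
<!-- Hypothetical field type: the name and word list are assumptions for illustration -->
<fieldType name="textCommonGrams" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits single terms plus bigrams such as "the_life" for words listed in stopwords.txt -->
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query side: keeps the bigrams and drops common words that are already part of one -->
    <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

This mainly helps phrase queries. On its own it does not change how (e)dismax applies mm across fields with different analyzers.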


Here is what the debug output says each of these queries parses to:

1. q=life&defType=edismax&qf=Title  ... returns 277,635 results
2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
3. q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results

1. +DisjunctionMaxQuery((Title:life))
2. +((DisjunctionMaxQuery((Title:life)))~1)
3. +DisjunctionMaxQuery((CTBR_SEARCH:life | Title:life))
4. +((DisjunctionMaxQuery((Contributor:the)) DisjunctionMaxQuery((Contributor:life | Title:life)))~2)

I see what's going on here. Because "the" is a stop word for Title, it
gets removed from the first clause of the expression, leaving Contributor
as the only field searched for "the". Since mm requires both clauses
(the ~2), a document only matches if Contributor contains "the". dismax
does the same thing. I guess I should have run debug before asking the
mailing list!

It looks like the only workarounds I have are to either filter out the
stopwords in the client when this happens, or enable stop words on every
field listed in "qf" alongside stopword-enabled fields.
Unless... someone has a better idea?
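A minimal sketch of the second workaround, assuming Contributor keeps its textSimple type and should share the same stopwords.txt that textStemmed uses. With StopFilterFactory added to textSimple, "the" is dropped for both fields, so the query "the life" should produce only the DisjunctionMaxQuery for "life". Adding the filter at index time as well keeps the two analyzers symmetric, but it means Contributor has to be reindexed.

```xml
<!-- Sketch of an assumed workaround: textSimple with the same stop word list as textStemmed -->
<fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Added: strip the same stop words at index time (requires reindexing Contributor) -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Added: strip the same stop words at query time so every qf field drops "the" -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
```

The trade-off is that genuine stop words in contributor names can no longer be searched in that field.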

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, January 12, 2011 4:44 PM
To: solr-user@lucene.apache.org
Cc: Jayendra Patil
Subject: Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

I have used edismax and stopword filters as well, but I usually use the fq
parameter, e.g. fq=title:the life, and have never had any issues.
That is because filter queries are not subject to the mm parameter, which
applies only to the main query.

Can you turn on debugQuery and check what query is formed for each of
the combinations you mentioned?

Regards,
Jayendra

On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James <james.d...@ingrambook.com> wrote:
I'm running into a problem with StopFilterFactory in conjunction with
(e)dismax queries that have a mix of fields, only some of which use
StopFilterFactory. It seems that if even one field in the "qf" parameter
does not use StopFilterFactory, then stop words are not removed when
searching any field. Here's an example of what I mean:

- I have 2 fields indexed:
  - Title is "textStemmed", which includes StopFilterFactory (see below).
  - Contributor is "textSimple", which does not include StopFilterFactory
    (see below).
- "The" is a stop word in stopwords.txt
- q=life&defType=edismax&qf=Title  ... returns 277,635 results
- q=the life&defType=edismax&qf=Title ... returns 277,635 results
- q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
- q=the life&defType=edismax&qf=Title Contributor ... returns 0 results

It seems as if the stop words are not being stripped from the query
because "qf" contains a field that doesn't use StopFilterFactory. I also
tested combining stemmed and non-stemmed fields in "qf", and stemming
seems to be applied regardless. Stop word removal, however, is not.

Does anyone have ideas on what is going on?  Is this a feature or
possibly a bug?  Any known workarounds?  Any advice is appreciated.

________________________________
<fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="textStemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="0" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
            stemEnglishPossessive="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="0" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
            stemEnglishPossessive="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
