Re: dismax and WordDelimiterFilterFactory with PreserveOriginal = 1

Erick Erickson Thu, 11 Mar 2010 12:38:36 -0800

Kind of a shot in the dark here, but your WordDelimiterFilterFactory
parameters differ between the index and query analyzers; catenateWords
looks especially suspicious.


You could test this by looking at your index with the SOLR admin page and/or
Luke to see what terms were actually indexed.

And don't forget you'll have to re-index after restarting SOLR for any
changes to the index analyzer to take effect.
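
One way to rule the mismatch in or out, as a sketch rather than a known fix:
temporarily give the index analyzer the same catenate settings as your query
analyzer, re-index, and retry the query. Every attribute below is copied from
the filters you posted; only catenateWords and catenateNumbers change on the
index side.

```xml
<!-- Experiment only: index-side filter using the catenate settings from the
     posted query analyzer, so both analyzers emit the same tokens. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="0"/>
```

If "ain't" matches after that, the index/query difference was the culprit; if
not, the cause is somewhere else.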

HTH
Erick

On Thu, Mar 11, 2010 at 2:20 PM, Ya-Wen Hsu <y...@eline.com> wrote:

> Yonik, thank you for your reply. When I don't use PreserveOriginal = 1 for
> WordDelimiterFilterFactory, the query "ain't" is parsed as "ain t" and no
> match is found in this case either. If I remove the ' from the query, I do
> get results. The analysis tool shows the term "ain't" processed as
> "ain t", which matches when the title includes "ain't". But I get no
> results when querying for "ain't" with dismax.
>
> The debug output looks like:
> (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
> +(long_description:"ain t"^2.0 | name:"ain t"^3.0 |
> search_keywords:"ain t")~0.1 (long_description:save^2.0 | name:save^3.0 |
> search_keywords:saved)~0.1) ()
>
>
> Below is my configuration for text field type.
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                ignoreCase="true" expand="false"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>                words="stopwords.txt" enablePositionIncrements="true" />
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.EnglishPorterFilterFactory"
>                protected="protwords.txt"/>
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <!--<filter class="solr.SynonymFilterFactory"
>                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>                words="stopwords.txt"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1" catenateWords="0"
>                catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.EnglishPorterFilterFactory"
>                protected="protwords.txt"/>
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
>
> I get results back when I use solr.LowerCaseTokenizerFactory instead of
> solr.WhitespaceTokenizerFactory. However, my concern is that this might
> reduce search relevance. Does anyone have a better idea of what to try
> next? Thanks!
>
> Wen
> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Thursday, March 11, 2010 10:51 AM
> To: solr-user@lucene.apache.org
> Subject: Re: dismax and WordDelimiterFilterFactory with PreserveOriginal = 1
>
> On Thu, Mar 11, 2010 at 1:07 PM, Ya-Wen Hsu <y...@eline.com> wrote:
> > Hi all,
> >
> > I'm facing the same issue as a previous post here:
> > http://www.mail-archive.com/solr-user@lucene.apache.org/msg19511.html.
> > Since no one answered that post, I thought I'd ask again. In my case, I
> > use the setting below for index
> > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
> > and
> > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> > catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/> for query.
> >
> > When I use a query with the word "ain't", no result is returned. When I
> > turned on logging, I found the word is interpreted as "(ain't ain) t".
>
>
> The problem is preserving the original in the query analyzer - try
> removing that.  And if you aren't doing prefix or wildcard queries,
> preserveOriginal doesn't buy you anything but wasted index space.
>
> It's the same issue of why you can't generate and catenate at the same
> time with the query parser.
>
> -Yonik
> http://www.lucidimagination.com
>
