The extra terms are fine at index time: they simply overlap the base words and make composite terms more searchable. At query time, though, you need a separate query analyzer that sets the various catenate options to "0", since the query generator doesn't know what to do with the extra terms. Synonyms are a little trickier. The simplest thing is to disable them in the index analyzer and apply them only in the query analyzer. Multi-term synonyms don't work well, except for replacement synonyms at index time.

See the "text_en_splitting" field type in the example schema.
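
As a rough sketch of that arrangement (not copied from the example schema), here is the testType definition quoted below, split into separate index and query analyzers. The field type name "testTypeSplit" and the synonyms file name are assumptions for illustration only:

```xml
<!-- Hypothetical field type name; attribute values mirror the quoted testType. -->
<fieldType name="testTypeSplit" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <!-- Index side: catenated terms such as "wifi" overlap the base words. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
  <!-- Query side: every catenate option off; synonyms applied here only. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- synonyms.txt is an assumed file name. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
</fieldType>
```

With the catenate options off on the query side, a query such as "wi-fi" should produce only the terms "wi" and "fi", so it can be matched as an ordinary phrase against the indexed positions.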

-- Jack Krupansky

-----Original Message-----
From: Chung Wu
Sent: Monday, May 14, 2012 7:01 PM
To: solr-user@lucene.apache.org
Subject: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory

Hi all!

I'm using Solr 3.6, and I'm seeing unexpected query rewriting when using
either WordDelimiterFilterFactory with catenateWords="1" or
SynonymFilterFactory with multi-word synonyms.

For example, in this type where a WordDelimiterFilterFactory is used for
the query analyzer, with catenateWords="1":

   <fieldType name="testType" class="solr.TextField"
              positionIncrementGap="100" autoGeneratePhraseQueries="true">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="1"
               catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
     </analyzer>
   </fieldType>

For the query "wi-fi", the term positions after the
WordDelimiterFilterFactory look like this:

   term text   position   startOffset   endOffset   type
   wi          1          0             2           word
   fi          2          3             5           word
   wifi        2          0             5           word


And looking at debug output, the parsed query looks like this, which is
surprising:

<str name="rawquerystring">test1:"wi-fi"</str>
<str name="querystring">test1:"wi-fi"</str>
<str name="parsedquery">MultiPhraseQuery(test1:"wi (fi wifi)")</str>
<str name="parsedquery_toString">test1:"wi (fi wifi)"</str>

I see similar things happening if I use SynonymFilterFactory with
multi-word synonyms (maybe related to this bug:
https://issues.apache.org/jira/browse/SOLR-3390; I originally asked about
it here:
http://stackoverflow.com/questions/10218224/in-solr-expanding-multi-word-synonyms-and-term-positions
)

Any ideas on what I'm supposed to do to make this work as expected?

Thanks!

Chung
