Hello,

Does anyone have a good solution for working with multi-word synonyms? I've
read a lot about this online but haven't found a good answer. I use the
SynonymFilterFactory at index time, but words are not matched to the
appropriate multi-word synonyms, even though the Analysis tool shows that
they should match.

Example:

coke, coca cola
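
For reference, a sketch of how that rule would sit in synonyms.txt. The
comments are illustrative only, and the commented-out explicit mapping is a
hypothetical alternative form, not part of the current setup:

```text
# Equivalent synonyms: with expand="true", any one of these terms
# is expanded to all of them.
coke, coca cola

# Explicit mapping form (hypothetical alternative, not in use):
# coca cola => coke
```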



This is the configuration I have on text fields:

    <fieldType name="text_icu_english" class="solr.TextField"
               positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <!-- The whitespace tokenizer splits on whitespace but otherwise
             preserves the tokens so that they can be used by the next filter -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <!-- This filter splits a word on punctuation, preserves the
             original, concatenates the split words, and also stems English
             possessive nouns -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0" generateNumberParts="0"
                splitOnCaseChange="0" preserveOriginal="1" catenateWords="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="(.*[\*].*)" replacement=""/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.LengthFilterFactory" min="1" max="100"/>
        <filter class="solr.ClassicFilterFactory"/>

      </analyzer>
      <analyzer type="query">
        <!-- The whitespace tokenizer splits on whitespace but otherwise
             preserves the tokens so that they can be used by the next filter -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- This filter splits a word on punctuation, preserves the
             original, concatenates the split words, and also stems English
             possessive nouns -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0" generateNumberParts="0"
                splitOnCaseChange="0" preserveOriginal="1" catenateWords="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.ClassicFilterFactory"/>
      </analyzer>
      <similarity class="solr.BM25SimilarityFactory">
        <float name="b">0.0</float>
      </similarity>
    </fieldType>


I'd greatly appreciate any help y'all can offer.

Thanks,
Sanjana
