WordDelimiterFilter Leading & Trailing Special Character

Sathiya N Sundararajan Mon, 20 Jul 2015 16:52:50 -0700

I have a question about WordDelimiterFilter. The search behavior we get
with WordDelimiterFilter works well for us, except when a term has a
special character at its leading or trailing end.


For instance:

*‘d&b’ *  —>  Works as expected. Finds all docs with ‘d&b’.
*‘p!nk’*  —>  Works fine as above.

But when there is a special character at the trailing end of the term, as
in ‘Yahoo!’:

*‘yahoo!’* —> Turns out to be a search for just *‘yahoo’*, with the special
character *‘!’* stripped out. This WordDelimiterFilter behavior is
documented at:
http://lucene.apache.org/core/4_6_0/analyzers-common/index.html?org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html

What I would like is for the search to be performed without stripping out
the leading and trailing special characters. Is there a way to achieve this
behavior with WordDelimiterFilter?

This is the current config that we have for the field:

<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.WordDelimiterFilterFactory"
                splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0"
                catenateWords="0" catenateNumbers="0" catenateAll="0"
                preserveOriginal="1" types="specialchartypes.txt" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.WordDelimiterFilterFactory"
                splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0"
                catenateWords="0" catenateNumbers="0" catenateAll="0"
                preserveOriginal="1" types="specialchartypes.txt" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>
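
For context, the file named in the types attribute uses the filter's
character-type mapping format: one "character => TYPE" entry per line, with
"#" starting a comment. The sketch below only shows the general shape of
such a file. The two entries are assumptions chosen for illustration and
are not the actual contents of specialchartypes.txt. Whether a mapping like
this affects how a trailing ‘!’ is handled has not been verified here.

```text
# Hypothetical types file for WordDelimiterFilterFactory (illustrative only).
# Allowed types: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM
# Assumed entry: treat '&' as a letter so 'd&b' is not split
& => ALPHA
# Assumed entry: treat '!' as a letter so 'p!nk' is not split
! => ALPHA
```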


thanks
