The javadoc for WordDelimiterFilterFactory suggests this config:

<fieldType name="text_wd" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protectedword.txt"
            preserveOriginal="0" splitOnNumerics="1" splitOnCaseChange="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            generateWordParts="1" generateNumberParts="1"
            stemEnglishPossessive="1" types="wdfftypes.txt" />
  </analyzer>
</fieldType>

Note the protected="xxxxx" attribute. I suspect if you put Yahoo! into a
file referenced by that attribute, it may survive analysis. I'd be
curious to hear whether it works.

Upayavira

On Tue, Jul 21, 2015, at 12:51 AM, Sathiya N Sundararajan wrote:
> Question about WordDelimiterFilter. The search behavior that we experience
> with WordDelimiterFilter satisfies well, except for the case where there is
> a special character either at the leading or trailing end of the term.
>
> For instance:
>
> *‘d&b’* --> Works as expected. Finds all docs with ‘d&b’.
> *‘p!nk’* --> Works fine as above.
>
> But on cases when, there is a special character towards the trailing end of
> the term, like ‘Yahoo!’
>
> *‘yahoo!’* --> Turns out to be a search for just *‘yahoo’* with the special
> character *‘!’* stripped out. This WordDelimiterFilter behavior is documented
> http://lucene.apache.org/core/4_6_0/analyzers-common/index.html?org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html
>
> What I would like to have is, the search performed without stripping out
> the leading & trailing special character. Is there a way to achieve this
> behavior with WordDelimiterFilter.
>
> This is current config that we have for the field:
>
> <fieldType name="text_wdf" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             preserveOriginal="1"
>             types="specialchartypes.txt"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             preserveOriginal="1"
>             types="specialchartypes.txt"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
>
>
> thanks
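
For reference, here is an untested sketch of how the protected-words idea from the top of this message might be applied to the text_wdf field type quoted above. The file name protwords_wdf.txt is hypothetical. As far as I know, protected entries are compared case-sensitively against the token exactly as the tokenizer emits it, and LowerCaseFilterFactory runs after the word delimiter filter here, so the sketch lists both spellings. Treat that as an assumption to verify on the Analysis screen.

```xml
<!-- Assumed contents of protwords_wdf.txt (hypothetical file in the core's
     conf directory), one token per line:
       Yahoo!
       yahoo!
-->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer without a type attribute applies to both index and query;
       the original index and query chains were identical. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same settings as the original field type, plus protected="..." -->
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords_wdf.txt"
            splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            preserveOriginal="1"
            types="specialchartypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the index-time analysis changes, existing documents would need to be reindexed before a query for ‘yahoo!’ could match the protected token.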