The word delimiter filter is actually splitting "100-001" and then joining the
parts back into "100001". You have catenateWords, catenateNumbers, AND
catenateAll all enabled, so "100-R8989" should generate THREE tokens: the
concatenated numbers "100", the concatenated words "R8989", and both numbers
and words concatenated, "100R8989".
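If the goal is one joined token per hyphenated value, one option to try is to turn off catenateWords and catenateNumbers and keep only catenateAll. This is a sketch that has not been tested against your schema. Under that assumption, "100-R8989" should index as the single token "100r8989" (lowercase, because LowerCaseFilterFactory runs earlier in the chain) and "100-001" as "100001". Check the result in the Analysis screen of the Solr admin UI, and change the query-time analyzer the same way.

```xml
<!-- Sketch (untested): only catenateAll is enabled, so the filter should
     emit one fully joined token instead of separate number/word tokens.
     All other attributes are copied unchanged from the original chain. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="0"
        generateNumberParts="0"
        splitOnCaseChange="0"
        splitOnNumerics="0"
        stemEnglishPossessive="0"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="1"
        preserveOriginal="0"/>
```

If the hyphenated form itself must also be searchable, preserveOriginal="1" is the attribute to look at.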
-- Jack Krupansky
-----Original Message-----
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Sent: Friday, August 8, 2014 3:27 PM
To: solr-user@lucene.apache.org
Subject: WordDelimiter
Hi, I have a situation where I don't want to split words. I am using the
WordDelimiterFilter, which mostly works well.
For example, if I send 100-001 to the analyzer, it does not split the
keyword, but if I send 100-R8989, the WordDelimiterFilter splits it into 100 |
R8989. Below is the field analyzer and its filters. I use the same chain at
query time.
Let me know if I am missing something here.
<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KStemFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="0" generateNumberParts="0"
          splitOnCaseChange="0" splitOnNumerics="0"
          stemEnglishPossessive="0"
          catenateWords="1" catenateNumbers="1" catenateAll="1"
          preserveOriginal="0"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
</analyzer>