Yes, the Word Delimiter filter does break a token up into discrete words. In fact, it seems contradictory to combine the keyword tokenizer, which doesn't break a string into words, with the WDF, which does.

Maybe you should drop back to standard tokenization coupled with the Edge n-gram token filter with a min and max of 3, so that it will index "INTERNATIONAL" as itself plus "INT".
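
A minimal sketch of that suggestion, not a tested configuration. The field type name "text_prefix3" is hypothetical, and applying the n-gram filter only at index time is an assumption. Note that the edge n-gram filter on its own emits only grams in the configured size range, so keeping the full token "international" searchable may need a second field (for example via copyField) or, in releases that support it, a preserve-original option; check the documentation for your Solr version.

```xml
<!-- Hypothetical field type name; a sketch under the assumptions above -->
<fieldType name="text_prefix3" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split the value into words instead of keeping it as one token -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit the 3-character prefix of each word, e.g. "int" for "international" -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="3"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gram filter at query time, so the query term is matched as typed -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```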

And then maybe add a regex char filter to combine "INT'L" into "INTL".
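
One way this could look, as a sketch only: a pattern-replace char filter placed before the tokenizer that drops an apostrophe sitting between two letters. The exact pattern is an assumption and would need adjusting for your data.

```xml
<analyzer>
  <!-- Runs before tokenization: rewrites "INT'L" to "INTL" -->
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="([A-Za-z])'([A-Za-z])"
              replacement="$1$2"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

The same char filter would need to appear in both the index and query analyzers so that "INT'L" is normalized the same way on both sides.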

-- Jack Krupansky

-----Original Message-----
From: kobe.free.wo...@gmail.com
Sent: Wednesday, July 17, 2013 8:09 AM
To: solr-user@lucene.apache.org
Subject: Re: Search with punctuations

Hi Erick,

I modified the Solr schema file for the field as follows and re-indexed:

<fieldType name="CustomStr" class="solr.TextField"
positionIncrementGap="100" sortMissingLast="true">
     <analyzer type="index">
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1"
               generateNumberParts="1"
               catenateWords="1"
               catenateNumbers="1"
               catenateAll="0"
               />
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.TrimFilterFactory" />

     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1"
               generateNumberParts="1"
               catenateWords="1"
               catenateNumbers="0"
               catenateAll="0"
               />
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.TrimFilterFactory" />

     </analyzer>
   </fieldType>

My previous scenario seems to be working fine, i.e., when I search for
"INTL", I get the records containing either "INTL" or "INT'L".
However, I am not able to perform a STARTS WITH search. My schema field has
values like "INTERNATIONAL XYZ LOCAL" and "PLAY OF INTERNATIONAL XYZ". When
I perform a STARTS WITH search for the keyword "INTERNATIONAL", it
returns both values, but ideally it should return only "INTERNATIONAL
XYZ LOCAL". To perform the STARTS WITH search I append "*" to the keyword,
i.e., the keyword in my case becomes "INTERNATIONAL*".

It seems that the STARTS WITH search has started behaving like a CONTAINS
search. Please suggest how I can achieve a STARTS WITH search on the same
field type.

Thanks!



--
View this message in context: http://lucene.472066.n3.nabble.com/Search-with-punctuations-tp4077510p4078591.html
Sent from the Solr - User mailing list archive at Nabble.com.
