At first glance, you're replacing the apostrophe with a space,
so INT'L becomes INT L, two separate tokens. Why not
replace it with "" instead, i.e. remove the apostrophe?

I also suspect you actually want WhitespaceTokenizerFactory.
KeywordTokenizerFactory will cause "my dog has fleas" to be
indexed as exactly one token consisting of 4 words. Unless this
is a very specialized field, you'd usually want to index
4 tokens, but you know your problem space better than I do.
The admin UI's analysis page is your friend: it shows what each
tokenizer and filter does to your input.
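
A rough, untested sketch of that idea, in case it helps. The field
type name text_ws_nopunct is made up for illustration; the tokenizer
and filters are stock Solr factories, and the [';] pattern is the one
from your own config:

```xml
<!-- Hypothetical field type name; adjust to your schema. -->
<fieldType name="text_ws_nopunct" class="solr.TextField"
           positionIncrementGap="100">
  <!-- One analyzer with no type attribute is used for both index and query -->
  <analyzer>
    <!-- Split on whitespace, so "my dog has fleas" yields 4 tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Empty replacement removes the apostrophe: INT'L becomes intl -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="[';]"
            replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

Since the same chain runs at index and query time, both INTL and
INT'L should end up as the single token intl, but do confirm that on
the analysis page against your real data.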

You could also consider WordDelimiterFilterFactory with
catenateWords="1".
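
Something along these lines might be a starting point. This is a
sketch I have not run: the field type name text_wdf is hypothetical,
and the attribute values are assumptions you would want to tune:

```xml
<!-- Hypothetical field type name; values are a starting point, not a tested config. -->
<fieldType name="text_wdf" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits INT'L at the apostrophe into INT and L;
         catenateWords="1" also emits the joined form INTL -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- No catenation at query time; a query for INTL stays one token -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The intent is that an indexed INT'L produces int, l and intl, so a
query for INTL can match the catenated token. Catenating only on the
index side is a common arrangement, but whether it suits your data is
something to verify on the analysis page.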

Best
Erick

On Fri, Jul 12, 2013 at 5:11 AM, kobe.free.wo...@gmail.com
<kobe.free.wo...@gmail.com> wrote:
> Hi,
>
> Scenario:
>
> Users performing a search often forget the punctuation mark (apostrophe).
> For example, when a user wants to search for a value like INT'L, they just
> key in INTL (with no punctuation). In this scenario, I wish to return both
> the INTL and INT'L values that are currently indexed on the SOLR instance.
> Currently, if I search for INTL it won't return the row having the value
> INT'L.
>
> Schema Configuration entry for the field type:
>
>     <fieldType name="customStr" class="solr.TextField"
>                positionIncrementGap="100" sortMissingLast="true">
>       <analyzer type="index">
>            <tokenizer class="solr.KeywordTokenizerFactory"/>
>            <filter class="solr.LowerCaseFilterFactory"/>
>            <filter class="solr.TrimFilterFactory" />
>            <filter class="solr.PatternReplaceFilterFactory"
>                    pattern="\s*[,.]\s*" replacement=" " replace="all" />
>            <filter class="solr.PatternReplaceFilterFactory" pattern="\s+"
>                    replacement=" " replace="all" />
>            <filter class="solr.PatternReplaceFilterFactory" pattern="[';]"
>                    replacement="" replace="all" />
>       </analyzer>
>       <analyzer type="query">
>            <tokenizer class="solr.KeywordTokenizerFactory"/>
>            <filter class="solr.LowerCaseFilterFactory"/>
>            <filter class="solr.PatternReplaceFilterFactory"
>                    pattern="\s*[,.]\s*" replacement=" " replace="all" />
>            <filter class="solr.PatternReplaceFilterFactory" pattern="\s+"
>                    replacement=" " replace="all" />
>            <filter class="solr.PatternReplaceFilterFactory" pattern="[';]"
>                    replacement="" replace="all"/>
>       </analyzer>
>     </fieldType>
>
> Please suggest what mechanism I should use to fetch both values, INTL and
> INT'L, when the search is performed for INTL. Also, does the regex look
> correct for the analyzers? Which filters/tokenizers can be used to
> overcome this issue?
>
> Thanks!
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Search-with-punctuations-tp4077510.html
> Sent from the Solr - User mailing list archive at Nabble.com.
