Re: Why I get a hit on %, &, but not on !, @, #, $, ^, *

Erick Erickson Tue, 14 Jul 2015 10:58:18 -0700

Steve:

Simplest solution: remove WordDelimiterFilterFactory from the analysis
chain. Then use something like PatternReplaceCharFilterFactory or
PatternReplaceFilterFactory to selectively remove the characters you
don't care about and leave in the ones you do care about.
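
A minimal sketch of what that could look like, assuming you want "$" to
stay searchable and "%" and "&" to disappear. The field type name
"my_text_symbols" is made up for illustration, the rest of the chain is
copied from your schema minus WDFF, and I have not run this against your
data:

```xml
<!-- Hypothetical field type; the name "my_text_symbols" is an assumption. -->
<fieldType name="my_text_symbols" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <!-- Char filters run before the tokenizer. Strip only the characters
         that should not be searchable. "&" must be escaped in the pattern. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[%&amp;]" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt"/>
    <!-- WordDelimiterFilterFactory left out on purpose, so "$10" stays "$10". -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, "10%" should be indexed as "10" while "$10" stays
"$10", so a search for "10" would match the former but not the latter.
Since the same analyzer runs at query time, a search for "10%" is reduced
to "10" as well, and text such as "AT&T" becomes "ATT", so check that this
is what you want.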


You might also want to do this kind of thing in a copyField and search
one or the other selectively as desired, or perhaps boost or...
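
Roughly like this, as a sketch only. The field names "body" and
"body_symbols" and the type "my_text_symbols" are placeholders, not
anything from your schema:

```xml
<!-- Hypothetical fields: one analyzed with your existing type, one with a
     type that keeps the special characters you care about. -->
<field name="body" type="my_text" indexed="true" stored="true"/>
<field name="body_symbols" type="my_text_symbols" indexed="true" stored="false"/>

<!-- copyField copies the raw input, so each field applies its own analysis. -->
<copyField source="body" dest="body_symbols"/>
```

You can then query "body", "body_symbols", or both, depending on whether
the special characters should matter for a given search.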

NOTE: one side effect of WDFF (WordDelimiterFilterFactory) is that it
removes punctuation, so once it is gone you have to decide what you want
to do with periods at the end of a sentence, apostrophes and the like.
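
If dropping WDFF is too drastic, the "types" attribute mentioned further
down the thread is another route. A hedged sketch, assuming a file named
"wdfftypes.txt" (the file name and the mapping are illustrative) sitting
next to the schema:

```xml
<!-- Assumed contents of wdfftypes.txt, one mapping per line:
       $ => DIGIT
     This asks WDFF to treat "$" as a digit instead of a delimiter. -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1"
        stemEnglishPossessive="1" preserveOriginal="1"
        types="wdfftypes.txt"/>
```

With that mapping "$10" should come through as a single token rather than
being stripped down to "10". Verify on the Analysis page in the admin UI
before relying on it.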

Best,
Erick

On Tue, Jul 14, 2015 at 10:08 AM, Steven White <swhite4...@gmail.com> wrote:
> Thanks Jack.
>
> Can you provide me with a concrete example of how to:
>
> 1) Be able to search and find "$10" (without quotes).  This will get me
> started on how to add all other variations for !, @, etc. and be able to
> search on them.  In this case, a search for "$10" will give me a hit on
> text of "$10", but not "10" and a search on "10" will give me a hit on "10"
> but not "$10".
>
> 2) Prevent a hit on "10%" (without quotes).  This will get me started on
> how to prevent a hit on %, &, etc.  In this case, a search for "%" or "10%"
> will give me 0 hits, but a search on "10" will give me a hit on "10" or
> "10%".
>
> Do you see where I'm going with this?  Are both of those configurations
> possible?  This will let me customize Solr to meet customer need.
>
> Thanks.
>
> Steve
>
> On Mon, Jul 13, 2015 at 11:12 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>
>> Oops... that's the "types" attribute.
>>
>> -- Jack Krupansky
>>
>> On Mon, Jul 13, 2015 at 11:11 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>
>> > The word delimiter filter is removing special characters. You can add a
>> > file containing a list of the special characters that you wish to treat as
>> > alpha, using the "type" parameter.
>> >
>> > -- Jack Krupansky
>> >
>> > On Mon, Jul 13, 2015 at 6:43 PM, Steven White <swhite4...@gmail.com>
>> > wrote:
>> >
>> >> Hi Everyone,
>> >>
>> >> I think the subject line said it all.  Here is the schema I'm using:
>> >>
>> >> <fieldType name="my_text" class="solr.TextField"
>> >>            positionIncrementGap="100" autoGeneratePhraseQueries="true">
>> >>   <analyzer>
>> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >>             words="lang/stopwords_en.txt"/>
>> >>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>> >>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>> >>             catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1"
>> >>             stemEnglishPossessive="1" preserveOriginal="1"/>
>> >>     <filter class="solr.LowerCaseFilterFactory"/>
>> >>     <filter class="solr.KeywordMarkerFilterFactory"
>> >>             protected="protwords.txt"/>
>> >>     <filter class="solr.PorterStemFilterFactory"/>
>> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >>   </analyzer>
>> >> </fieldType>
>> >>
>> >> I'm guessing this is due to how solr.WhitespaceTokenizerFactory works and
>> >> those that it is not indexing are removed because they are considered
>> >> "white-spaces"?  If so, how can I include %, &, etc. into this non-indexed
>> >> list?  I would rather see all these not indexed vs some are and some are
>> >> not causing confusion to my users.
>> >>
>> >> Thanks
>> >>
>> >> Steve
>> >>
>> >
>> >
>>
