Re: Downsides to applying to WordDelimiterFilter twice in analyzer chain

Erick Erickson Wed, 01 Jul 2020 12:31:07 -0700

Consider something other than WhitespaceTokenizer. A tokenizer that
splits on the period would handle this case. I don’t know whether
that would fit the rest of your problem space, though.
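
As a rough illustration of the tokenizer suggestion, here is a minimal
sketch that swaps in solr.PatternTokenizerFactory with a pattern that
splits on whitespace and periods. The pattern is an assumption chosen
for the my.camelCase example, not something taken from this thread. It
would also split values such as version numbers or host names, so it
may not suit the rest of the field's content.

```xml
<!-- Sketch only: the pattern below is an assumed example, not from the thread. -->
<analyzer type="index">
  <!-- Split on runs of whitespace and/or periods, so "my.camelCase"
       reaches the filters as the two tokens "my" and "camelCase". -->
  <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s.]+"/>
  <!-- A single word-delimiter pass then handles the case change. -->
  <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
          generateWordParts="1" catenateWords="1" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

The Analysis tab is the place to confirm that the resulting tokens match
what the query side produces.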


But to answer your original question: no, there’s no a priori reason you
can’t have WordDelimiter(Graph)FilterFactory twice, but I suspect
better tokenization is a more robust answer.
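
For reference, the two-pass arrangement from the original question might
look roughly like the sketch below. The attribute values are simplified
assumptions based on the chain quoted further down, and the token output
has not been verified here. The intent is that the first pass splits on
the period while leaving camelCase intact, and the second pass splits on
the case change.

```xml
<!-- Sketch only: attribute values are assumptions, simplified from the quoted chain. -->
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- First pass: split on delimiters such as the period, but not on case changes. -->
  <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
          generateWordParts="1" splitOnCaseChange="0"/>
  <!-- Second pass: split the remaining tokens on case changes. -->
  <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
          generateWordParts="1" catenateWords="1" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

Stacking the filter this way emits more tokens per input term, so token
positions and index size are worth checking in the Analysis tab before
relying on it.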

Best,
Erick

> On Jul 1, 2020, at 3:11 PM, gnandre <arnoldbron...@gmail.com> wrote:
> 
> Here are links to images for the Analysis tab.
> 
> https://pasteboard.co/JfFTYu6.png
> https://pasteboard.co/JfFUYXf.png
> 
> 
> On Wed, Jul 1, 2020 at 3:03 PM gnandre <arnoldbron...@gmail.com> wrote:
> I am doing that already but it does not help.
> 
> Here is the complete analyzer chain.
> 
>   <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="index">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory" protected="protect.txt"
>               preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
>               catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose"/>
>       <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt"
>               ignoreCase="true" expand="true"/>
>       <filter class="solr.KStemFilterFactory"/>
>       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory" protected="protect.txt"
>               preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
>               catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose"/>
>       <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en_query.txt"
>               ignoreCase="true" expand="true"/>
>       <filter class="solr.KStemFilterFactory"/>
>       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     </analyzer>
>   </fieldType>
>
> On Wed, Jul 1, 2020 at 12:29 PM Erick Erickson <erickerick...@gmail.com> 
> wrote:
> Why not just specify preserveOriginal, follow it with a LowerCaseFilter, and
> use a single WordDelimiterFilterFactory?
> 
> Best,
> Erick
> 
> > On Jul 1, 2020, at 11:05 AM, gnandre <arnoldbron...@gmail.com> wrote:
> > 
> > Hi,
> > 
> > To satisfy one use case, I need to apply WordDelimiterFilter twice: once
> > with splitOnCaseChange set to 0 and then again with it set to 1. Are there
> > any downsides to this approach?
> > 
> > The use case is to match results when the indexed content is my.camelCase
> > and the search query is camelcase.
>
