Antw: Re: Correct order of mappinCharFilter, Tokenizer and GermanStemFilter

Doris Peter Fri, 19 Jul 2019 02:13:06 -0700

Thanks for the answer. I examined the ICUFoldingFilterFactory, but it seems to 
me that it can't be customized the way I would need.
We have some special foldings, e.g. ä->ae. With the CharFilter, I can add them 
to the mapping file that is configured via mapping="mapping-FoldToASCII.txt".
There seems to be nothing like this mapping file for the ICUFoldingFilter? 
Excluding characters from folding is not enough ...

>>> Shawn Heisey <apa...@elyograg.org> 7/18/2019 3:08 PM >>> 
On 7/18/2019 3:01 AM, Doris Peter wrote:
> So, the MappingCharFilter seems to be executed first, no matter which 
> position it has in the configuration?

CharFilters are always executed first.  Then one Tokenizer, then 
Filters.  This will always be the case, even if you order the config so 
that the Tokenizer and one or more Filters are listed before CharFilter 
entries.  It's one of the quirks of analysis definitions.
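
As a rough illustration of that ordering, here is a sketch of a field type 
whose entries are listed in the order in which they actually run. The field 
type name "text_de_folded" and the LowerCaseFilter are assumptions made for 
the example; they are not taken from the original configuration.

```xml
<!-- Sketch only: "text_de_folded" is a hypothetical field type name -->
<fieldType name="text_de_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- 1. CharFilters run on the raw character stream, before tokenization -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <!-- 2. Exactly one Tokenizer splits the filtered text into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- 3. Token Filters run afterwards, in the order they are listed -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Moving the charFilter line below the stemmer would not change the result: 
the stemmer would still receive text that has already been folded.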

The fix for this would be to see if there is a regular Filter that does 
what the CharFilter you're using does and use that filter instead.
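
One candidate for such a Filter might be PatternReplaceFilterFactory, which 
rewrites each token with a regular expression. The sketch below assumes a 
single custom folding (ä to ae) and an otherwise minimal chain; whether this 
scales to a full mapping file is untested here.

```xml
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Assumption: one PatternReplaceFilter per custom folding rule.
       Unlike a CharFilter, this runs at the position where it is listed,
       so moving this line changes when the replacement happens. -->
  <filter class="solr.PatternReplaceFilterFactory" pattern="ä" replacement="ae" replace="all"/>
  <filter class="solr.GermanStemFilterFactory"/>
</analyzer>
```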

If it were me, I would likely use ICUFoldingFilterFactory rather than 
MappingCharFilterFactory.  The ICU analysis components do require 
installing contrib jars into Solr.

https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html#icu-folding-filter
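
A minimal sketch of what that could look like, assuming a stock Solr 8.x 
directory layout. The lib paths are an assumption and will likely need 
adjusting for a given install.

```xml
<!-- solrconfig.xml: load the analysis-extras contrib jars (paths assumed) -->
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lib" regex=".*\.jar"/>
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lucene-libs" regex=".*\.jar"/>

<!-- Schema: ICU folding is a regular token Filter, so it runs after the
     Tokenizer, at the position where it is listed among the Filters -->
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ICUFoldingFilterFactory"/>
</analyzer>
```

The guide page linked above also describes an optional "filter" attribute 
that takes a Unicode set and excludes characters from folding. It does not 
provide a way to define custom replacements.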

Thanks,
Shawn
