The plugin uses collation to identify characters which are equivalent. It
does far more than simple replacement/folding, so sometimes the sort order
matters.
http://en.wikipedia.org/wiki/Collation
http://userguide.icu-project.org/transforms/normalization
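The normalization page linked above can be illustrated with Python's standard `unicodedata` module. This is a minimal sketch of Unicode normalization plus case folding, not the plugin's code and not collation proper. It shows that the single-codepoint Dutch ligatures U+0133 (ĳ) and U+0132 (Ĳ) fold to the plain two-letter "ij". The helper name `nfkc_casefold` is hypothetical.

```python
import unicodedata


def nfkc_casefold(text: str) -> str:
    """Apply NFKC compatibility normalization, then Unicode case folding."""
    return unicodedata.normalize("NFKC", text).casefold()


# U+0133 LATIN SMALL LIGATURE IJ and U+0132 LATIN CAPITAL LIGATURE IJ have
# compatibility decompositions to the letter pairs "ij" and "IJ".
for word in ["\u0133s", "\u0132s", "ijs", "IJS"]:
    print(repr(word), "->", nfkc_casefold(word))  # every case prints 'ijs'
```

No Unicode normalization form maps `y` to `ij` (`nfkc_casefold("y")` stays `"y"`). That equivalence is a Dutch-specific convention, which is why the question below handles it with a mapping char filter.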
Take a look at the plugin's test to figure
Thanks for your reply. I see that I didn't fully understand that
CharFilters are run first, which makes it logical to special-case the
different cases. I was originally thrown off the scent by the fact that
searching with an uppercase 'Y' worked, and thought that the lowercase
filter was not applied to the 'Y',
Char filters are applied before the text is tokenized, and therefore they
are applied before the "normal" filters are used, which is why they are a
separate class of filter. With Lucene, the order is:
char filters -> tokenizer -> filters
Have you looked into the ICU analyzer?
http://www.elasticse
Hi,
We're using Elasticsearch with an Analyzer to map the `y` character to
`ij` (*char_filter* named "char_mapper"), since in Dutch these two are
"somewhat" interchangeable. We're also using a *lowercase filter*.
This is the configuration:
{
"analysis": {
"analyzer": {
"index": {