Re: Using a char_filter in combination with a lowercase filter

2014-08-19 Ivan Brusic
The plugin uses collation to identify characters which are equivalent. It does far more than simple replacement/folding, so sometimes the sort order matters. http://en.wikipedia.org/wiki/Collation http://userguide.icu-project.org/transforms/normalization Take a look at the plugin's test to figure
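The ICU normalization guide linked above covers more than a one-character swap. As a rough illustration only, Python's standard `unicodedata` module (a stand-in, not the ICU plugin itself) can show compatibility normalization (NFKC) plus case folding treating the Dutch ligature `ĳ` as the same as the two-letter spelling. It also leaves `y` untouched, because equating `y` with `ij` is a language-specific rule rather than a Unicode one. The `fold` helper is hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    # Hypothetical helper: NFKC compatibility normalization, then Unicode
    # case folding. A rough stdlib stand-in, not what the ICU plugin does.
    return unicodedata.normalize("NFKC", text).casefold()


# "\u0133" is the lowercase ligature, "\u0132" the uppercase one.
for word in ["\u0133s", "\u0132s", "IJs", "ijs", "ys"]:
    print(repr(word), "->", repr(fold(word)))
```

The ligature forms and `IJs` all fold to `ijs`, while `ys` stays `ys`.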

Re: Using a char_filter in combination with a lowercase filter

2014-08-19 Matthias Hogerheijde
Thanks for your reply. I see that I didn't fully understand that CharFilters are run first, which makes it logical to special-case the different cases. I was originally thrown off the scent because searching with an uppercase 'Y' worked, and I thought that the lowercase filter was not applied to the 'Y',

Re: Using a char_filter in combination with a lowercase filter

2014-08-18 Ivan Brusic
Char filters are applied before the text is tokenized, and therefore they are applied before the "normal" filters are used, which is why they are a separate class of filter. With Lucene, the order is: char filters -> tokenizer -> filters Have you looked into the ICU analyzer? http://www.elasticse
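To make that order concrete, here is a toy Python model of the chain: char filters, then tokenizer, then token filters. It is not Lucene code. The regex tokenizer and the `char_filter`, `tokenize`, `lowercase_filter` and `analyze` helpers are hypothetical stand-ins, and the `y => ij` rule mirrors the one described in this thread. Because the mapping runs on the raw text, an uppercase `Y` slips past a lowercase-only rule and is only lowercased afterwards.

```python
import re


def char_filter(text: str, mappings: dict[str, str]) -> str:
    # Step 1: char filters see the raw text, before any tokenization.
    return "".join(mappings.get(ch, ch) for ch in text)


def tokenize(text: str) -> list[str]:
    # Step 2: a crude stand-in tokenizer that splits on runs of word characters.
    return re.findall(r"\w+", text)


def lowercase_filter(tokens: list[str]) -> list[str]:
    # Step 3: token filters run last, on the tokens produced by the tokenizer.
    return [t.lower() for t in tokens]


def analyze(text: str, mappings: dict[str, str]) -> list[str]:
    # Hypothetical pipeline: char filters -> tokenizer -> filters.
    return lowercase_filter(tokenize(char_filter(text, mappings)))


print(analyze("byna BYNA", {"y": "ij"}))
print(analyze("byna BYNA", {"y": "ij", "Y": "ij"}))
```

With only `{"y": "ij"}` the two tokens come out as `bijna` and `byna`; adding a `"Y": "ij"` rule makes both `bijna`.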

Using a char_filter in combination with a lowercase filter

2014-08-18 Matthias Hogerheijde
Hi, We're using Elasticsearch with an analyzer that maps the `y` character to `ij` (a *char_filter* named "char_mapper"), since in Dutch these two are "somewhat" interchangeable. We're also using a *lowercase filter*. This is the configuration: { "analysis": { "analyzer": { "index": {
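Following the thread's conclusion that the different cases have to be special-cased, here is a sketch of analysis settings written as a Python dict so it can be dumped to JSON. Only the analyzer name `index` and the char filter name `char_mapper` come from this thread; the `standard` tokenizer and the exact mapping rules are assumptions, not the poster's actual (truncated) configuration. The key point is that the `mapping` char filter lists both `y` and `Y`, because it runs before the `lowercase` token filter.

```python
import json

# Assumed settings: only "index" and "char_mapper" are names from the thread.
settings = {
    "analysis": {
        "char_filter": {
            "char_mapper": {
                "type": "mapping",
                # Char filters run before the lowercase token filter,
                # so the uppercase form needs its own rule.
                "mappings": ["y => ij", "Y => ij"],
            }
        },
        "analyzer": {
            "index": {
                "type": "custom",
                "tokenizer": "standard",  # assumption, not from the thread
                "char_filter": ["char_mapper"],
                "filter": ["lowercase"],
            }
        },
    }
}

print(json.dumps(settings, indent=2))
```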