Thanks Robert, Uwe - all this is enlightening. I didn't know about those
things you mentioned.
Dawid
On Sat, Nov 11, 2023 at 2:02 PM Uwe Schindler wrote:
Hi Dawid,
the ASCII folding filter is meant to remove accents. You would like to
have searching for visually similar characters. These are 2 different
things.
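A minimal JDK-only sketch of the distinction drawn above. The class and method names are illustrative. `stripAccents` approximates accent removal with `java.text.Normalizer` (decompose to NFD, then drop combining marks). This is an assumption for illustration; `ASCIIFoldingFilter` uses its own hard-coded mapping rather than this approach. `foldConfusables` is a hypothetical lookalike table covering only five Cyrillic letters. Real visual-similarity matching would need a much larger table, such as Unicode's confusables data.

```java
import java.text.Normalizer;
import java.util.Map;

public class FoldingVsConfusables {

    // Accent removal (approximation): decompose to NFD, then drop all
    // combining marks (\p{M}), so U+00E9 becomes 'e' + U+0301, then 'e'.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Hypothetical, hand-picked lookalike table (tiny subset, illustration only).
    private static final Map<Character, Character> CONFUSABLES = Map.of(
            '\u0430', 'a',  // CYRILLIC SMALL LETTER A
            '\u0435', 'e',  // CYRILLIC SMALL LETTER IE
            '\u043E', 'o',  // CYRILLIC SMALL LETTER O
            '\u0440', 'p',  // CYRILLIC SMALL LETTER ER
            '\u0441', 'c'); // CYRILLIC SMALL LETTER ES

    // Replaces each known lookalike with its Latin counterpart; other chars pass through.
    static String foldConfusables(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            sb.append(CONFUSABLES.getOrDefault(c, c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9"; // "cafe" ending in U+00E9 (e with acute)
        String cyrillic = "th\u0435";  // "the" ending in U+0435 (Cyrillic ie)
        System.out.println(stripAccents(accented));                  // cafe
        System.out.println(stripAccents(cyrillic).equals("the"));    // false
        System.out.println(foldConfusables(cyrillic).equals("the")); // true
    }
}
```

Accent removal turns the accented word into `cafe` but leaves the Cyrillic letter alone, because U+0435 has no canonical decomposition. Only the lookalike table maps it to `e`.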
Actually, Robert also has some config options, which I generally use for
Western European searches where some documents may contain
nt on input).
> >
> > Dawid
> >
> > On Fri, Nov 10, 2023 at 6:58 PM Chris Hostetter wrote:
> >>
> >>
> >> : Here's the unicode letter after "th":
> >> : https://www.fileformat.info/info/unicode/char/0435/index.htm
> >> :
h means
>
> thе and the
>
> are two different things.
>
> Here's the unicode letter after "th":
> https://www.fileformat.info/info/unicode/char/0435/index.htm
>
To my surprise, I couldn't find it in the ascii folding filter:
https://github.com/apache/lucene/blob/main/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.java
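One way to see why an accent-folding table would not cover this letter: U+0435 is not an accented Latin letter at all but a separate Cyrillic letter. The JDK-only sketch below (class and method names are illustrative) prints each code point with its Unicode name and script, using `Character.getName` and `Character.UnicodeScript`.

```java
public class CodePointInfo {

    // Describes each code point as "U+XXXX NAME (SCRIPT)", one per line.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format(
                "U+%04X %s (%s)\n", cp, Character.getName(cp), Character.UnicodeScript.of(cp))));
        return sb.toString();
    }

    public static void main(String[] args) {
        String mixed = "th\u0435"; // "the" spelled with CYRILLIC SMALL LETTER IE
        String latin = "the";
        System.out.println(mixed.equals(latin)); // false
        System.out.print(describe(mixed));
        // U+0074 LATIN SMALL LETTER T (LATIN)
        // U+0068 LATIN SMALL LETTER H (LATIN)
        // U+0435 CYRILLIC SMALL LETTER IE (CYRILLIC)
    }
}
```

The two strings render almost identically but differ in their last code point, and the script column makes the mixed Latin/Cyrillic spelling visible.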
Anybody remembers whether the omission of Cyrillic