Right, ASCIIFoldingFilter seems to map Ü [LATIN CAPITAL LETTER U
WITH DIAERESIS] to "U", not "UE".
On Wed, Apr 17, 2019 at 12:26 AM Ralf Heyde wrote:
>
> Ah sorry, Asciifolding for umlauts will result in ue/ae - ß/ss etc
>
> You could allow a distance of 1 or 2 given you use levenshtein distance
Thanks - GermanNormalizer seems as if it addresses this problem, yes.
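
For illustration, here is a minimal Python sketch of the three behaviours discussed in this thread: per-character folding (ü to u), conventional expansion (ü to ue), and a German-specific normalization that maps all three spellings to one key. The function names are hypothetical and this is plain Python, not Lucene's implementation. `normalize_german` only approximates the heuristic documented for Lucene's GermanNormalizationFilter (presumably the "GermanNormalizer" meant above), and `fold_diacritics` covers only combining marks, so it leaves ß untouched.

```python
import re
import unicodedata

# Conventional two-letter expansions for German umlauts and sharp s.
EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def fold_diacritics(text: str) -> str:
    """Drop combining marks after NFD decomposition, e.g. 'ü' -> 'u'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_umlauts(text: str) -> str:
    """Apply the conventional expansion, e.g. 'für' -> 'fuer'."""
    composed = unicodedata.normalize("NFC", text)
    return "".join(EXPANSIONS.get(ch, ch) for ch in composed)


def normalize_german(text: str) -> str:
    """Rough approximation (an assumption, not Lucene's code): map the
    diacritic, expanded, and bare spellings to the same key."""
    text = fold_diacritics(text.lower().replace("ß", "ss"))
    text = text.replace("ae", "a").replace("oe", "o")
    # Keep "ue" when it follows a vowel or "q" (as in "neue", "quelle").
    return re.sub(r"(?<![aeiouq])ue", "u", text)


for word in ("für", "fuer", "fur"):
    print(word, "->", normalize_german(word))
```

Applying the same normalization at index time and at query time is what lets "für", "fuer", and "fur" all match each other; folding alone would leave "fuer" as a separate term.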
Ah sorry, Asciifolding for umlauts will result in ue/ae - ß/ss etc
You could allow a distance of 1 or 2 given you use levenshtein distance - this
might be close to what you need.
Sent from my iPhone
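
To make the edit-distance suggestion concrete, the following is a small Python sketch of the classic dynamic-programming Levenshtein distance, applied to the spellings from this thread. It is a generic textbook implementation with a hypothetical function name, not the automaton-based matching Lucene uses for fuzzy queries.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


query = "für"
for candidate in ("fur", "fuer", "tor"):
    print(query, candidate, levenshtein(query, candidate))
```

Under these assumptions "für" is 1 edit from "fur" and 2 edits from "fuer", which is where the "1 or 2" comes from. The catch for short words is that unrelated terms such as "tor" are also within 2 edits, so fuzzy matching trades precision for recall here.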
> On 16.04.2019 at 20:08, Michael Sokolov wrote:
>
> I'm learning how to index/search
> Sent: Tuesday 16th April 2019 20:28
> To: java-user@lucene.apache.org
> Subject: Re: umlauts / diacritic expansion
>
> Hey,
>
> Take a look at ASCIIFoldingFilter - this one is quite generic.
>
> Does this answer your question?
>
> Cheers Ralf
>
> Von meinem iPh
Hey,
Take a look at ASCIIFoldingFilter - this one is quite generic.
Does this answer your question?
Cheers Ralf
Sent from my iPhone
> On 16.04.2019 at 20:08, Michael Sokolov wrote:
>
> I'm learning how to index/search German today and understanding that
> vowels with umlauts are conv
I'm learning how to index/search German today and understanding that
vowels with umlauts are conventionally expanded into two ASCII
characters, e.g. "für" -> "fuer", so people may search for the expanded
form "fuer", but they might also search with the diacritic, and
finally they might lazily search