Re: AW: Best way to match umlauts

Jack Krupansky Mon, 17 Jun 2013 06:19:30 -0700

And this is a key advantage of using the mapping char filter rather than the simple ASCII folding token filter - you can easily go in and modify the mappings for application/domain/environment-specific character mappings such as these.


-- Jack Krupansky

-----Original Message-----
From: André Widhani
Sent: Monday, June 17, 2013 4:27 AM
To: solr-user@lucene.apache.org
Subject: AW: Best way to match umlauts

We configure both base-letter conversion (removing accents and umlauts) and alternate spelling through the mapping file.

For base-letter conversion, and given mostly German content, we transform all accented characters that are not used in German (like French é, è, ê etc.) to their base letter. We do not do this for German umlauts, because the assumption is that a user will know the correct spelling in his or her native language but probably not in foreign languages.
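To illustrate the rule (fold foreign accents, leave German umlauts and ß alone), here is a minimal Python sketch. It is not how Solr does it: the mapping char filter works from explicit entries in the mapping file, whereas this sketch swaps in Unicode decomposition via the standard `unicodedata` module. The names `GERMAN_KEEP` and `fold_non_german` are made up for this example.

```python
import unicodedata

# Characters that stay untouched (assumption: the German umlauts and eszett).
GERMAN_KEEP = set("äöüÄÖÜß")


def fold_non_german(text: str) -> str:
    """Strip accents from every character except German umlauts and eszett."""
    # Compose first, so an umlaut typed as "u" + combining diaeresis is
    # recognised as a single protected character.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in GERMAN_KEEP:
            out.append(ch)
            continue
        # Decompose into base letter + combining marks, then drop the marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(fold_non_german("Café Müller"))
```

In a real mapping file each accented character would be listed explicitly, which is what makes per-language exceptions like the German one easy to maintain.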


For alternate spelling, we use the following mapping:

 # * Alternate spelling
 #
 # Additionally, German umlauts are converted to their base form ("ä" => "ae"),
 # and "ß" is converted to "ss". This means both spellings can be used to find
 # either one.
 #
 "\u00C4" => "AE"
 "\u00D6" => "OE"
 "\u00DC" => "UE"
 "\u00E4" => "ae"
 "\u00F6" => "oe"
 "\u00DF" => "ss"
 "\u00FC" => "ue"
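The effect of these seven mappings can be checked outside Solr with a small Python sketch. It only emulates the character replacement with `str.translate`; it does not run the actual char filter, and the names `ALTERNATE_SPELLING` and `apply_mapping` are made up for this example.

```python
# The same seven entries as in the mapping file above.
ALTERNATE_SPELLING = {
    "\u00C4": "AE",  # Ä
    "\u00D6": "OE",  # Ö
    "\u00DC": "UE",  # Ü
    "\u00E4": "ae",  # ä
    "\u00F6": "oe",  # ö
    "\u00DF": "ss",  # ß
    "\u00FC": "ue",  # ü
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(ALTERNATE_SPELLING)


def apply_mapping(text: str) -> str:
    """Replace each mapped character with its two-letter alternate spelling."""
    return text.translate(_TABLE)


# Both spellings end up as the same character sequence, so either one
# can be used to find the other when the mapping is applied to both.
print(apply_mapping("Müller") == apply_mapping("Mueller"))
```

Because the same replacement is applied to indexed text and to queries, "Müller" and "Mueller" become identical before tokenization.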


André
