Accent transforms are language-specific, so an accent filter should take an ISO language code as an argument.
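Here is a minimal sketch of what a language-aware fold might look like. It is written in Java because that is what Lucene uses. The class name `AccentFolder`, its `fold` method, the use of ISO 639-1 two-letter codes, and the contents of the per-language tables are all hypothetical. This is not a Lucene API, and the sketch works on plain strings rather than inside a TokenStream. The tables encode the French/English, German, and Swedish/Danish/Norwegian cases listed next.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical sketch, not a Lucene class: folds accents according to the
// language of the text, passed as an ISO 639-1 code such as "de" or "sv".
final class AccentFolder {

    // Letters that are separate letters of the alphabet in these languages
    // and therefore must not be folded (assumed tables, not exhaustive).
    private static final Map<String, String> PROTECTED = Map.of(
            "sv", "åäöÅÄÖ",
            "da", "æøåÆØÅ",
            "no", "æøåÆØÅ",
            "nb", "æøåÆØÅ",
            "nn", "æøåÆØÅ");

    // Language-specific substitutions used instead of plain stripping.
    private static final Map<String, Map<Character, String>> SUBSTITUTIONS = Map.of(
            "de", Map.of('ä', "ae", 'ö', "oe", 'ü', "ue",
                         'Ä', "Ae", 'Ö', "Oe", 'Ü', "Ue", 'ß', "ss"));

    static String fold(String text, String isoLang) {
        // Compose first so that "o" + combining diaeresis is seen as one 'ö'.
        String composed = Normalizer.normalize(text, Normalizer.Form.NFC);
        String keep = PROTECTED.getOrDefault(isoLang, "");
        Map<Character, String> subs = SUBSTITUTIONS.getOrDefault(isoLang, Map.of());

        StringBuilder out = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); i++) {
            char c = composed.charAt(i);
            if (keep.indexOf(c) >= 0) {
                out.append(c);             // e.g. Swedish ö stays ö
            } else if (subs.containsKey(c)) {
                out.append(subs.get(c));   // e.g. German ü becomes ue
            } else {
                out.append(stripMarks(c)); // default: drop the diacritic
            }
        }
        return out.toString();
    }

    // Decompose one character and remove combining marks, so ï becomes i.
    // Letters with no canonical decomposition (ø, æ, ß) come back unchanged.
    private static String stripMarks(char c) {
        String decomposed = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("naïve", "en"));    // naive
        System.out.println(fold("Müller", "de"));   // Mueller
        System.out.println(fold("Göteborg", "sv")); // Göteborg
    }
}
```

Language codes that have no table entry fall back to plain diacritic stripping. The protected set is checked before the substitution table, so a letter that a language treats as distinct is never rewritten.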
Some examples:

* In French and English, a dieresis is a hint to pronounce neighboring vowels separately, as in coöp, naïve, or Noël.
* In German, ü transforms to ue.
* In Swedish, ö is a different letter than o, and should not be transformed. The same is true for ø in Danish and Norwegian.
* Then there are Motörhead and Mötley Crüe, see: http://en.wikipedia.org/wiki/Heavy_metal_umlaut
* I don't know of an ISO language code for Tolkien's Elvish, so we're out of luck for Manwë.

Another approach would be to generate the accent-transformed terms as synonyms at the same token position. Then you could generate multiple variants of a term instead of committing to just one.

Obviously, we had to do this right for Ultraseek a few years ago.

wunder

On 9/27/07 9:13 AM, "Steven Rowe" <[EMAIL PROTECTED]> wrote:

> Maybe there should be an option on ISOLatin1TokenFilter to use German
> substitutions, in addition to the current behavior of simply stripping
> diacritics?
>
> Does anyone know if there are other (Latin-1-utilizing) languages
> besides German with standardized diacritic substitutions that involve
> something other than just stripping the diacritics?