On 14 May 2011 06:33, Andrew Dunbar <hippytr...@gmail.com> wrote:
> I'm almost positive Azeri has the same dotless i issue and perhaps
> some of the other Turkic languages of Central Asia. One solution is to
> do accent/diacritic normalization too as part of the canonicalization.

It's good to think about these issues beforehand, but we already do
enough mindless killing of diacritics, and it doesn't work across all
languages. In Finnish, saa and sää are different words, and ä is a
letter in its own right, not an "a" with something added to it.

  -Niklas

-- 
Niklas Laxström

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
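
To make the objection concrete, here is a minimal Python sketch of what a naive diacritic-stripping step tends to do. It is only an illustration: `strip_diacritics` is a hypothetical helper written for this example, not the canonicalization code discussed in the thread, and it assumes the common approach of Unicode NFD decomposition followed by dropping combining marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Naive accent folding (hypothetical helper): decompose to NFD,
    then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Finnish: "saa" and "sää" are different words, but folding merges them.
print(strip_diacritics("saa") == strip_diacritics("sää"))

# Turkic dotless i (U+0131) has no canonical decomposition, so this kind
# of folding leaves it untouched and it still differs from plain "i".
print(strip_diacritics("\u0131") == "i")

# Dotted capital I (U+0130) decomposes to "I" + U+0307, so the dot is
# dropped and the distinction from plain "I" is lost.
print(strip_diacritics("\u0130") == "I")
```

Under these assumptions, folding erases the saa/sää distinction while doing nothing for the dotless i, so a blanket diacritic pass would likely need per-language rules to be safe.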