Hallvard B Furuseth wrote:

> I need a function which converts Latin Unicode characters to
> the closest equivalent ASCII characters, e.g. "é" -> "e".
>
> Before I reinvent the wheel, does any public domain or GPL
> code for this already exist?

I don't know, sorry.

> If not,
> for the most part I expect I can make the mapping from the character
> names, e.g. ignore 'WITH ACUTE' in 'LATIN CAPITAL LETTER O WITH ACUTE'
> in <ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt>.

Why the name!? The decomposition property (field 5 on each line, counting the code point as field 0) is much better for this. E.g.:

    00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9

The decomposition field tells you that "é" (code 00E9 hex) is composed of ASCII "e" (code 0065 hex) and the combining acute accent (code 0301 hex): you keep the ASCII character and drop the combining accent.

> Punctuation and other non-letters will be worse, but they are less
> important to me anyway.

The result is much better if you allow the ASCII conversion to be a string. This allows mappings such as "©" = "(c)", "½" = "1/2", and so on. This is also good for letters: "ß" = "ss", "å" = "aa", etc.

_ Marco
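
A minimal sketch of the two ideas above, in Python, in case it helps. It assumes Python's standard `unicodedata` module as a stand-in for parsing UnicodeData.txt by hand: `unicodedata.normalize("NFD", ...)` applies the canonical decompositions from that file. The names `OVERRIDES` and `to_ascii` and the "?" fallback are made up for illustration, and the override table only contains the examples mentioned in this message; a real table would need many more entries.

```python
import unicodedata

# Hypothetical hand-written table of string replacements. These take
# priority over decomposition, so "å" becomes "aa" rather than "a".
OVERRIDES = {
    "©": "(c)",
    "½": "1/2",
    "ß": "ss",
    "å": "aa",
}


def to_ascii(text, fallback="?"):
    """Map each character of text to an ASCII string (illustrative only)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Already ASCII: keep as is.
            out.append(ch)
        elif ch in OVERRIDES:
            # A string replacement was supplied by hand.
            out.append(OVERRIDES[ch])
        else:
            # Canonical decomposition, e.g. U+00E9 -> U+0065 U+0301.
            decomposed = unicodedata.normalize("NFD", ch)
            # Keep the ASCII part and drop combining marks and anything
            # else outside ASCII.
            kept = "".join(c for c in decomposed if ord(c) < 128)
            # Characters with no ASCII part (e.g. "ø") get the fallback.
            out.append(kept or fallback)
    return "".join(out)


print(to_ascii("Café ©"))
```

Characters that have no canonical decomposition, such as "ø", fall through to the fallback in this sketch, so they would each need their own entry in the override table.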