-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Simon Slavin wrote:
> Your descriptions make perfect sense and are very
> interesting since ICU is a good attempt to get around one of the
> fundamental problems of Unicode.

Errr, this is not the fault of Unicode. It is the fault of people! Unicode lets you represent the majority of the world's past and present characters using the same character set. Note that there is a lot of debate over exactly what constitutes a character, how code points combine, the same code point being used for different native character sets, and how to deal with older text where the character depiction matters even if it is the "same" as a modern character. Unicode is a reasonable compromise. See

  http://en.wikipedia.org/wiki/Unicode#Issues

Sorting and comparing strings are hard. For example, someone in the US or UK would consider cafe and café to be equivalent. German has a different ordering for looking in a phonebook versus a dictionary. What do you do about a German user having a Swedish name in their phonebook? Is it sorted using Swedish rules or German rules?

Unicode is not required to sort and compare strings, but it is a much nicer place to start. And the folks at the Unicode Consortium, who have been thinking about this for a very long time, have come up with an algorithm that works (with locale-specific adjustments) called the Unicode Collation Algorithm (UCA). Their report gives you a good idea of the complexity and issues involved. Section 1.8 is enlightening.

  http://www.unicode.org/unicode/reports/tr10/

ICU is a programming library implementing UCA plus a few other things. It is large and slow because of people: it needs all sorts of built-in tables, such as how each locale sorts things like accents and combining characters, as well as ordinary code points commonly used across multiple locales:

  http://en.wikipedia.org/wiki/International_Components_for_Unicode

You likely didn't intend your comment to be taken as condescending towards Unicode/UCA/ICU, but I did want to make it *very* clear that they make life considerably easier for us as programmers dealing with human text, and provide solutions to collation, case, etc. that we frequently need.
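
To make the cafe/café point concrete, here is a minimal Python sketch using only the standard library. It is not the UCA and not what ICU does internally; the helper names (strip_accents, accent_insensitive_key) are made up for illustration, and dropping combining marks is a crude stand-in for real locale-aware collation.

```python
import unicodedata

def strip_accents(text):
    """Decompose to NFD and drop combining marks (crude, not the UCA)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def accent_insensitive_key(word):
    """Hypothetical sort key: unaccented form first, original as tie-break."""
    return (strip_accents(word), word)

# "caf\u00e9" is café with a precomposed e-acute (U+00E9).
words = ["cafe", "zebra", "caf\u00e9", "cafes"]

# Plain code point order puts U+00E9 after every ASCII letter,
# so café lands after cafes.
print(sorted(words))

# With the crude key, cafe and café end up next to each other.
print(sorted(words, key=accent_insensitive_key))

# The same visible text can also be spelled with different code points:
# precomposed U+00E9 versus "e" followed by combining acute U+0301.
precomposed = "caf\u00e9"
combining = "cafe\u0301"
print(precomposed == combining)
print(unicodedata.normalize("NFC", combining) == precomposed)
```

Even this toy version needs a table of combining marks behind the scenes, and it knows nothing about locales such as the German phonebook versus dictionary case, which is the kind of work UCA and ICU take on.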

ICU is far more than a "good attempt"; it is closer to a very good solution. There aren't any alternatives that come *remotely* close, as working through the examples in the UCA report will show you.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkq0P0QACgkQmOOfHg372QSz9ACggmw5kaLKwL90nggbr0GaTxkZ
SNMAn17gWLmy3SdbzZVMI6fSoUtTVmYS
=jOGK
-----END PGP SIGNATURE-----
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users