On Mon, Apr 2, 2012 at 1:25 PM, Richard Hipp <d...@sqlite.org> wrote:
> On Mon, Apr 2, 2012 at 2:03 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>> I think ... a higher priority than that would be handling Unicode
>> correctly. And having Unicode support would be useful in writing the code
>> which handles dates.
>>
> size of SQLite library: approx 500 KB
> size of ICU library: approx 21,919 KB
>
> The ICU library (needed to handle Unicode "correctly") is over 40x larger
> than SQLite. Can you understand then why we don't want to make SQLite
> dependent upon ICU?
>
> If you really need correct ICU support, SQLite will optionally link with
> ICU and use it. But *requiring* SQLite to link against ICU is a
> deal-breaker for many users.

Also, Unicode collation is typically orders of magnitude slower than
US-ASCII collation. This comes up a lot in other contexts, particularly
as the various OSes have begun defaulting to Unicode locales. I've seen
ls(1) on directories with millions of files run as fast as the output
device permits in the C locale (in less than 1 second on tmpfs), but
take many minutes in a UTF-8 locale, and that's without any use of
normalization. But mostly this is a result of Unicode collation in libc
being awful.

The OpenSolaris u8_textprep code is designed to make u8_str*cmp() really
fast, though not quite as fast as the C locale strcmp(). It stays fast
when strings are mostly ASCII, and even when they are not, because
u8_textprep does no memory allocation for normalization-insensitive
comparison and has a fast path for comparing substrings of two or more
ASCII codepoints. This is the main reason that I'd recommend
u8_textprep...

Nico
--
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
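
P.S. To illustrate the "two or more ASCII codepoints" fast path
mentioned above, here is a rough sketch in C. It is not the u8_textprep
code and does not use its API; cmp_ascii_fastpath(), slow_cmp_fn and
counting_slow() are names made up for this example. The idea it assumes:
an ASCII byte can be compared bytewise only if the byte after it is also
ASCII, because otherwise the next codepoint might be a combining mark
that changes the result under normalization. Anything else is handed to
a slow path, which here is just strcmp() as a stand-in for a real
normalizing comparison.

```c
#include <string.h>

/* Hypothetical slow-path signature: a full, Unicode-aware comparison. */
typedef int (*slow_cmp_fn)(const char *a, const char *b);

/* Counts slow-path calls so the effect of the fast path is visible. */
static int slow_calls = 0;

/* Stand-in slow path: plain strcmp(), which does NO normalization.
 * A real implementation would normalize and compare here. */
static int counting_slow(const char *a, const char *b)
{
    slow_calls++;
    return strcmp(a, b);
}

static int is_ascii(unsigned char c)
{
    return c < 0x80;
}

/* Compare two NUL-terminated UTF-8 strings.  While the current byte and
 * the following byte are ASCII in both strings, compare bytewise with no
 * allocation.  Otherwise hand the remainder of both strings to 'slow'. */
static int cmp_ascii_fastpath(const char *a, const char *b, slow_cmp_fn slow)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    for (;;) {
        /* End of either string: the shorter one sorts first. */
        if (*p == 0 || *q == 0)
            return (int)*p - (int)*q;

        /* p[1] and q[1] are safe to read: p[0] and q[0] are not NUL,
         * so the terminator is at index 1 or later. */
        if (!is_ascii(p[0]) || !is_ascii(p[1]) ||
            !is_ascii(q[0]) || !is_ascii(q[1]))
            return slow((const char *)p, (const char *)q);

        if (*p != *q)
            return (int)*p - (int)*q;

        p++;
        q++;
    }
}
```

Comparing "abc" with "abd" never reaches the slow path, while comparing
"caf\xC3\xA9" with "cafe" falls back as soon as the 'f' is followed by a
non-ASCII byte. For mostly-ASCII data such as typical file names, nearly
all of the work stays in the cheap loop.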