On Mon, Apr 2, 2012 at 5:46 PM, Simon Slavin <slav...@bigfraud.org> wrote:
> Replace part of that routine with something which specifies the locale rather 
> than fetching it from OS settings.  And store the locale used with the index, 
> as a COLLATE setting.  Thus leaving it up to whoever writes the CREATE 
> command to decide which locale was used.  I find that acceptable.  This does 
> still give you the problem Jean-Christophe noted of sorting multilanguage 
> lists of names, but that's inherent in Unicode.  Encountering the problem 
> just means you're implementing Unicode properly.

If only it were that easy.  A plain C locale (i.e., byte-wise)
collation will result in "encountering the problem", but you won't be
"implementing Unicode properly"; you won't be implementing it at all!
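To make the byte-wise case concrete, here is a small sketch using Python's sqlite3 module. It assumes SQLite's default BINARY collation, which compares text with memcmp() and so behaves like the C locale described above. The helper names binary_sorted and binary_equal are hypothetical and exist only for illustration.

```python
import sqlite3

def binary_sorted(names):
    """Sort names with SQLite's default BINARY collation (memcmp over UTF-8 bytes)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(n,) for n in names])
    rows = [r[0] for r in conn.execute("SELECT name FROM t ORDER BY name")]
    conn.close()
    return rows

def binary_equal(a, b):
    """True if SQLite's BINARY collation considers a and b equal."""
    conn = sqlite3.connect(":memory:")
    (result,) = conn.execute("SELECT ? = ?", (a, b)).fetchone()
    conn.close()
    return result == 1

# Uppercase sorts before lowercase, and accented letters sort after 'z'.
print(binary_sorted(["zebra", "Zoe", "\u00e9clair", "apple"]))
# ['Zoe', 'apple', 'zebra', 'éclair']

# Precomposed "é" (NFC) vs "e" + combining acute accent (NFD): not equal.
print(binary_equal("\u00e9", "e\u0301"))  # False
```

Neither result is wrong by the rules of memcmp(). Those rules simply know nothing about Unicode.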

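Even if you use a Unicode-aware collation, if you don't handle
normalization insensitivity (treating canonically equivalent strings,
such as precomposed "é" and "e" followed by a combining acute accent,
as equal) then you're not really doing it right either.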
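One way to get normalization insensitivity in SQLite is to register a custom collation that NFC-normalizes both strings before comparing them. You then declare that collation on the column, so the column's UNIQUE index uses it, much as Simon suggests recording the choice as a COLLATE setting. This is a sketch using Python's sqlite3 module. The collation name UNICODE_NFC and the helper names are hypothetical. It fixes normalization only: strings are compared by code point after NFC, not by a locale-aware Unicode Collation Algorithm ordering.

```python
import sqlite3
import unicodedata

def nfc_collate(a: str, b: str) -> int:
    """Compare two strings after NFC normalization (code-point order, not locale-aware)."""
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    return (a > b) - (a < b)  # negative, zero or positive, as SQLite expects

def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The collation must be registered on every connection that uses it.
    conn.create_collation("UNICODE_NFC", nfc_collate)
    # The UNIQUE index on this column is built with the column's collation.
    conn.execute("CREATE TABLE files (name TEXT UNIQUE COLLATE UNICODE_NFC)")
    return conn

if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO files VALUES (?)", ("caf\u00e9",))       # NFC spelling
    try:
        conn.execute("INSERT INTO files VALUES (?)", ("cafe\u0301",))  # NFD spelling
    except sqlite3.IntegrityError as e:
        print("rejected duplicate:", e)
```

The catch is that SQLite stores only the collation's name, not its code. Every program that opens this database must register an identical UNICODE_NFC, or any statement that needs it fails with a "no such collation sequence" error.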

Consider that HFS+ on Mac OS X always normalizes file and directory
names to NFD when they are created.  But every user input method I've
seen to date produces NFC for all Latin-* characters!  This means that
if someone cut-and-pastes a filename from an HFS+ filesystem and it is
compared with the same name typed by hand, the two won't match
byte-for-byte, and that conflict will be very difficult to detect.
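The HFS+ mismatch can be simulated without touching a real filesystem. In the sketch below, a directory listing is assumed to hand back NFD-style names and a typed name to arrive as NFC. Apple's decomposition is documented as a variant of NFD, so plain NFD is only an approximation here. The helpers same_name and find_file are hypothetical. The lookup normalizes both sides to NFC before comparing, which is one possible fix, not the only one.

```python
import unicodedata
from typing import List, Optional

def same_name(a: str, b: str) -> bool:
    """Normalization-insensitive equality: compare the NFC forms."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def find_file(listing: List[str], typed: str) -> Optional[str]:
    """Return the on-disk spelling of `typed`, or None if it is not in `listing`."""
    for name in listing:
        if same_name(name, typed):
            return name
    return None

# Simulated HFS+ directory listing: names come back decomposed (NFD-style).
listing = [unicodedata.normalize("NFD", "r\u00e9sum\u00e9.txt")]
# What a keyboard input method typically produces: precomposed (NFC).
typed = "r\u00e9sum\u00e9.txt"

print(typed in listing)                       # False: plain comparison misses it
print(find_file(listing, typed) is not None)  # True: normalized comparison finds it
```

Both strings render identically on screen, which is exactly why the plain comparison's failure is so hard to spot.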

Unicode is hard.  I want a doll that says that.

Nico
--
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
