At 23:44 02/04/2012, you wrote:
I wonder whether it respects languages.
These don't, but u8_strcoll et al. supposedly do, based on the LC_COLLATE
locale category. Herein lies the problem: if you build an index using
these functions while running under locale A, then try to run queries
against this database in an application running with locale B, bad
things happen. From the point of view of the second application, the
index is corrupted.
In other words, the collation used to build the index becomes essential
metadata of the table, yet there is no easy way to record or enforce it.
Moreover, say you have a table of worldwide customers (I have clients
in 39 countries today): which locale exactly are you going to use?
I know this question has no answer (and that is my main grievance
with Unicode).
A workable approach is to settle on a "decent if not perfect" way to
handle unaccenting and to do away with the locale concept altogether.
For those ready to cut some corners, and to give a rough idea of what
can be done easily if you can live with some compromises (since *no*
perfect solution exists): my C shared library, which implements a
large number of string and miscellaneous functions (with both UTF-8 and
UTF-16 interfaces) over a weak form of "unaccented Unicode v5.1", is
currently a 143 KB Win x86 DLL and runs reasonably fast.
JcD
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users