At 23:44 02/04/2012, you wrote:

I wonder whether it respects languages.

These don't, but u8_strcoll et al. supposedly do, based on the LC_COLLATE locale category. Herein lies the problem: if you build an index using these functions while running under locale A, then run queries against this database in an application running under locale B, lookups can silently fail, because the stored order no longer matches the comparison being used. From the point of view of the second application, the index is corrupted.
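To make the failure mode concrete, here is a minimal sketch in plain C. It does not use SQLite or u8_strcoll. The two comparators are hypothetical stand-ins for "locale A" and "locale B" (byte order versus ASCII case-insensitive order), and index_build / index_lookup are made-up names for a sorted array with a binary search, assuming an ASCII execution character set.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef int (*collate_fn)(const char *, const char *);

/* Hypothetical stand-in for "locale A": plain byte order. */
static int collate_a(const char *x, const char *y) {
    return strcmp(x, y);
}

/* Hypothetical stand-in for "locale B": ASCII case-insensitive order. */
static int collate_b(const char *x, const char *y) {
    for (;; x++, y++) {
        int cx = (unsigned char)*x, cy = (unsigned char)*y;
        if (cx >= 'A' && cx <= 'Z') cx += 'a' - 'A';
        if (cy >= 'A' && cy <= 'Z') cy += 'a' - 'A';
        if (cx != cy) return cx - cy;
        if (cx == 0) return 0;
    }
}

/* qsort() has no user-data argument, so the comparator is passed
   through a file-scope variable (not thread-safe, fine for a sketch). */
static collate_fn g_sort_cmp;

static int qsort_adapter(const void *p, const void *q) {
    return g_sort_cmp(*(const char *const *)p, *(const char *const *)q);
}

/* "Build the index": sort the keys with the given collation. */
static void index_build(const char **keys, size_t n, collate_fn cmp) {
    g_sort_cmp = cmp;
    qsort(keys, n, sizeof *keys, qsort_adapter);
}

/* "Query the index": binary search that trusts the stored order.
   Returns the position of the key, or -1 if the search misses it. */
static long index_lookup(const char **keys, size_t n,
                         const char *key, collate_fn cmp) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = cmp(key, keys[mid]);
        if (c == 0) return (long)mid;
        if (c < 0) hi = mid;
        else lo = mid + 1;
    }
    return -1;
}
```

If the keys "banana", "Zebra", "cherry", "apple", "Mango" are sorted with collate_a, a lookup of "Zebra" with collate_a finds it, while the same lookup with collate_b walks the wrong half of the array and reports the key as missing even though it is stored.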

That is: the collation used to build the index effectively becomes essential metadata of the table, but there's no easy way to deal with that.

Not only that: say you have a table of worldwide customers (I have clients in 39 countries today). Which exact locale are you going to use? I know this question has no answer (and that's my main gripe with Unicode).

A workable approach is to come up with a "decent if not perfect" way to handle unaccenting and to get rid of the locale concept altogether.

For those ready to cut some corners, here is a rough idea of what can be done easily if you can live with some compromises (since *no* perfect solution exists): my C shared library, which implements a large number of string and miscellaneous functions (with both UTF-8 and UTF-16 interfaces) dealing with a weak form of "unaccented Unicode v5.1", is currently a 143 KB Win x86 DLL and runs reasonably fast.
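As an illustration of the general idea only (this is not the library mentioned above, and it covers far less), a locale-free "weak unaccenting" comparison can be sketched in a few lines of C. The names utf8_next, fold and unaccent_cmp are hypothetical. The sketch assumes well-formed UTF-8 input, folds only ASCII case and the Latin-1 Supplement letters U+00C0 to U+00FF, and leaves characters such as Æ, Ð, Þ and ß untouched.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from well-formed UTF-8 and advance *p.
   No validation beyond checking continuation bytes (assumption). */
static uint32_t utf8_next(const unsigned char **p) {
    const unsigned char *s = *p;
    uint32_t cp;
    int extra;
    if (s[0] < 0x80)      { cp = s[0];        extra = 0; }
    else if (s[0] < 0xE0) { cp = s[0] & 0x1F; extra = 1; }
    else if (s[0] < 0xF0) { cp = s[0] & 0x0F; extra = 2; }
    else                  { cp = s[0] & 0x07; extra = 3; }
    s++;
    while (extra-- > 0 && (*s & 0xC0) == 0x80)
        cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
    *p = s;
    return cp;
}

/* Map a code point to a lowercase, unaccented form.
   The table covers U+00C0..U+00FF; '.' means "leave unchanged". */
static uint32_t fold(uint32_t cp) {
    static const char latin1[] =
        "aaaaaa.ceeeeiiii"   /* U+00C0..U+00CF */
        ".nooooo.ouuuuy.."   /* U+00D0..U+00DF */
        "aaaaaa.ceeeeiiii"   /* U+00E0..U+00EF */
        ".nooooo.ouuuuy.y";  /* U+00F0..U+00FF */
    if (cp >= 'A' && cp <= 'Z')
        return cp + ('a' - 'A');
    if (cp >= 0xC0 && cp <= 0xFF && latin1[cp - 0xC0] != '.')
        return (uint32_t)latin1[cp - 0xC0];
    return cp;
}

/* Compare two UTF-8 strings by folded code point, with no locale involved.
   Returns <0, 0 or >0 like strcmp(). */
static int unaccent_cmp(const char *x, const char *y) {
    const unsigned char *a = (const unsigned char *)x;
    const unsigned char *b = (const unsigned char *)y;
    for (;;) {
        uint32_t ca = fold(utf8_next(&a));
        uint32_t cb = fold(utf8_next(&b));
        if (ca != cb) return ca < cb ? -1 : 1;
        if (ca == 0) return 0;
    }
}
```

With this, "Café" and "cafe" compare equal, as do "niño" and "nino", whatever locale the process runs under. The price is that the ordering is linguistically wrong for some languages, which is exactly the compromise discussed above.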

JcD
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
