> Anyway, I would definitely unicode-normalize the strings before putting them into the database. You might avoid the special handling for the digraphs if you normalize towards the digraph code points: only strings actually containing digraphs would escape your optimization.
Tough stuff. Although I have tried to learn something about Unicode, I am no expert.

Anyway, the idea is to handle what can be handled safely and fast in our own code, and to let the OS do the difficult things. We are at the mercy of the OS here, but I believe it handles 100% of common strings and 99.9999% of less frequent strings correctly. Correct me if I am wrong, please.

In fact, our Android version uses the ICU library, but we want to avoid it in general (performance, size, maintenance).

As far as Unicode testing is concerned, I'll start a new post.

Regards,
Jan Slodicka
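
For illustration only, here is a minimal Python sketch of the "fast path first, OS for the rest" idea. It is not our actual code. The names `normalize_for_storage`, `collate`, `demo` and the collation name `fastpath` are made up for this example, NFC is just one possible normalization form, and `locale.strcoll` merely stands in for whatever comparison routine the OS provides.

```python
import sqlite3
import unicodedata
from locale import strcoll  # stand-in for the OS comparison routine (assumption)


def normalize_for_storage(text: str) -> str:
    """Normalize before INSERT so equivalent strings share one encoding."""
    return unicodedata.normalize("NFC", text)


def collate(a: str, b: str) -> int:
    """Hypothetical collation: cheap path for ASCII, OS for everything else."""
    # Fast path: pure-ASCII strings are compared case-insensitively in-process.
    if a.isascii() and b.isascii():
        x, y = a.lower(), b.lower()
        return (x > y) - (x < y)
    # Slow path: anything else is delegated to the platform.
    result = strcoll(a, b)
    return (result > 0) - (result < 0)


def demo() -> list:
    """Register the collation on an in-memory database and sort with it."""
    conn = sqlite3.connect(":memory:")
    conn.create_collation("fastpath", collate)
    conn.execute("CREATE TABLE t (name TEXT)")
    for name in ("banana", "Apple", "cherry"):
        conn.execute("INSERT INTO t VALUES (?)", (normalize_for_storage(name),))
    rows = conn.execute("SELECT name FROM t ORDER BY name COLLATE fastpath")
    names = [row[0] for row in rows]
    conn.close()
    return names
```

One caveat with this kind of split: both paths have to agree on a single overall ordering. If the fast path and the OS routine rank the same ASCII strings differently, sorting and indexes can become inconsistent, so a real implementation would need to verify that.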