Re: [sqlite] Collation advice

Igor Tandetnik Tue, 27 Aug 2013 06:39:35 -0700

On 8/27/2013 6:50 AM, Jan Slodicka wrote:
>> That's not all that unusual: even in English, you might want to sort
>> Muenster and Münster next to each other.
>
> Thanks, Igor. Do you know more? Do you consider ascii comparison too
> dangerous?

At one point in our project we did the same thing you are trying now: check whether both strings are pure ASCII and, if so, compare them the fast way (equivalent to memcmp; we didn't actually call it, but did the checking and comparison together, in one pass); otherwise, fall back to the OS-provided locale-sensitive comparison.
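
The one-pass check-and-compare idea described above can be sketched as an SQLite collation. This is a minimal Python sketch under stated assumptions, not the code from our project: `ascii_fast_compare`, `fast_locale_collation` and the collation name `FASTLOCALE` are hypothetical, and `locale.strcoll` stands in for "the OS-provided locale-sensitive comparison". The fast path uses plain code-point order, as memcmp would, which differs from most locales even for pure ASCII (it puts "B" before "a").

```python
import locale
import sqlite3
from typing import Callable

Compare = Callable[[str, str], int]


def ascii_fast_compare(a: str, b: str, fallback: Compare) -> int:
    """Code-point comparison for pure-ASCII strings, else defer to fallback.

    Checking and comparing happen in a single pass: the first difference is
    remembered, but scanning continues to the end of the longer string so
    that a non-ASCII character anywhere still triggers the fallback.
    """
    result = 0
    for i in range(max(len(a), len(b))):
        # -1 marks "past the end", so a proper prefix sorts first.
        ca = ord(a[i]) if i < len(a) else -1
        cb = ord(b[i]) if i < len(b) else -1
        if ca > 127 or cb > 127:
            return fallback(a, b)
        if result == 0 and ca != cb:
            result = -1 if ca < cb else 1
    return result


def fast_locale_collation(a: str, b: str) -> int:
    # Hypothetical collation: locale.strcoll follows the process's LC_COLLATE.
    return ascii_fast_compare(a, b, locale.strcoll)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.create_collation("FASTLOCALE", fast_locale_collation)
    conn.execute("CREATE TABLE t (name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)",
                     [("Muenster",), ("Münster",), ("Munich",)])
    for (name,) in conn.execute(
            "SELECT name FROM t ORDER BY name COLLATE FASTLOCALE"):
        print(name)
```

One caution with this design: the SQLite documentation for sqlite3_create_collation requires a collating function to be consistent and transitive, and mixing two orderings can violate that unless the fast path agrees with the fallback on every pure-ASCII input. That is worth checking before using such a collation in an index.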

In the end, we discovered ICU: it manages to be much faster than the OS comparisons (not exactly surprising), and even slightly faster than our hand-written check-and-compare-ASCII loop, while being correct for all locales. Ours is a desktop application, not resource-constrained, so bundling ICU with it was not a problem.

Here's a summary of all the cases I know of where simple ASCII comparison does the wrong thing (which doesn't mean there aren't others I don't know of):

- Contractions in various Latin-script-using Eastern European languages (like Hungarian), which you are already aware of.

- Several contractions in Welsh:
http://en.wikipedia.org/wiki/Welsh_language#Orthography

- German phonebook sort, which puts AE between A and B, OE between O and P, and UE between U and V. German defines two sorts, called "dictionary" and "phonebook", which differ only in whether these contractions are used. On Windows, the user can configure which sort to use.

- Spanish traditional sort (as opposed to modern sort) puts CH between C and D, and LL between L and M. It is no longer used for anything but academic linguistic studies and can be safely ignored.

- Finnish treats W as a variant of V (it's considered a secondary distinction, like that between A and Á).

- Lithuanian puts Y between I and J.
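
To see why contractions defeat character-by-character comparison, here is a toy primary-level sort key for the Spanish traditional order mentioned above (CH between C and D, LL between L and M). It is a hypothetical illustration, not a real collation: `make_sort_key`, `SPANISH_TRADITIONAL` and `as_collation` are names made up for this sketch, and it simply folds case and ignores accents and every other rule a library such as ICU handles. At each position the longest matching unit wins, so "ch" is taken before "c".

```python
import sqlite3
from typing import Callable, List, Sequence, Tuple

Key = Tuple[int, ...]


def make_sort_key(alphabet: Sequence[str]) -> Callable[[str], Key]:
    """Build a toy primary-level sort key from an ordered list of units.

    Units may be single letters or contractions such as "ch"; at each
    position the longest matching unit wins. Characters not in the
    alphabet sort after it, by code point.
    """
    rank = {unit: i for i, unit in enumerate(alphabet)}
    max_len = max(len(u) for u in alphabet)

    def key(word: str) -> Key:
        word = word.lower()  # toy: case is simply folded away
        out: List[int] = []
        i = 0
        while i < len(word):
            for size in range(max_len, 0, -1):
                unit = word[i:i + size]
                if len(unit) == size and unit in rank:
                    out.append(rank[unit])
                    i += size
                    break
            else:
                # Unknown character: place it after the whole alphabet.
                out.append(len(alphabet) + ord(word[i]))
                i += 1
        return tuple(out)

    return key


# Spanish traditional order: CH after C, LL after L.
SPANISH_TRADITIONAL = [*"abc", "ch", *"defghijkl", "ll", *"mnñopqrstuvwxyz"]
spanish_traditional_key = make_sort_key(SPANISH_TRADITIONAL)


def as_collation(key: Callable[[str], Key]) -> Callable[[str, str], int]:
    """Turn a sort key into a three-way compare usable by create_collation."""
    def compare(a: str, b: str) -> int:
        ka, kb = key(a), key(b)
        return (ka > kb) - (ka < kb)
    return compare


if __name__ == "__main__":
    words = ["dado", "chico", "cuna", "llama", "luz"]
    print(sorted(words))                               # plain code-point order
    print(sorted(words, key=spanish_traditional_key))  # toy traditional order
    conn = sqlite3.connect(":memory:")
    conn.create_collation("ES_TRAD_TOY", as_collation(spanish_traditional_key))
```

Note that "chico" and "cuna" swap places even though both are pure ASCII, which is exactly the kind of case a byte-wise fast path gets wrong. Real libraries build sort keys with several levels, so that secondary distinctions (A versus Á, for example) only break ties at the primary level.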

--
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
