Re: [sqlite] Collation advice

Igor Tandetnik Mon, 26 Aug 2013 11:22:40 -0700

On 8/26/2013 1:26 PM, _ph_ wrote:

Should "AD" + "ZV" really compare as a "A" + "DZ" digraph +"V" in the
respective language? I am not sure about the intended behavior, but it seems
strange. (OTOH, language. It's always strange.)


In Hungarian, yes, that's what happens.

Anyway, I would definitely Unicode-normalize the strings *before* putting
them into the database. You might avoid the special handling for the
digraphs if you normalize /towards/ the digraph code points: only strings
actually containing digraphs would escape your optimization.

There are no separate code points one could normalize to. These languages
use normal ASCII letters, but sorting is more complex than letter-by-letter
comparison. That's not all that unusual: even in English, you might want to
sort Muenster and Münster next to each other.

By the way, the correct name for such sequences is not "digraphs", but
"contractions": http://www.unicode.org/reports/tr10/#Contractions
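
To make the idea of a contraction concrete, here is a minimal sketch of a
custom SQLite collation, using Python's sqlite3 module. It is an
illustration under stated assumptions, not a correct Hungarian collation:
the ALPHABET table is a simplified, hypothetical subset (accented vowels
are left out), the names sort_key, contraction_collate and sorted_words are
made up for this example, and matching is plain greedy longest-match, which
ignores cases such as compound-word boundaries. A real application would
more likely rely on a full Unicode Collation Algorithm implementation such
as ICU.

```python
import sqlite3

# Hypothetical, simplified Hungarian-style alphabet. Multi-letter entries
# are contractions: they sort as one unit. Accented vowels are omitted.
ALPHABET = ["a", "b", "c", "cs", "d", "dz", "dzs", "e", "f", "g", "gy",
            "h", "i", "j", "k", "l", "ly", "m", "n", "ny", "o", "p", "q",
            "r", "s", "sz", "t", "ty", "u", "v", "w", "x", "y", "z", "zs"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
MAX_LEN = max(len(letter) for letter in ALPHABET)


def sort_key(text):
    """Split text into collation elements, trying the longest match first."""
    s = text.lower()
    key = []
    i = 0
    while i < len(s):
        for n in range(MAX_LEN, 0, -1):
            chunk = s[i:i + n]
            if chunk in RANK:
                key.append(RANK[chunk])
                i += len(chunk)
                break
        else:
            # Character outside the table: sort it after the alphabet,
            # ordered by code point.
            key.append(len(ALPHABET) + ord(s[i]))
            i += 1
    return key


def contraction_collate(a, b):
    """Comparison callback in the form sqlite3.create_collation expects."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)


def sorted_words(words):
    """Sort words inside SQLite using the collation defined above."""
    conn = sqlite3.connect(":memory:")
    conn.create_collation("contractions", contraction_collate)
    conn.execute("CREATE TABLE words (w TEXT)")
    conn.executemany("INSERT INTO words VALUES (?)", [(w,) for w in words])
    rows = conn.execute("SELECT w FROM words ORDER BY w COLLATE contractions")
    result = [row[0] for row in rows]
    conn.close()
    return result
```

With this table, "ADZV" splits into the three elements A, DZ, V rather than
four letters, and "cukor" sorts before "csak" because "cs" counts as a
single letter that follows "c", which is the opposite of what a
letter-by-letter comparison gives.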

--
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
