On 17 Nov 2009, at 10:05pm, Beau Wilkinson wrote:

> I think a better approach (to the design of Unicode) would have been for
> Spanish and German (for instance) to share absolutely nothing in the encoding
> standards. Each language ought to have its own little span of letters,
> immortalized into the standard in correct order-of-collation, with no sharing
> of "code points," "characters," or anything else.

This is how at least two Unicode libraries I know of work internally. For each piece of text they encounter, they determine which language(s) the text represents and then use whatever sort order is appropriate to that language.

This requires you to assign the language(s) to a string as the string is typed in, so that moving a database from one country to another does not change the collation order. If every piece of text is in the same language, this does not require any extra space (well, just one 'default language' stored with the entire database file), but sometimes one string includes text in more than one language, e.g. switching from Roman to Japanese and back again.

The only advantage of this system is that it works, and it works consistently.

Simon.
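
To illustrate the language-tagged approach described above, here is a minimal Python sketch. It is not how either of those libraries is implemented. The names `german_key`, `swedish_key`, `COLLATORS` and `sort_tagged` are hypothetical, and the two rule sets are deliberately tiny: German dictionary order files "ä" with "a", while Swedish puts "å", "ä", "ö" after "z". The point is only that the sort order comes from the language stored with the text, not from the locale of the machine doing the sorting.

```python
import unicodedata

# Toy alphabet for the Swedish rule below; illustrative only, not a full tailoring.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def german_key(text):
    """German dictionary-style key: umlauts sort like their base vowels."""
    folded = text.lower().replace("ß", "ss")
    decomposed = unicodedata.normalize("NFD", folded)
    # Drop combining marks so that "ä" compares equal to "a".
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def swedish_key(text):
    """Swedish-style key: å, ä, ö are separate letters placed after z."""
    composed = unicodedata.normalize("NFC", text.lower())
    # Characters outside the toy alphabet get -1 and sort first.
    return [SWEDISH_ALPHABET.find(ch) for ch in composed]


# Hypothetical registry mapping a stored language tag to its key function.
COLLATORS = {"de": german_key, "sv": swedish_key}


def sort_tagged(strings, lang):
    """Sort strings using the collation of the language stored with them."""
    return sorted(strings, key=COLLATORS[lang])


words = ["zebra", "äpple", "apa"]
print(sort_tagged(words, "de"))  # ['apa', 'äpple', 'zebra']
print(sort_tagged(words, "sv"))  # ['apa', 'zebra', 'äpple']
```

The same three strings come out in a different order depending only on the tag. A string that mixes languages would need a tag per run of text rather than one per string, which this sketch does not attempt.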