On 17 Nov 2009, at 10:05pm, Beau Wilkinson wrote:

> I think a better approach (to the design of Unicode) would have been for 
> Spanish and German (for instance) to share absolutely nothing in the encoding 
> standards. Each language ought to have its own little span of letters, 
> immortalized into the standard in correct order-of-collation, with no sharing 
> of "code points," "characters," or anything else.

This is how at least two Unicode libraries I know of work internally.  For each 
piece of text they encounter, they infer which language(s) the text 
represents, then use whatever sort order is appropriate to that language.  
This requires you to assign the language(s) to a string as the string is typed 
in, so that moving a database from one country to another does not change the 
collation order.  If every piece of text is in the same language this requires 
almost no extra space (just one 'default language' stored with the entire 
database file), but sometimes one string includes text from more than one 
language, e.g. switching from Roman to Japanese and back again.  The only 
advantage to this system is that it works, and it works consistently.
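
To make that concrete, here is a minimal sketch of the idea in Python.  It is 
not taken from either library: the ALPHABETS table, sort_key() and 
sort_tagged() are names I made up for illustration, and the two alphabets are 
tiny stand-ins for real collation tables.  The point is only that the sort 
order comes from the language stored with the text, not from the locale of 
the machine doing the sorting.

```python
# Hypothetical per-language alphabets, for illustration only; real
# collation tables are far larger and handle many more cases.
ALPHABETS = {
    "de": "aäbcdefghijklmnoöpqrstuüvwxyz",  # German: umlauts sort next to the base vowel
    "sv": "abcdefghijklmnopqrstuvwxyzåäö",  # Swedish: å, ä, ö come after z
}


def sort_key(text, lang):
    """Map text to a list of letter ranks in the alphabet of the given language."""
    alphabet = ALPHABETS[lang]
    # Characters not in the alphabet sort after every known letter.
    return [alphabet.index(ch) if ch in alphabet else len(alphabet)
            for ch in text.lower()]


def sort_tagged(words, lang):
    """Sort words by the rules of the language stored with them.

    'lang' plays the role of the 'default language' kept with the database,
    so the result does not depend on where the code runs.
    """
    return sorted(words, key=lambda word: sort_key(word, lang))


words = ["zebra", "äpple", "apple"]
print(sort_tagged(words, "de"))
print(sort_tagged(words, "sv"))
```

The same three strings come out in a different order depending on the stored 
language tag, and in the same order on every machine.  A string that mixes 
languages would need a tag per run of text rather than one per string, which 
this sketch does not attempt.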

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
