On Thu, Mar 19, 2009 at 09:52:55AM -0700, Noah Hart wrote:
> I've been reading and thinking about this topic for a while, and would
> like to add my thoughts.
>
> I realize that we don't "vote" on features, but I feel that this type
> of idea has merit.
>
> It is true, that SQLite has user defined collations, and a extension
> could be registered, but the problem with that is twofold:
>
> Number 1, the database is no longer portable.  The only solution to
> this is to include the functionality in the core.

Yes, but there is no single Unicode collation.  Collation is
language-specific, even when using Unicode.  Thus you're asking that
SQLite3 have a plethora of built-in Unicode collations.  And you'll
probably want Unicode strings normalized for indexing and comparison.
And...  And SQLite3 would no longer be light.

You can add Unicode collations using the user-defined collation function
and whatever Unicode collation implementation you might have (e.g.,
ICU).

> Number 2, your platform may not support the sqlite3_create_collation
> interface.  For example, Firefox now includes SQLite.  Unfortunately,
> while Firefox supports user defined functions, their implementation
> does not support user defined collations.

I'd call that a bug in Firefox.

> Someone commented that the US lives in a 7-bit world.

But that's not true.  Even people who only read and write English can
barely get by with just US-ASCII (if nothing else, a lot of webpages
would display as so many question marks if the browser didn't support
anything other than US-ASCII).  And there are plenty of multi-lingual
people in the U.S.

> This means that the other 6 billion people on the planet do not.

There are lots of non-Unicode character and code sets.  The rest of the
world is not necessarily in a better position than the English-speaking
world.  Unicode is a solution, and the best one at that.

> This creates a real problem for me.  I am writing a foreign language
> Firefox extension, and the issue of sorting is critical, since Firefox
> uses Unicode sorting, which does not "sort" (based on my rules)
> correctly.  This means I have no way to correct the sorting, except
> in the display routines.
>
> That being said, I would not limit this feature to 8bit locales.  A

8-bit is so 1980s :)

> more general solution would be to design it around a sqlite_collation
> master table in the database.  An application developer (not the SQLite
> team) would be responsible to define and populate their "user defined"
> collation.

It's more complex than you think.  You need to keep Unicode
normalization forms in mind and you need to deal with decomposed
characters no matter what (since not all future additions to Unicode
will include pre-composed forms, and NFC is closed to new pre-composed
forms anyways), which means multi-codepoint sequences need to be
accounted for in the collation.

You'd very quickly realize that it'd be even simpler for you if SQLite3
just had built-in collations for all the relevant languages.  And once
more SQLite3 would no longer be light.

Perhaps when built with ICU SQLite3 could make it trivial to load any of
those collations.

Nico
--
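
P.S. To make the normalization point concrete, here is a minimal sketch
using Python's sqlite3 module, whose Connection.create_collation() wraps
the user-defined collation interface.  The collation name
"nfd_codepoint", the table "words", and the sample strings are made up
for illustration.  This is NOT a language-aware collation: it only
normalizes both operands to NFD and then compares code points, which is
enough to show that a pre-composed and a decomposed spelling of the same
text compare as different under the default BINARY collation and as
equal once normalization is applied.

```python
import sqlite3
import unicodedata


def nfd_collation(a: str, b: str) -> int:
    """Compare two strings by code point after NFD normalization.

    Illustrative only: real language-specific ordering needs a proper
    collation implementation (e.g. ICU), not plain code point order.
    """
    na = unicodedata.normalize("NFD", a)
    nb = unicodedata.normalize("NFD", b)
    # Return negative, zero, or positive, as SQLite expects.
    return (na > nb) - (na < nb)


conn = sqlite3.connect(":memory:")
# Hypothetical collation name; it must be registered on every connection
# that touches data or indexes relying on it.
conn.create_collation("nfd_codepoint", nfd_collation)

conn.execute("CREATE TABLE words (w TEXT)")
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [(precomposed,), (decomposed,), ("cafe",)],
)

# Default BINARY collation: the two spellings are different byte strings.
binary_matches = conn.execute(
    "SELECT count(*) FROM words WHERE w = ?", (precomposed,)
).fetchone()[0]

# Normalizing collation: both spellings match, plain "cafe" does not.
normalized_matches = conn.execute(
    "SELECT count(*) FROM words WHERE w = ? COLLATE nfd_codepoint",
    (precomposed,),
).fetchone()[0]

print(binary_matches, normalized_matches)
```

Note that the collation lives in the connection, not in the database
file, which is exactly the portability concern raised above: another
application opening the same file has to register a collation of the
same name itself before it can use it.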