On Thu, Mar 19, 2009 at 09:52:55AM -0700, Noah Hart wrote:
> I've been reading and thinking about this topic for a while, and would
> like to add my thoughts.
> 
> I realize that we don't "vote" on features, but I feel that this type
> of idea has merit.
> 
> It is true that SQLite has user-defined collations, and an extension
> could be registered, but the problem with that is twofold:
> 
> Number 1, the database is no longer portable. The only solution to
> this is to include the functionality in the core.

Yes, but there is no single Unicode collation.  Collation is
language-specific, even when using Unicode.  Thus you're asking that
SQLite3 have a plethora of built-in Unicode collations.

And you'll probably want Unicode strings normalized for indexing and
comparison.  And...

And SQLite3 would no longer be light.  You can add Unicode collations
using the user-defined collation function and whatever Unicode collation
implementation you might have (e.g., ICU).
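
To make that concrete, here is a minimal sketch of registering a
user-defined collation.  It uses Python's sqlite3 module rather than the
C API purely for brevity.  The collation name "swedish_demo" and the tiny
hand-written ordering table are made up for illustration; a real
application would delegate the comparison to something like ICU.  Swedish
sorts a-ring, a-umlaut, o-umlaut after z (in that order), whereas German
dictionary order treats a-umlaut like a, which is why one built-in
"Unicode collation" cannot serve both.

```python
import sqlite3

# Hypothetical, illustrative tailoring: lowercase Latin letters followed by
# a-ring (U+00E5), a-umlaut (U+00E4), o-umlaut (U+00F6), as in Swedish.
_SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
_RANK = {ch: i for i, ch in enumerate(_SWEDISH_ORDER)}


def _key(s):
    # Characters outside the table sort after all known ones, by code point.
    return [_RANK.get(ch, len(_RANK) + ord(ch)) for ch in s.lower()]


def swedish_collation(a, b):
    # SQLite expects a negative, zero, or positive integer.
    ka, kb = _key(a), _key(b)
    return (ka > kb) - (ka < kb)


def sorted_words(words, collation="swedish_demo"):
    conn = sqlite3.connect(":memory:")
    # Register the comparison callback under a name usable in SQL.
    conn.create_collation("swedish_demo", swedish_collation)
    conn.execute("CREATE TABLE t (w TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(w,) for w in words])
    rows = conn.execute(
        "SELECT w FROM t ORDER BY w COLLATE " + collation
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

Calling sorted_words() with collation="BINARY" on the same data puts
a-umlaut before a-ring (code point order), which is the mismatch being
discussed in this thread.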

> Number 2, your platform may not support the sqlite3_create_collation
> interface. For example, Firefox now includes SQLite. Unfortunately,
> while Firefox supports user defined functions, their implementation
> does not support user defined collations.  

I'd call that a bug in Firefox.

> Someone commented that the US lives in a 7-bit world.  

But that's not true.  Even people who only read and write English can
barely get by with just US-ASCII (if nothing else a lot of webpages
would display as so many question marks if the browser didn't support
anything other than US-ASCII).  And there are plenty of multi-lingual
people in the U.S.

> This means that the other 6 billion people on the planet do not.

There are lots of non-Unicode character and code sets.  The rest of the
world is not necessarily in a better position than the English-speaking
world.  Unicode is a solution, and the best one at that.

> This creates a real problem for me.  I am writing a foreign language
> Firefox extension, and the issue of sorting is critical, since Firefox
> uses Unicode sorting, which does not "sort" (based on my rules)
> correctly.   This means I have no way to correct the sorting, except
> in the display routines.
> 
> That being said, I would not limit this feature to 8bit locales.  A

8-bit is so 1980s :)

> more general solution would be to design it around a sqlite_collation
> master table in the database. An application developer (not the SQLite
> team) would be responsible to define and populate their "user defined"
> collation.

It's more complex than you think.  You need to keep Unicode
normalization forms in mind, and you need to deal with decomposed
characters no matter what: not all future additions to Unicode will
include pre-composed forms, and NFC is closed to new pre-composed forms
anyway.  That means the collation has to account for multi-codepoint
sequences.  You'd very quickly realize that it'd be even simpler for you
if SQLite3 just had built-in collations for all the relevant languages.
And once more SQLite3 would no longer be light.  Perhaps when built with
ICU SQLite3 could make it trivial to load any of those collations.
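
A small sketch of the normalization problem, again in Python's sqlite3
module for brevity.  The collation name "nfd_demo" and the helper
count_matches() are hypothetical.  The collation only normalizes both
operands to NFD and then compares code points, so it handles canonical
equivalence but says nothing about language-specific ordering; that part
would still need a real collation implementation such as ICU.

```python
import sqlite3
import unicodedata


def nfd_codepoint_collation(a, b):
    # Decompose both strings so that pre-composed and decomposed spellings
    # of the same text compare equal, then compare by code point.
    na = unicodedata.normalize("NFD", a)
    nb = unicodedata.normalize("NFD", b)
    return (na > nb) - (na < nb)


def count_matches(needle, stored):
    """Return (matches under BINARY, matches under nfd_demo)."""
    conn = sqlite3.connect(":memory:")
    conn.create_collation("nfd_demo", nfd_codepoint_collation)
    conn.execute("CREATE TABLE t (w TEXT)")
    conn.execute("INSERT INTO t VALUES (?)", (stored,))
    binary = conn.execute(
        "SELECT count(*) FROM t WHERE w = ?", (needle,)
    ).fetchone()[0]
    normalized = conn.execute(
        "SELECT count(*) FROM t WHERE w = ? COLLATE nfd_demo", (needle,)
    ).fetchone()[0]
    conn.close()
    return binary, normalized
```

With the stored value written as the single code point U+00E9 and the
search value written as "e" followed by U+0301, the default comparison
finds nothing while the normalizing collation finds the row.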

Nico