Re: [sqlite] Unicode collation

Dan Kennedy Sat, 23 Jun 2007 02:30:03 -0700

On Sat, 2007-06-23 at 10:56 +0200, Jiri Hajek wrote:
> > The reason is as you've surmised. Not all systems have full unicode
> > support (I'm not sure, but if I had to guess, I would say very few
> > systems do). Including an implementation with SQLite would bloat
> > the library to at least several times its current size.
> 
> The only platform I know really well is Windows, where it's just a
> matter of calling CompareString(), i.e. almost no code in SQLite.
> 
> > Another reason is that it's a pretty complex topic. Supporting
> > most European languages would probably be possible without too
> > much trouble, but once you get into Asian and Middle-eastern
> > languages I think it's much harder.
> 
> As far as I know, there wouldn't be any complexity on SQLite's side -
> it's only about calling proper methods (be it CompareString() on
> Windows or ICU methods elsewhere), i.e. again pretty much no code
> needed in SQLite.
> 
> > There is an extension packaged with SQLite sources that uses the
> > ICU library to provide locale dependent collation sequences and
> > case folding. See here for details:
> 
> ICU is nice, but pretty large. Since I develop for Windows, I'd rather
> not distribute it with my application considering that this is already
> provided in Windows in reasonable quality.
> 
> Anyway, I guess the question isn't mainly about how to implement
> this in SQLite, but about the problem that SQLite doesn't define any
> standard for how to handle Unicode. Currently, any application that
> needs to work with Unicode data has to define its own collation and
> name it 'tr_TR', 'turkish', 'MyTurkish', or anything else. The result
> is a big mess, with no chance of opening an SQLite database in any
> application other than the one it was designed for.
>
> So, why don't we (or you - SQLite developers) define how to name
> collations (e.g. that 'tr_TR', 'en_AU', ... standard?) and then every
> database complying with this would be perfectly portable.


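That's not a bad idea. One thing to watch out for is that if
the definition of a collation sequence that you have used to create
an index varies even slightly from machine to machine (say from
Windows 98 to Windows Vista), you are headed for database corruption:
the index entries are stored in the order the original definition
produced, and a different definition no longer agrees with that order.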

For that reason I'd be a bit reluctant to encourage people to use
more than one implementation of a named collation sequence. 
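
To illustrate the concern, here is a minimal sketch using Python's
standard sqlite3 module. The two comparison functions are hypothetical
stand-ins for "the same" named collation as implemented on two different
machines; neither is a real Turkish collation, and the name 'tr_TR' is
only borrowed from the suggestion above. The sketch shows nothing more
than that one collation name can yield two different orderings:

```python
import sqlite3


def cmp(a, b):
    """Three-way comparison returning -1, 0 or 1."""
    return (a > b) - (a < b)


def collate_machine_a(x, y):
    # Hypothetical definition on machine A: compares without regard to case.
    return cmp(x.lower(), y.lower())


def collate_machine_b(x, y):
    # Hypothetical definition on machine B: raw code-point order.
    return cmp(x, y)


def sorted_names(collation_func):
    """Sort the same three rows with the given definition of 'tr_TR'."""
    con = sqlite3.connect(":memory:")
    # Both "machines" register their function under the same name.
    con.create_collation("tr_TR", collation_func)
    con.execute("CREATE TABLE t(name TEXT)")
    con.executemany("INSERT INTO t VALUES (?)", [("b",), ("A",), ("C",)])
    rows = con.execute(
        "SELECT name FROM t ORDER BY name COLLATE tr_TR"
    ).fetchall()
    con.close()
    return [row[0] for row in rows]


order_a = sorted_names(collate_machine_a)
order_b = sorted_names(collate_machine_b)
print(order_a)  # ['A', 'b', 'C']
print(order_b)  # ['A', 'C', 'b']
```

An index built under the first definition would hold its entries in the
first order. A connection that registers the second definition under the
same name would then search that index assuming the second order, which
is the mismatch described above.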

Dan.



