Re: [sqlite] Proposal for SQLite and non pure ASCII letters

Mail.sqlite Wed, 18 Mar 2009 00:37:13 -0700

Please, let us try to bring the discussion back to the intended solution: a
simple way to define and use a "user defined" collation for 8-bit ASCII
characters!
As said before, the proposal doesn't rely on locales. If a user needs a German
collating sequence, with the sort order for phone book, dictionary or German
upper case, it's up to the user to supply a simple 256-byte string with the
needed sort order for that index. It could benefit all users with special
sorting requirements, with almost no impact on CPU cycles, even on small
systems.
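A minimal sketch of how such a 256-byte table might drive a comparison. It assumes the table is indexed by byte value (weights[c] is the sort weight of byte c) rather than listing the bytes in the desired order, and that the column holds single-byte ISO-8859-1 text. The name `byte_table_compare` is hypothetical; its signature matches the `xCompare` callback that SQLite's `sqlite3_create_collation()` expects, so the table pointer can be passed as `pArg`. Registering it might look like `sqlite3_create_collation(db, "LATIN1_DE", SQLITE_UTF8, weights, byte_table_compare)`, where `LATIN1_DE` is an illustrative name, after which an index could use `COLLATE LATIN1_DE`. Since the callback is registered as UTF-8, this also assumes the Latin-1 bytes were stored as-is, so SQLite hands them over unconverted.

```c
/*
 * Hypothetical collation callback driven by a 256-entry weight table.
 * pArg points to the table: weights[c] is the sort weight of byte value c.
 * The signature matches the xCompare argument of sqlite3_create_collation(),
 * but nothing here depends on sqlite3.h, so it also compiles standalone.
 * Assumption: the compared text uses a single-byte encoding (e.g. ISO-8859-1)
 * and the table outlives every use of the collation.
 */
int byte_table_compare(void *pArg, int n1, const void *z1,
                       int n2, const void *z2)
{
    const unsigned char *weights = (const unsigned char *)pArg;
    const unsigned char *a = (const unsigned char *)z1;
    const unsigned char *b = (const unsigned char *)z2;
    int n = n1 < n2 ? n1 : n2;

    for (int i = 0; i < n; i++) {
        int d = (int)weights[a[i]] - (int)weights[b[i]];
        if (d != 0) {
            return d; /* first byte pair with different weights decides */
        }
    }
    /* Equal weights over the common prefix: the shorter string sorts first. */
    return n1 - n2;
}
```

Giving two bytes the same weight makes them compare equal, which covers a dictionary-style rule such as "ä sorts as a". With only one weight per byte, though, the table cannot make ä sort as the two letters "ae".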


There should be many users with 8-bit ASCII locale requirements who would love
such an extension.
 
George

> -----Original Message----- 
> From: Roger Binns <rog...@rogerbinns.com> 
> To: General Discussion of SQLite Database <sqlite-users@sqlite.org> 
> Date: 17-03-2009 23:55 
> Subject: Re: [sqlite] Proposal for SQLite and non pure ASCII letters 
> 
> > I know. But you can implement a locale-dependent version for a single
> > language only.
> 
> From the German example, you can't even do that (name order is different
> from dictionary order).  I think we are agreed that the default SQLite
> implementation gets ASCII right and makes no attempt to deal
> specifically with non-ASCII locales.  The ICU extension gets all the
> locales as right as possible, which is why it is huge and slow.
> 
> So that leaves a middle ground: somewhat right, but lighter-weight.  The
> problem with having that as a default part of SQLite is that it will give
> wrong answers, and developers will rarely realise just how wrong.  There
> are also various tradeoffs that can be made between size/performance and
> correctness.
> 
> Consequently it should be documented just where each implementation
> stands.  The one you linked to is nice in that it documents in the code
> exactly how much bigger things become, and you documented it as being 4x
> faster than ICU.  But what isn't documented is how accurate it is.
> 
> If I took all the text from a leading newspaper in each locale, how well
> would it do?  Would it deal correctly with the name of the prime
> minister or the capital city?
> 
> An alternate approach would be working with the ICU folks to improve the
> size and performance of their library.  For example the code could be
> refactored to have fast paths for the most common conversions to improve
> performance, or be able to omit various lesser used locales to improve size.
> 
> Roger
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
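Roger's point that German name (phone-book) order differs from dictionary order can be made concrete. The hypothetical sketch below, again assuming ISO-8859-1 input, builds a sort key in two simplified styles loosely modelled on the two DIN 5007 variants: dictionary order folds an umlaut to its base vowel (ä as a), while phone-book order expands it to two letters (ä as ae). The names `german_key` and `german_compare` are illustrative only, and the real rules also cover uppercase umlauts, ß and tie-breaking, which are left out here.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical key builder for NUL-terminated ISO-8859-1 text.
 * phonebook != 0: umlauts expand to two letters (ä -> "ae").
 * phonebook == 0: umlauts fold to the base vowel (ä -> "a").
 * Only lowercase ä, ö, ü are handled to keep the sketch short.
 * out must have room for 2 * strlen(s) + 1 bytes.
 */
void german_key(const char *s, char *out, int phonebook)
{
    for (; *s != '\0'; s++) {
        char base;
        switch ((unsigned char)*s) {
        case 0xE4: base = 'a'; break; /* ä */
        case 0xF6: base = 'o'; break; /* ö */
        case 0xFC: base = 'u'; break; /* ü */
        default:
            *out++ = *s; /* every other byte is copied unchanged */
            continue;
        }
        *out++ = base;
        if (phonebook) {
            *out++ = 'e'; /* the expansion a per-byte weight table cannot do */
        }
    }
    *out = '\0';
}

/* Compares the generated keys byte-wise; negative if a sorts before b.
 * Returns 0 if memory for the keys cannot be allocated. */
int german_compare(const char *a, const char *b, int phonebook)
{
    char *ka = malloc(2 * strlen(a) + 1);
    char *kb = malloc(2 * strlen(b) + 1);
    int result = 0;

    if (ka != NULL && kb != NULL) {
        german_key(a, ka, phonebook);
        german_key(b, kb, phonebook);
        result = strcmp(ka, kb);
    }
    free(ka);
    free(kb);
    return result;
}
```

With these rules "Müller" sorts after "Muff" in dictionary style (key "Muller") but before it in phone-book style (key "Mueller"), so the same two names swap places depending on which German order is wanted.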
