Re: [sqlite] Proposal for SQLite and non pure ASCII letters

Roger Binns Tue, 17 Mar 2009 15:55:18 -0700

> I know. But you can implement a locale-dependent version for a single
> language only.


From the German example, you can't even do that (name order is different
from dictionary order).  I think we are agreed that the default SQLite
implementation gets ASCII right and makes no attempt to deal
specifically with non-ASCII locales.  The ICU extension gets all the
locales as right as possible, which is why it is huge and slow.
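
To make the "name order versus dictionary order" point concrete, here is
a minimal sketch using Python's sqlite3 module.  It assumes the German
example refers to the two usual conventions for umlauts: dictionary
sorting treats "ü" like "u", while phone-book (name) sorting treats it
like "ue".  The collation names de_dictionary and de_phonebook and the
key functions are made up for illustration; they are not part of SQLite
or ICU, and they cover only a handful of characters.

```python
import sqlite3

# Hypothetical, deliberately tiny rule tables (not a full German collation).
_DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
_PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def dictionary_key(s):
    """Sort key for dictionary order: an umlaut sorts like its base vowel."""
    return s.lower().translate(_DICTIONARY)


def phonebook_key(s):
    """Sort key for name order: an umlaut sorts like base vowel plus 'e'."""
    return s.lower().translate(_PHONEBOOK)


def make_collation(key):
    """Wrap a key function as a comparator returning -1, 0 or 1."""
    def collate(a, b):
        ka, kb = key(a), key(b)
        return (ka > kb) - (ka < kb)
    return collate


conn = sqlite3.connect(":memory:")
conn.create_collation("de_dictionary", make_collation(dictionary_key))
conn.create_collation("de_phonebook", make_collation(phonebook_key))
conn.execute("CREATE TABLE people(name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("Muster",), ("Muffel",), ("Müller",)],
)


def ordered(collation):
    """Return all names sorted with the given registered collation."""
    sql = f"SELECT name FROM people ORDER BY name COLLATE {collation}"
    return [row[0] for row in conn.execute(sql)]


dictionary_order = ordered("de_dictionary")  # Muffel, Müller, Muster
phonebook_order = ordered("de_phonebook")    # Müller, Muffel, Muster
```

The same three rows come back in two different orders, so even a single
language needs more than one collation to be "right".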

So that leaves a middle ground: somewhat right, but lighter-weight.
The problem with having that as a default part of SQLite is that it
will give wrong answers, and developers will rarely realise just how
wrong.  There are also various tradeoffs that can be made between
size/performance and correctness.

Consequently it should be documented just where each implementation
stands.  The one you linked to is nice in that it documents in the code
exactly how much bigger things become, and you documented it being 4x
faster than ICU.  But what isn't documented is how accurate it is.

If I took all the text from a leading newspaper in each locale, how well
would it do?  Would it deal correctly with the name of the prime
minister or the capital city?
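
One way such an accuracy figure could be produced is sketched below: run
a lightweight implementation and a reference implementation over the
same words and report the fraction on which they agree.  Everything here
is an assumption for illustration.  ascii_upper stands in for an
ASCII-only case conversion, Python's str.upper stands in for a full
Unicode reference such as ICU, and the word list is a tiny hand-picked
sample rather than real newspaper text.

```python
def ascii_upper(text):
    """Uppercase only a-z and leave every other character untouched."""
    return "".join(
        chr(ord(c) - 32) if "a" <= c <= "z" else c
        for c in text
    )


def accuracy(candidate, reference, words):
    """Fraction of words on which candidate agrees with reference."""
    hits = sum(1 for w in words if candidate(w) == reference(w))
    return hits / len(words)


# Hand-picked sample of city names; a real test would use a large corpus.
sample = [
    "Berlin", "München", "Köln", "Zürich",
    "Paris", "Orléans", "Łódź", "Москва",
]

# str.upper is the stand-in reference here, not ICU itself.
score = accuracy(ascii_upper, str.upper, sample)
```

On this sample only "Berlin" and "Paris" match, so score is 0.25.  A
number like that, measured per locale on realistic text, is what is
missing from the documentation.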

An alternative approach would be to work with the ICU folks to improve the
size and performance of their library.  For example, the code could be
refactored to have fast paths for the most common conversions to improve
performance, or to allow omitting various lesser-used locales to improve size.

Roger