On Wed, May 14, 2014 at 1:35 PM, Jan Slodicka <j...@resco.net> wrote:

> Simon Slavin-3 wrote
> > On 13 May 2014, at 5:21pm, Constantine Yannakopoulos wrote:
> >
> >> This is very interesting, Jan. The only way this could fail is if the
> >> collation implementation does something funny when it encounters this
> >> character, e.g. chooses to ignore it when comparing.
> >
> > That cuts out a very large number of collations.  The solution works fine
> > for any collation which orders strings according to Unicode order.  But
> > the point of creating a collation is that you don't want that order.
> >
> > Simon.
>
> Simon, I think the most frequent reason for creating a collation is to get
> the Unicode order. At the bare minimum, adding the LIKE optimization to the
> ICU SQLite extension would make sense; the savings are really huge.
>

There could be a flag in the eTextRep argument of sqlite3_create_collation_v2(),
much like the SQLITE_DETERMINISTIC flag of sqlite3_create_function(), that
marks the collation as a "unicode text" collation. If this flag is set, the
engine can apply the LIKE optimization to such collations, using the U+10FFFD
idea described in previous posts to construct the upper limit of the range.
Since existing calls to sqlite3_create_collation_v2() do not pass this flag,
the change would be backward-compatible.
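To make the U+10FFFD idea concrete, here is a rough sketch using Python's
sqlite3 module (chosen only for illustration; nothing in the proposal depends
on it). It registers a stand-in "unicode text" collation that simply compares
code points, and performs by hand the rewrite the proposed flag would let the
engine do: a prefix match on 'abc' becomes the range
name >= 'abc' AND name < 'abc' || U+10FFFD, which an index on the collated
column can serve. The collation name UNICODE_ORDER, the sample rows and the
helper names are made up; the flag itself does not exist in SQLite. Like the
idea, it assumes stored text never contains U+10FFFD or the code points above it.

```python
import sqlite3
from typing import List, Tuple

# Assumption carried over from the U+10FFFD idea: stored text never
# contains U+10FFFD or the code points above it, so prefix + SENTINEL
# sorts after every string that starts with prefix.
SENTINEL = "\U0010FFFD"

# Sample data (made up for this sketch).
ROWS = ["ab", "abc", "abcd", "abc\u00e9", "abd", "ab\u010d", "Abc", "xyz"]


def unicode_order(a: str, b: str) -> int:
    """Stand-in 'unicode text' collation: plain code-point order."""
    return (a > b) - (a < b)


def prefix_range(prefix: str) -> Tuple[str, str]:
    """Half-open range [lo, hi) that covers every string starting with prefix."""
    return prefix, prefix + SENTINEL


def prefix_query(prefix: str) -> List[str]:
    """Run a case-sensitive prefix match as a range query on a collated column."""
    con = sqlite3.connect(":memory:")
    con.create_collation("UNICODE_ORDER", unicode_order)
    con.execute("CREATE TABLE t (name TEXT COLLATE UNICODE_ORDER)")
    con.execute("CREATE INDEX t_name ON t(name)")
    con.executemany("INSERT INTO t VALUES (?)", [(r,) for r in ROWS])
    lo, hi = prefix_range(prefix)
    # Both comparisons use the column's collation (the bound parameters
    # have none), so an index on name can serve them as a range scan.
    cur = con.execute(
        "SELECT name FROM t WHERE name >= ? AND name < ? ORDER BY name",
        (lo, hi),
    )
    result = [row[0] for row in cur]
    con.close()
    return result


if __name__ == "__main__":
    print(prefix_query("abc"))  # ['abc', 'abcd', 'abcé']
```

Note that this range is case-sensitive, so "Abc" falls outside it. SQLite's
default LIKE is case-insensitive for ASCII, which is why its built-in LIKE
optimization only applies to NOCASE columns (or to BINARY columns with
case_sensitive_like turned on).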

Alternatively, there is the already mentioned idea of letting the application
specify the lower and upper bounds for the LIKE optimization itself, perhaps
through a callback in a hypothetical sqlite3_create_collation_v3() variant.

By the way, "unicode text" collations include all the "strange" ones, such as
the accent-insensitive, mixed-codepage collation I described in my original
post. I would expect these to account for about 95% of all custom-coded
collations.
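As a rough check that an accent-insensitive collation still qualifies, the
same range trick can be run against one. The sketch below uses Python's
sqlite3 module again; the collation name NOACCENT and its NFD-based folding
are illustrative assumptions, not the collation from my original post. It
folds accents before comparing code points, so the range built from 'abc'
also picks up "abč" and "ábcx", and a prefix typed with an accent gives the
same result. It assumes, as before, that stored text never contains U+10FFFD
or the code points above it.

```python
import sqlite3
import unicodedata
from typing import Set

# Same assumption as the U+10FFFD idea: stored text never contains
# U+10FFFD or the code points above it.
SENTINEL = "\U0010FFFD"

# Sample data (made up): "ab\u010d" is "abč", "\u00e1bcx" is "ábcx".
ROWS = ["ab", "abc", "ab\u010d", "\u00e1bcx", "ABC", "abd"]


def fold_accents(s: str) -> str:
    """Decompose to NFD and drop combining marks, e.g. 'č' -> 'c', 'á' -> 'a'."""
    return "".join(
        ch for ch in unicodedata.normalize("NFD", s) if not unicodedata.combining(ch)
    )


def accent_insensitive(a: str, b: str) -> int:
    """Illustrative accent-insensitive collation: code-point order on folded text."""
    fa, fb = fold_accents(a), fold_accents(b)
    return (fa > fb) - (fa < fb)


def accent_insensitive_prefix(prefix: str) -> Set[str]:
    """Prefix match via the U+10FFFD range under the accent-insensitive collation."""
    con = sqlite3.connect(":memory:")
    con.create_collation("NOACCENT", accent_insensitive)
    con.execute("CREATE TABLE t (name TEXT COLLATE NOACCENT)")
    con.executemany("INSERT INTO t VALUES (?)", [(r,) for r in ROWS])
    # The bounds are compared through NOACCENT, so they are folded too.
    cur = con.execute(
        "SELECT name FROM t WHERE name >= ? AND name < ?",
        (prefix, prefix + SENTINEL),
    )
    result = {row[0] for row in cur}
    con.close()
    return result


if __name__ == "__main__":
    print(sorted(accent_insensitive_prefix("abc")))  # ['abc', 'abč', 'ábcx']
```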

--Constantine
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users