Re: [sqlite] How to specify collating sequences in an expression.

Dennis Cote Thu, 01 Feb 2007 11:08:01 -0800

[EMAIL PROTECTED] wrote:

A collating sequence is really a specification on how comparison
operators work on strings.

This is a bit of a simplification, at least as far as the SQL:1999standard goes.

You can have arbitrary collating sequences in SQLite.  But for
the sake of this discussion, lets focus on just two:  BINARY
which is the default collating sequence, and NOCASE which does
case insensitive comparisons (on US-ASCII).

Consider this example table:

   CREATE TABLE t1(
     x TEXT COLLATE BINARY,
     y TEXT COLLATE NOCASE
   );
   INSERT INTO t1 VALUES('hello','Hello');

The problem with all of the above is that the collating sequence
on a column is specified when the column is declared and cannot

be changed. What many people would like to have is some way tospecify a different collating sequence to override the default

collating sequence for a single comparison.  The question is, what
should the syntax be.  Here are some ideas:

    SELECT * FROM t1 WHERE x='HELLO' COLLATE NOCASE;

This is the syntax that the SQL:1999 standard specifies. It would alsoallow this


        SELECT * FROM t1 WHERE x COLLATE NOCASE = 'HELLO';

or even this

        SELECT * FROM t1 WHERE x COLLATE NOCASE = 'HELLO' COLLATE NOCASE

Here the = operator has some added syntax at the end.  There some
parsing abiguities with this approach, but they can be dealt with
just as they are with the EXCEPT clause on a LIKE operator.  The
comparison operator syntax would be like this:

    <expr> <operator> <expr> [COLLATE <id>]

To accept the standard syntax this would need to be something like:

   <expr> [COLLATE <id>] <operator> <expr> [COLLATE <id>]

Questions for SQLite community members:

  (1)  Is there some standard SQL syntax that I do not know about
       for doing this kind of thing?

  (2)  How do other SQL engines do this kind of thing?

  (3)  Do you have any additional (hopefully better) ideas for
       specifying alternative collating sequences on individual
       comparison operators.

  (4)  What syntax do you prefer?

I would *strongly* recommend adopting the SQL:1999 standard syntax forcompatibilty with other database systems, now and in the future.

In the standard character strings have a couple of attributes, acharacter set and a collation. SQLite does not support multiplecharacter sets, so we can ignore that attribute. This leaves each stringwith a collation attribute. This attribute can be specified explicitlyin data type clause of a column definition, or in the data type clauseof a cast expression, or directly with an explicit COLLATE clause aftera string expression, even on a string literal.


   create table t (column1 text COLLATE <name>)
   CAST( <src> AS text COLLATE <name>)
   column1 COLLATE <name>
   'a string' COLLATE <name>

Then the standard has set of rules that specifies the collation of thecharacter string resulting from operations, like substring andconcatenation based on the collation of the operand strings. This isdetermined by another attribute of a string, its coercibility, whichindicates if a string has a rigid collation, is using a collation fromanother source, or if it can be coerced to use a collation from anothersource. These rules are specified for functions with one or two stringarguments like substring, or concatenation, and for comparisons. Therules used to assign a string's coercibility are fairly simple:

If you provide an explicit collate clause the coercibility isExplicit and the collation is the one specified by the collation clause.If the string is a column reference the coercibility is Implicit andthe collation is the one specified by the column definition.If the string is a value other than a column reference (a literal orparameter value) the coercibility is Coercible and the collation is thedefault collation.

The standard then defines three table, one for the result of a monadicfunction, one for the result of a dyadic function, and one for thecollation used for a comparison. These tables are basically common sense.

For monadic functions the output has the same coercibility and collationas the input.

For dyadic function the result are much as expected; Implicit overridesCoercible and produces Implicit, Explicit overrides Coercible andImplicit and produces Explicit. There are two corner cases though. Thefirst is a function with two Implicit arguments that have differentcollation, which is defined to produce a value with no collation. Thesecond is a function with two arguments with different Explicitcollations, which is defined as invalid syntax.

For comparisons the collation that is used is the default if bothoperands are Coercible. An Explicit collation is used if either argumentis Explicit, unless they are both Explicit but specify differentcollations, this is defined as invalid syntax. An Implicit collationwill be used unless both arguments are Implicit with different collationvalues, this is also defined as invalid syntax.

The rules also specify how to propogate the NULL collation values thatresult from using different Implicit collations in a dyadic function.

These rules may seem a little convoluted, but they are standardized, andclearly define the behavior.

I think it's important for SQLite to try to maintain as much SQLstandard compatibility as possible. This would be another step in thatdirection.


Dennis Cote





-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Re: [sqlite] How to specify collating sequences in an expression.

Reply via email to