[EMAIL PROTECTED] wrote:
A collating sequence is really a specification on how comparison
operators work on strings.
This is a bit of a simplification, at least as far as the SQL:1999 standard goes.

You can have arbitrary collating sequences in SQLite.  But for
the sake of this discussion, lets focus on just two:  BINARY
which is the default collating sequence, and NOCASE which does
case insensitive comparisons (on US-ASCII).

Consider this example table:

   CREATE TABLE t1(
     x TEXT COLLATE BINARY,
     y TEXT COLLATE NOCASE
   );
   INSERT INTO t1 VALUES('hello','Hello');

The problem with all of the above is that the collating sequence
on a column is specified when the column is declared and cannot
be changed. What many people would like to have is some way to specify a different collating sequence to override the default
collating sequence for a single comparison.  The question is, what
should the syntax be.  Here are some ideas:

    SELECT * FROM t1 WHERE x='HELLO' COLLATE NOCASE;

This is the syntax that the SQL:1999 standard specifies. It would also allow this

        SELECT * FROM t1 WHERE x COLLATE NOCASE = 'HELLO';

or even this

        SELECT * FROM t1 WHERE x COLLATE NOCASE = 'HELLO' COLLATE NOCASE


Here the = operator has some added syntax at the end.  There some
parsing abiguities with this approach, but they can be dealt with
just as they are with the EXCEPT clause on a LIKE operator.  The
comparison operator syntax would be like this:

    <expr> <operator> <expr> [COLLATE <id>]

To accept the standard syntax this would need to be something like:

   <expr> [COLLATE <id>] <operator> <expr> [COLLATE <id>]

Questions for SQLite community members:

  (1)  Is there some standard SQL syntax that I do not know about
       for doing this kind of thing?

  (2)  How do other SQL engines do this kind of thing?

  (3)  Do you have any additional (hopefully better) ideas for
       specifying alternative collating sequences on individual
       comparison operators.

  (4)  What syntax do you prefer?

I would *strongly* recommend adopting the SQL:1999 standard syntax for compatibilty with other database systems, now and in the future.

In the standard character strings have a couple of attributes, a character set and a collation. SQLite does not support multiple character sets, so we can ignore that attribute. This leaves each string with a collation attribute. This attribute can be specified explicitly in data type clause of a column definition, or in the data type clause of a cast expression, or directly with an explicit COLLATE clause after a string expression, even on a string literal.

   create table t (column1 text COLLATE <name>)
   CAST( <src> AS text COLLATE <name>)
   column1 COLLATE <name>
   'a string' COLLATE <name>

Then the standard has set of rules that specifies the collation of the character string resulting from operations, like substring and concatenation based on the collation of the operand strings. This is determined by another attribute of a string, its coercibility, which indicates if a string has a rigid collation, is using a collation from another source, or if it can be coerced to use a collation from another source. These rules are specified for functions with one or two string arguments like substring, or concatenation, and for comparisons. The rules used to assign a string's coercibility are fairly simple:

If you provide an explicit collate clause the coercibility is Explicit and the collation is the one specified by the collation clause. If the string is a column reference the coercibility is Implicit and the collation is the one specified by the column definition. If the string is a value other than a column reference (a literal or parameter value) the coercibility is Coercible and the collation is the default collation.

The standard then defines three table, one for the result of a monadic function, one for the result of a dyadic function, and one for the collation used for a comparison. These tables are basically common sense.

For monadic functions the output has the same coercibility and collation as the input.

For dyadic function the result are much as expected; Implicit overrides Coercible and produces Implicit, Explicit overrides Coercible and Implicit and produces Explicit. There are two corner cases though. The first is a function with two Implicit arguments that have different collation, which is defined to produce a value with no collation. The second is a function with two arguments with different Explicit collations, which is defined as invalid syntax.

For comparisons the collation that is used is the default if both operands are Coercible. An Explicit collation is used if either argument is Explicit, unless they are both Explicit but specify different collations, this is defined as invalid syntax. An Implicit collation will be used unless both arguments are Implicit with different collation values, this is also defined as invalid syntax.

The rules also specify how to propogate the NULL collation values that result from using different Implicit collations in a dyadic function.

These rules may seem a little convoluted, but they are standardized, and clearly define the behavior.

I think it's important for SQLite to try to maintain as much SQL standard compatibility as possible. This would be another step in that direction.

Dennis Cote




-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to