[EMAIL PROTECTED] wrote:
A collating sequence is really a specification on how comparison
operators work on strings.
This is a bit of a simplification, at least as far as the SQL:1999
standard goes.
You can have arbitrary collating sequences in SQLite. But for
the sake of this discussion, lets focus on just two: BINARY
which is the default collating sequence, and NOCASE which does
case insensitive comparisons (on US-ASCII).
Consider this example table:
CREATE TABLE t1(
x TEXT COLLATE BINARY,
y TEXT COLLATE NOCASE
);
INSERT INTO t1 VALUES('hello','Hello');
The problem with all of the above is that the collating sequence
on a column is specified when the column is declared and cannot
be changed. What many people would like to have is some way to
specify a different collating sequence to override the default
collating sequence for a single comparison. The question is, what
should the syntax be. Here are some ideas:
SELECT * FROM t1 WHERE x='HELLO' COLLATE NOCASE;
This is the syntax that the SQL:1999 standard specifies. It would also
allow this
SELECT * FROM t1 WHERE x COLLATE NOCASE = 'HELLO';
or even this
SELECT * FROM t1 WHERE x COLLATE NOCASE = 'HELLO' COLLATE NOCASE
Here the = operator has some added syntax at the end. There some
parsing abiguities with this approach, but they can be dealt with
just as they are with the EXCEPT clause on a LIKE operator. The
comparison operator syntax would be like this:
<expr> <operator> <expr> [COLLATE <id>]
To accept the standard syntax this would need to be something like:
<expr> [COLLATE <id>] <operator> <expr> [COLLATE <id>]
Questions for SQLite community members:
(1) Is there some standard SQL syntax that I do not know about
for doing this kind of thing?
(2) How do other SQL engines do this kind of thing?
(3) Do you have any additional (hopefully better) ideas for
specifying alternative collating sequences on individual
comparison operators.
(4) What syntax do you prefer?
I would *strongly* recommend adopting the SQL:1999 standard syntax for
compatibilty with other database systems, now and in the future.
In the standard character strings have a couple of attributes, a
character set and a collation. SQLite does not support multiple
character sets, so we can ignore that attribute. This leaves each string
with a collation attribute. This attribute can be specified explicitly
in data type clause of a column definition, or in the data type clause
of a cast expression, or directly with an explicit COLLATE clause after
a string expression, even on a string literal.
create table t (column1 text COLLATE <name>)
CAST( <src> AS text COLLATE <name>)
column1 COLLATE <name>
'a string' COLLATE <name>
Then the standard has set of rules that specifies the collation of the
character string resulting from operations, like substring and
concatenation based on the collation of the operand strings. This is
determined by another attribute of a string, its coercibility, which
indicates if a string has a rigid collation, is using a collation from
another source, or if it can be coerced to use a collation from another
source. These rules are specified for functions with one or two string
arguments like substring, or concatenation, and for comparisons. The
rules used to assign a string's coercibility are fairly simple:
If you provide an explicit collate clause the coercibility is
Explicit and the collation is the one specified by the collation clause.
If the string is a column reference the coercibility is Implicit and
the collation is the one specified by the column definition.
If the string is a value other than a column reference (a literal or
parameter value) the coercibility is Coercible and the collation is the
default collation.
The standard then defines three table, one for the result of a monadic
function, one for the result of a dyadic function, and one for the
collation used for a comparison. These tables are basically common sense.
For monadic functions the output has the same coercibility and collation
as the input.
For dyadic function the result are much as expected; Implicit overrides
Coercible and produces Implicit, Explicit overrides Coercible and
Implicit and produces Explicit. There are two corner cases though. The
first is a function with two Implicit arguments that have different
collation, which is defined to produce a value with no collation. The
second is a function with two arguments with different Explicit
collations, which is defined as invalid syntax.
For comparisons the collation that is used is the default if both
operands are Coercible. An Explicit collation is used if either argument
is Explicit, unless they are both Explicit but specify different
collations, this is defined as invalid syntax. An Implicit collation
will be used unless both arguments are Implicit with different collation
values, this is also defined as invalid syntax.
The rules also specify how to propogate the NULL collation values that
result from using different Implicit collations in a dyadic function.
These rules may seem a little convoluted, but they are standardized, and
clearly define the behavior.
I think it's important for SQLite to try to maintain as much SQL
standard compatibility as possible. This would be another step in that
direction.
Dennis Cote
-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------