Re: [sqlite] BETWEEN and explicit collation assignment

Roman Fleysher Fri, 23 Aug 2013 10:10:58 -0700

You are correct, James, comparison depends on types. However, lets say we have 
suitcase and we want to test if it is bigger than the allowed limit, i.e. we 
have a biggest allowed suitcase to compare against. How do we answer: is this 
one bigger than the standard? Well, if this is a check-in bag, it has to be 
lighter than 50 lb, below a weight limit. If the suitcase is a carry-on it has 
to pass both weight limit and dimension limit. It is possible to cast the 
suitcase into carry-on subtype and conduct test there and to cast it to 
check-in type and test there too. But comparisons can be more complicated in 
general. If we have vectors, we may want to compare them by length (one 
sub-type of a vector), by one of its elements (one sub-type of vector per each 
element), by projection to a given vector (another sub-type), by angle that it 
makes with a vector (you guessed, another sub-type).

Thus, it is indeed possible to attach comparisons to types. This leads to 
enormous growth in number of types. Another solution is to realize that the 
comparator (the thing that makes comparisons) is actually an object of its own 
and it can be configured to apply different algorithms. Then, one does not need 
many  subtypes: one suitcase and one vector. This is, as I understand, the 
approach implemented in SQLite: algorithm is passed as an argument to the 
comparator along with the objects.

All to say something simple but not obvious: the rules of equality -- of any 
comparison -- are governed by the attribute(s) of the object relevant for 
comparison. What is relevant is not determined by the type of the object, but 
by the task at hand, in other words by the comparator.

Each column in a table is supposed to hold an attribute of an object (of the 
record -- the row). But strings, stored as attributes of the record, are 
objects of their own. Composite objects, composed of other objects -- 
characters. The characters  have upper and lower case, they can be printable 
and and non-printable, white spaces or not, bold and italic(?).  Therefore, 
comparing strings requires specification which of the attributes of those 
underlying objects are relevant. Collation rule is such a specification, an 
algorithm that tells comparator how to compare two strings -- objects of the 
same type. Type alone is not enough, if we want all types to fit human heads.

Roman

________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of James K. Lowden [jklow...@schemamania.org]
Sent: Thursday, August 22, 2013 9:18 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] BETWEEN and explicit collation assignment

On Thu, 22 Aug 2013 13:36:00 +0100
Simon Slavin <slav...@bigfraud.org> wrote:

> > 1.  where COLLATE( x AS NOCASE )
> >             between COLLATE( 'a' AS NOCASE )
> >             and COLLATE( 'b' AS NOCASE )
>
> Again, you are trying to apply a collation to a value.  And because
> BETWEEN takes three operands you are being given the opportunity to
> apply three collations for one condition.  Which is nonsense.  So the
> syntax shouldn't allow it or even suggest it would work.  You should
> be casting the 'BETWEEN', not the operands you're supplying to it.

There are two issues:

1.  Is collation a property of the value, its type, or the operation?
2.  Should the implied collation as currently implemented be retained,
or should it be replaced?

Before we can consider what syntax to use, we have to agree on what the
semantics are, on what is means to compare two things.  So let's
dispense with item #2 until we agree on #1.

Collation is not a property of the operation.  There is no other kind
of '=' than equality.  Equal is absolute; there is no modifier for it.

I understand your temptation.  I've used strcasecmp(3) a few times
myself.  But when you realize there's no such thing in SQLite or in the
relational model as an "operator modifier", you'll see the light.

Collation is not a property of the value.  Strings are just strings.
They have an *encoding*, necessarily, but not a collation.

Collation therefore must, by process of elimination, be a property of
the type.  And a good thing, too, because it's a property of the
column, and a column is a type.

Forget indexes; they're a feature of the implementation.  Think
logically, *semantically*: when are two things equal?  Two values in a
column either are or are not equal.  What determines what "equal"
*means*?  The type!

The question of whether

        A = B

is true or false rests *not* on "how they are compared" but on *what*
they are.  The same is true for

        'abc' = NAME

To make that obvious, let's take a little mathematical excursion.

Ask yourself how this is evaluated:

        where 1 = '1'

SQL is logical (or so we like to think), and we're being asked to
compare a number to a string, an apple to an orange.  What to do?

There are two ways to treat that: report a type error, or perform a
type conversion.  Currently, as it happens, SQLite does something
rather surprising:

sqlite> select 1 = '1', 1 * '1', 1 - '-1', 1 + '1', 1 / '1', 0 = '1';
1 = '1'     1 * '1'     1 - '-1'    1 + '1'     1 / '1'     0 = '1'
----------  ----------  ----------  ----------  ----------  ----------
0           1           2           2           1           0

As explained in http://www.sqlite.org/datatype3.html, "mathematical"
operators have different rules from "comparison" operators.
Mathematical binary operators demand two operands of the same type, and
the conversion rule is that they "cast both operands to the NUMERIC
storage class".  No such rule exists for logical operators; afaict the
comparison of strings to numbers is specified.  On the evidence, no
string is equal to any number, and no implicit conversion is defined.

Note well: the operand didn't reach out and bang the operator on the
head to modify its behavior.  The operator converted the operand to a
comparable type.  Or not, as the case may be.

All to say something simple but not obvious: the rules of equality --
of any comparision -- are governed by the type being compared.

Now keep that in mind as you think about strings and their collations.
when we evaluate:

        where 'abc' = NAME

we must know:

1.  The collation of NAME (not of the values in NAME)
2.  The collation of 'abc'
3.  The rules of conversion

Dr. Hipp told us, "If X does not have a specified collation, then the
collation of Y is used instead", which I would express as, "if only one
of two types has the default collation, it is converted to the other
type".  Collations aren't "used"; they're *honored*!

By that rule, if NAME has the default collation, the two operands are
of the same type and can be compared.  Else, if NAME's type uses a
collation other than the default, 'abc' is converted to that type and
the comparison is performed.

That is the easiest way -- and, by the grace of Codd, the right way --
to think about it.  Get your types in order, and then compare.  It
works for any comparison, and any type, and our friend equality remains
unbothered.  And unmodified.

Regards,

--jkl
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

Reply via email to