Re: [sqlite] BETWEEN and explicit collation assignment

Roman Fleysher Thu, 22 Aug 2013 16:00:06 -0700

Dear Igor,

First, I do not expect any changes to SQLite to be made as a result of this 
discussion. You are right, the discussion is largely philosophical. (I do not 
imply "philosophical" means "impractical", which are sometimes equated.) But 
the questions you ask are also philosophical: for is there something that can 
be done in C++ that can not be done in Fortran or Assembler? I am sure very 
little. Nevertheless, C++ came to exist. Thus these questions driven by 
practical considerations --- I understand --- must be replaced by some others. 
For example: can the desired action be expressed more clearly, elegantly, 
fool-proof, efficiently, etc in the language. I believe these sorts of 
questions led to development of C++, Java, etc not what could or could not be 
done in Fortran and Assembler. These are the questions, as I understand, Simon 
is asking. Can collate rule be expressed in such a way that it is clear it 
modifies the operation not the value? Perhaps, I should not speak for Simon, 
this is 
 my interpretation of his writing.

Because COLLATE should be a modifier to comparisons, then COLLATE should be a 
modifier to index. We agree here, I think. I do not know if it is possible to 
build two indices on the same column using two different collate rules. 
Assuming comparison should specify collate rule (implicitly or explicitly 
without which comparison is impossible) the same rule will identify the 
corresponding index. If available, this index would be used. Thus the collation 
should not be picked up from the column, it must be picked up from the 
comparison. It is possible to adopt column's collate rule, if it is unique, to 
be used by default if no rule is explicitly specified, as you would like to 
see. It is then clear that this adaptation is a shortcut which should raise 
FAIL when columns with different collations are compared without explicit rule. 

As a matter of taste, I prefer to tell exactly what I want to happen, rather 
than to have software figure out what I meant. I can not be sure it will figure 
what I wanted. Historically SQL statements were typed by operators, and typos 
were huge issue. Thus shortcuts were introduced to mitigate. This is in the 
past, I believe. We do not mind typing long names of variables in C++ code, we 
want the code to be self-documenting. For me having x=y and y=x comparisons 
produce different results is complex even though I know how to decipher. I do 
not mind typing an extra word -- a modifier for comparison -- to make clear 
what I want to happen. I do not think this increases complexity. Complexity is 
sometimes equated to the lack of clarity. In fact, it is the consequence. 

Roman

________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Igor Tandetnik [i...@tandetnik.org]
Sent: Thursday, August 22, 2013 5:26 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] BETWEEN and explicit collation assignment

On 8/22/2013 4:06 PM, Roman Fleysher wrote:
> Logically, I agree with Simon, collate modifier in table definition describes 
> how indices should be built, nothing more.

But for what purpose are indexes built? Isn't it for the purpose of
being used to speed up queries? Wouldn't you expect a query like

select count(*) from t where x = 'a';

to actually use an index on t(x), when available? And if you do, doesn't
that mean that the comparison (x = 'a') should use the same collation as
an index on t(x), that is, the collation associated with column x? And
if so, then you are right back to the idea that the collation used by
the comparison operator should, at least sometimes, be inferred from the
collation associated with its operands. Therefore, the COLLATE clause in
the table definition should apply not just to building indexes, but also
to the behavior of comparisons. QED.

Where is the flaw in this chain of reasoning, in your opinion?

> Comparisons, as in these examples should specify which collation is to be 
> used and if it is different from one used for indexing, then yes, Igor, index 
> can not be used. I believe this is the present behavior already.

Obviously, if you explicitly specify a collation for a comparison that
doesn't match that of an index, then the index cannot be used. The
question is, if you do *not* explicitly specify a collation for a
comparison, as in example above - would you expect an index to be used?
Personally, I would, and I rather like the fact that SQLite currently
does use an index in this case. Simon, on the other hand, appears to be
saying that he would rather the index not be used, that the comparison
be done using BINARY collation (though he never answered my direct
question about it).

> I do not think Simon asks to change the way collations work. He is asking to 
> make clear how they work. I think Simon is asking for clarity so that error 
> like "(x collate A) = (y collate B)" can not take place.

What to do with the case of (x = y) where x and y are two columns with
different collations, then? How do you ensure that it cannot take place?
And if you can't prevent it, then you need to do *something* in this
case, and then you can do the exact same thing with "(x collate A) = (y
collate B)" case - no need to invent new syntax or machinery to try and
avoid it.

> This is similar to JOIN operation, which requires two tables to have common 
> column.

JOIN operation requires no such thing. I don't understand this analogy.

> Using this analogy, I would write:
>
> select * from t WHERE x = y USING NOCASE

How is this different from

select * from t WHERE x = y COLLATE NOCASE

I only see a superficial difference in the keyword used. What's the
improvement?

I *do* understand what you and Simon are suggesting. I *do* understand
that you want some kind of syntax that would hang a collation onto the
operator, rather than its operands, for certain philosophical and/or
aesthetic reasons. My point is, this would have to be done *in addition
to*, not *instead of*, the existing mechanism (because of the whole
"comparison must automatically pick the collation off the column in
order to use the index" thing). So it's strictly an added complexity.

The burden is then on the authors of the proposal to demonstrate the
benefits of the feature you are proposing, and to show that said
benefits do in fact outweigh the cost of this added complexity. So I
would ask you the same question I asked Simon: what exactly are the
benefits of your approach? Is there something that can be done your way
but can't be done the current way? Is there some bad outcome that's
possible the current way but prevented your way? Do you just prefer your
(as yet unspecified) syntax on purely aesthetic grounds?
--
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] BETWEEN and explicit collation assignment

Reply via email to