Dear Igor, First, I do not expect any changes to SQLite to be made as a result of this discussion. You are right, the discussion is largely philosophical. (I do not imply "philosophical" means "impractical", which are sometimes equated.) But the questions you ask are also philosophical: for is there something that can be done in C++ that can not be done in Fortran or Assembler? I am sure very little. Nevertheless, C++ came to exist. Thus these questions driven by practical considerations --- I understand --- must be replaced by some others. For example: can the desired action be expressed more clearly, elegantly, fool-proof, efficiently, etc in the language. I believe these sorts of questions led to development of C++, Java, etc not what could or could not be done in Fortran and Assembler. These are the questions, as I understand, Simon is asking. Can collate rule be expressed in such a way that it is clear it modifies the operation not the value? Perhaps, I should not speak for Simon, this is my interpretation of his writing.
Because COLLATE should be a modifier to comparisons, then COLLATE should be a modifier to index. We agree here, I think. I do not know if it is possible to build two indices on the same column using two different collate rules. Assuming comparison should specify collate rule (implicitly or explicitly without which comparison is impossible) the same rule will identify the corresponding index. If available, this index would be used. Thus the collation should not be picked up from the column, it must be picked up from the comparison. It is possible to adopt column's collate rule, if it is unique, to be used by default if no rule is explicitly specified, as you would like to see. It is then clear that this adaptation is a shortcut which should raise FAIL when columns with different collations are compared without explicit rule. As a matter of taste, I prefer to tell exactly what I want to happen, rather than to have software figure out what I meant. I can not be sure it will figure what I wanted. Historically SQL statements were typed by operators, and typos were huge issue. Thus shortcuts were introduced to mitigate. This is in the past, I believe. We do not mind typing long names of variables in C++ code, we want the code to be self-documenting. For me having x=y and y=x comparisons produce different results is complex even though I know how to decipher. I do not mind typing an extra word -- a modifier for comparison -- to make clear what I want to happen. I do not think this increases complexity. Complexity is sometimes equated to the lack of clarity. In fact, it is the consequence. Roman ________________________________________ From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Igor Tandetnik [i...@tandetnik.org] Sent: Thursday, August 22, 2013 5:26 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] BETWEEN and explicit collation assignment On 8/22/2013 4:06 PM, Roman Fleysher wrote: > Logically, I agree with Simon, collate modifier in table definition describes > how indices should be built, nothing more. But for what purpose are indexes built? Isn't it for the purpose of being used to speed up queries? Wouldn't you expect a query like select count(*) from t where x = 'a'; to actually use an index on t(x), when available? And if you do, doesn't that mean that the comparison (x = 'a') should use the same collation as an index on t(x), that is, the collation associated with column x? And if so, then you are right back to the idea that the collation used by the comparison operator should, at least sometimes, be inferred from the collation associated with its operands. Therefore, the COLLATE clause in the table definition should apply not just to building indexes, but also to the behavior of comparisons. QED. Where is the flaw in this chain of reasoning, in your opinion? > Comparisons, as in these examples should specify which collation is to be > used and if it is different from one used for indexing, then yes, Igor, index > can not be used. I believe this is the present behavior already. Obviously, if you explicitly specify a collation for a comparison that doesn't match that of an index, then the index cannot be used. The question is, if you do *not* explicitly specify a collation for a comparison, as in example above - would you expect an index to be used? Personally, I would, and I rather like the fact that SQLite currently does use an index in this case. Simon, on the other hand, appears to be saying that he would rather the index not be used, that the comparison be done using BINARY collation (though he never answered my direct question about it). > I do not think Simon asks to change the way collations work. He is asking to > make clear how they work. I think Simon is asking for clarity so that error > like "(x collate A) = (y collate B)" can not take place. What to do with the case of (x = y) where x and y are two columns with different collations, then? How do you ensure that it cannot take place? And if you can't prevent it, then you need to do *something* in this case, and then you can do the exact same thing with "(x collate A) = (y collate B)" case - no need to invent new syntax or machinery to try and avoid it. > This is similar to JOIN operation, which requires two tables to have common > column. JOIN operation requires no such thing. I don't understand this analogy. > Using this analogy, I would write: > > select * from t WHERE x = y USING NOCASE How is this different from select * from t WHERE x = y COLLATE NOCASE I only see a superficial difference in the keyword used. What's the improvement? I *do* understand what you and Simon are suggesting. I *do* understand that you want some kind of syntax that would hang a collation onto the operator, rather than its operands, for certain philosophical and/or aesthetic reasons. My point is, this would have to be done *in addition to*, not *instead of*, the existing mechanism (because of the whole "comparison must automatically pick the collation off the column in order to use the index" thing). So it's strictly an added complexity. The burden is then on the authors of the proposal to demonstrate the benefits of the feature you are proposing, and to show that said benefits do in fact outweigh the cost of this added complexity. So I would ask you the same question I asked Simon: what exactly are the benefits of your approach? Is there something that can be done your way but can't be done the current way? Is there some bad outcome that's possible the current way but prevented your way? Do you just prefer your (as yet unspecified) syntax on purely aesthetic grounds? -- Igor Tandetnik _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users