Sylvain Lebresne created CASSANDRA-5226:
-------------------------------------------

             Summary: CQL3 refactor to allow conversion function
                 Key: CASSANDRA-5226
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5226
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne


In CASSANDRA-5198, we've fixed CQL3 type validation and talked about adding 
conversion functions to ease working with the different data types. However, 
the current CQL3 code makes it fairly hard to add such functions in a non-hacky 
way. In fact, we already support a few conversion functions (token, 
minTimeuuid, maxTimeuuid, now) but the way we support them is extremely ugly 
(the token function is competely special cased and min/maxTimeuuid are ugly 
hacks in TimeUUIDType that I'm really not proud of).

So I'm attaching a refactor that cleans that up, making it easy to add new 
conversion functions. Now, said refactor is a big one. While the goal is to 
make it easy to add functions, I think it also improve the code in the 
following ways:
* It much more clearly separate the phase of validating the query from 
executing it. And in particular, it moves more work in the preparing phase.  
Typically, the parsing of constants is now done in the preparing phase, not the 
execution one. It also groups validation code much more cleanly imo.
* It simplify UpdateStatement. The Operation business was not very clean and in 
particular the same operations where not handled by the same code depending on 
whether they were prepared or not, which was error prone. This is no longer the 
case.
* It somewhat simplify the parser. A few parsing rules were a bit too 
convoluted, trying to enforce invariants that are much more easily checked post 
parsing (and doing it post parsing often allow better error messages, the 
parser tends to send cryptic errors).

The first attached part is the initial refactor. It also adds some relatively 
generic code for adding conversion functions (it would typically not be very 
hard to allow user defined functions, though that's not part of the patch at 
all) and uses that to handle the existing token, minTimeuuid and maxTimeuuid 
functions.

It's also worth mentioning that this first patch introduces type casts. The 
main reason is that it allows multiple overloads of the same function. 
Typically, the minTimeuuid allows both strings arguments (for dates) or integer 
ones (for timestamp), but so when you have:
{noformat}
SELECT * FROM foo WHERE t > minTimeuuid(?);
{noformat}
then the code doesn't know which function to use. So it complains. But you can 
remove the ambiguity with
{noformat}
SELECT * FROM foo WHERE t > minTimeuuid((bigint)?);
{noformat}

The 2nd patch finishes what the first one started by extending this conversion 
functions support to select clauses. So after this 2nd patch you can do stuff 
like:
{noformat}
SELECT token(k), k FROM foo;
{noformat}
for instance.

The 3rd patch builds on that to actually add new conversion functions. Namely, 
for every existing CQL3 <type> it adds a blobTo<type> and a <type>ToBlob 
function that convert from and to blobs. And so you can do (not that this 
example is particularly smart):
{noformat}
SELECT varintToBlob(v) FROM foo WHERE v > blobToVarint(bigintToBlob(3));
{noformat}
Honestly this last patch is more for demonstration purpose and we can discuss 
this separately. In particular, we may want better different names for those 
functions. But at least it should highlight that adding new function is easy 
(this could be used to add methods to work with dates for instance).

Now, at least considering the 2 first patches, this is not a small amount of 
code but I would still suggest pushing this in 1.2 (the patches are against 
1.2) for the following reasons:
 * It fixes a few existing bugs (CASSANDRA-5198 has broke prepared statement 
for instance, which this patch fixes) and add missing validation in a few 
places (we are allowing sets literals like \{ 1, 1 \} for instance, which is 
kind of wrong as it suggests we support multisets). We could fix those 
separatly but honestly I'm not we won't miss some.
 * We do have a fair amount of CQL dtests and i've check all pass. The refactor 
also cleans up some part of the code quite a bit imo. So overall I think I'm 
almost more confident in the code post-refactor than the current one.
 * We're early in 1.2 and it's an improvement after all. It would be a bit sad 
to have to wait for 2.0 to get this.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to