Great comment. +1

On 01.12.2016 at 06:29, "Ben Bromhead" <b...@instaclustr.com> wrote:
> tl;dr +1 yup raise a jira to discuss how now() should behave in a single
> statement (and possibly extend to batch statements).
>
> The values of now should be the same if you assume that now() works like
> it does in relational databases such as postgres or mysql, however at the
> moment it instead works like sysdate() in mysql. Given that CQL is supposed
> to be SQL like, I think the assumption around the behaviour of now() was a
> fair one to make.
>
> I definitely agree that raising a jira ticket would be a great place to
> discuss what the behaviour of now() should be for Cassandra. Personally I
> would be in favour of seeing the deterministic component (the actual time
> part) being the same across multiple calls in the one statement or multiple
> statements in a batch.
>
> Cassandra documentation does not make any claims as to how now() works
> within a single statement and reading the code it shows the intent is to
> work like sysdate() from MySQL rather than now(). One of the identified
> dangers of making cql similar to sql is that, while yes it aids adoption,
> users will find that SQL like things don't behave as expected. Of course as
> a user, one shouldn't have to read the source code to determine correct
> behaviour.
>
> Given that a timeuuid is made up of deterministic and (pseudo)
> non-deterministic components I can see why this issue has been largely
> ignored and hasn't had a chance for the behaviour to be formally defined
> (you would expect now to return the same time in the one statement despite
> multiple calls, but you wouldn't expect the same behaviour for say a call
> to rand()).
>
> On Wed, 30 Nov 2016 at 19:54 Cody Yancey <yan...@uber.com> wrote:
>
>> This is not a bug, and in fact changing it would be a serious bug.
>>
>> False.
>> Absolutely no consumer would be broken by a change to guarantee an
>> identical time component that isn't broken already, for the simple reason
>> your code already has to handle that case, as it is in fact the majority
>> case RIGHT NOW. Users can hit this bug, in production, because unit tests
>> might not have experienced it! The time component should be the time that
>> the command was processed by the coordinator node.
>>
>> would one expect a java/py/bash script that loops
>>
>> Individual Cassandra writes (which is what OP is referring to
>> specifically) are not loops. They are in almost every case atomic
>> operations that either succeed completely or fail completely. Allowing a
>> single atomic operation to witness multiple times in these corner cases is
>> not only surprising, as this thread demonstrates, it is also needlessly
>> restricting to what developers can use the database for, and provides NO
>> BENEFIT.
>>
>> Calling now PRIOR to initiating multiple inserts is in most cases
>> exactly what one does...the ONLY practice is to set the value before
>> initiating the sequence of calls
>>
>> Also false. Cassandra does not have a way of doing this on the
>> coordinator node rather than the client device, and as I already showed,
>> the client device is the wrong place to do it in situations where
>> guaranteeing bounded clock-skew actually makes a difference one way or the
>> other.
>>
>> Thanks,
>> Cody
>>
>> On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>> This is not a bug, and in fact changing it would be a serious bug.
>>
>> What it is is a wonderful case of bad coding: would one expect a
>> java/py/bash script that loops on a bunch of read/execute/update calls
>> where each iteration calls time to return the same exact time for the
>> duration of the execution of the code? Whether the code runs for 5 seconds
>> or 5 hours?
>>
>> Every call to a system call is unique, including within C*.
>> Calling now PRIOR to initiating multiple inserts is in most cases exactly
>> what one does to assure unique time stamps FOR THE BATCH OF INSERTS. To get
>> a nearly identical system time as would be the uuid of the row, one tries
>> to call time as close to just before the insert as possible. Then repeat.
>>
>> You have a logic issue in your code. If you want the same value for a set
>> of calls, the ONLY practice is to set the value before initiating the
>> sequence of calls.
>>
>> *.......*
>>
>> *Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872*
>>
>> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey <yan...@uber.com> wrote:
>>
>> Getting the same TimeUUID values might be a major problem. Getting two
>> different TimeUUIDs that at least have the same time component would not
>> be a major problem as this is the main case today. Getting different time
>> components is actually the corner case, and it is a corner case that breaks
>> Internet-of-Things applications. We can tightly control clock skew in our
>> cluster. We most definitely CANNOT control clock skew on the thousands of
>> sensors that write to our cluster.
>>
>> Thanks,
>> Cody
>>
>> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille <rwi...@fold3.com> wrote:
>>
>> In my opinion, this is not broken and “fixing” it would break existing
>> code. Consider a batch that includes multiple inserts, each of which
>> inserts the value returned by now(). Getting the same UUID for each insert
>> would be a major problem.
>>
>> Cheers
>>
>> Robert
>>
>> On Nov 30, 2016, at 4:46 PM, Todd Fast <t...@digitalexistence.com> wrote:
>>
>> FWIW I'd suggest opening a bug--this behavior is certainly quite
>> unexpected and more than just a documentation issue. In general I can't
>> imagine any desirable properties of the current implementation, and there
>> are likely a bunch of latent bugs sitting out there, so it should be fixed.
>>
>> Todd
>>
>> On Wed, Nov 30, 2016 at 12:37 PM Terry Liu <t...@turnitin.com> wrote:
>>
>> Sorry for my typo. Obviously, I meant:
>> "It appears that a single query that calls Cassandra's `now()` time
>> function *multiple times* may actually cause a query to write or return
>> different times."
>>
>> Less of a surprise now that I realize more about the implementation, but
>> I agree that more explicit documentation around when exactly the
>> "execution" of each now() statement happens and what implications it has
>> for the resulting timestamps would be helpful when running into this.
>>
>> Thanks for the quick responses!
>>
>> -Terry
>>
>> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek <msval...@gmail.com>
>> wrote:
>>
>> every now() call in statement is under the hood "replaced" with newly
>> generated uuid.
>>
>> It can happen that they belong to different milliseconds in time.
>>
>> If you need to have same timestamps you need to set them on the client
>> side.
>>
>> @msvaljek <https://twitter.com/msvaljek>
>>
>> 2016-11-29 22:49 GMT+01:00 Terry Liu <t...@turnitin.com>:
>>
>> It appears that a single query that calls Cassandra's `now()` time
>> function may actually cause a query to write or return different times.
>>
>> Is this the expected or defined behavior, and if so, why does it behave
>> like this rather than evaluating `now()` once across an entire statement?
>>
>> This really affects UPDATE statements but to test it more easily, you
>> could try something like:
>>
>> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
>> FROM keyspace.table
>> LIMIT 100;
>>
>> If you run that a few times, you should eventually see that the timestamp
>> returned moves onto the next millisecond mid-query.
>>
>> --
>> *Software Engineer*
>> Turnitin - http://www.turnitin.com
>> t...@turnitin.com
>>
> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
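
For anyone following the thread, here is a minimal Python sketch of the behaviour proposed above: the time component is captured once per statement, and only the non-time part differs between the generated timeuuids. It uses only the standard `uuid` module. The helper names `timeuuid_from_millis` and `millis_of`, the fixed node value, and the clock-sequence values are hypothetical and chosen for illustration; this is not how Cassandra implements now().

```python
import time
import uuid

# Number of 100-ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_from_millis(millis, clock_seq, node):
    """Build a version-1 UUID whose time component is `millis` (Unix ms)."""
    ts = millis * 10_000 + UUID_EPOCH_OFFSET  # 100-ns ticks since UUID epoch
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # set version 1
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def millis_of(u):
    """Recover the Unix-millisecond time component of a version-1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET) // 10_000


# Capture the time once, as if at the start of a single statement.
statement_millis = int(time.time() * 1000)

# Two "calls" within the same statement: same time, different clock sequence.
a = timeuuid_from_millis(statement_millis, clock_seq=1, node=0x1234567890AB)
b = timeuuid_from_millis(statement_millis, clock_seq=2, node=0x1234567890AB)

print(a != b)                        # the UUIDs themselves are distinct
print(millis_of(a) == millis_of(b))  # but their time components agree
```

With this shape, a batch of inserts would still get unique UUIDs, which addresses the concern about identical values, while every value in the statement reports the same millisecond.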