> I will note that Ben seems to suggest keeping the return of now() unique
> across calls while keeping the time component equal, thus varying the rest
> of the uuid bytes. However:
> - I'm starting to wonder what this would buy us. Why would someone be super
> confused by the time changing across calls (in a single statement/batch),
> but not be confused at all by the full return values not being equal?

Given that a common way of interacting with timeuuids is with toTimestamp, I
can see the confusion and assumptions around behaviour.

> And how is that actually useful: you're having different results anyway and
> you're letting the server pick the timestamp in the first place, so you're
> probably not caring about millisecond precision of that timestamp in the
> first place.

If you want consistency of timestamps within your query, as OP did, I can see
how this is useful. Postgres claims this is a "feature".

> - This would basically be a violation of the timeuuid spec

Not quite... Type 1 uuids let you swap out the low 47 bits of the node
component with other randomly generated bits
(https://www.ietf.org/rfc/rfc4122.txt).

> - This would be a big pain in the code and make now() a special case among
> functions. I'm unconvinced special cases are making things easier in
> general.

On reflection, I have to agree here: now() has been around forever and this is
the first anecdote I've seen of someone getting caught out. However, with my
user advocate hat on, I think it would be worth investigating further beyond a
documentation update if others found it a sticking point in Cassandra
adoption.

> So I'm all for improving the documentation if this confuses users due to
> expectations (mistakenly) carried from prior experiences, and please feel
> free to open a JIRA for that. I'm a lot less in agreement that there is
> something wrong with the way the function behaves in principle.
>
> > I can see why this issue has been largely ignored and hasn't had a chance
> > for the behaviour to be formally defined
>
> Don't make too many assumptions. The behavior is perfectly well defined:
> now() is a "normal" function and is evaluated whenever it's called according
> to the timeuuid spec (or as close to it as we can make it).

Maybe formally defined is the wrong term... Formally documented?

> On Thu, Dec 1, 2016 at 7:25 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
> Great comment.
> +1
>
> On 01.12.2016 at 06:29, "Ben Bromhead" <b...@instaclustr.com> wrote:
>
> tl;dr +1 yup raise a jira to discuss how now() should behave in a single
> statement (and possibly extend to batch statements).
>
> The values of now() should be the same if you assume that now() works like
> it does in relational databases such as postgres or mysql; however, at the
> moment it instead works like sysdate() in mysql. Given that CQL is supposed
> to be SQL like, I think the assumption around the behaviour of now() was a
> fair one to make.
>
> I definitely agree that raising a jira ticket would be a great place to
> discuss what the behaviour of now() should be for Cassandra. Personally I
> would be in favour of seeing the deterministic component (the actual time
> part) being the same across multiple calls in the one statement or multiple
> statements in a batch.
>
> Cassandra documentation does not make any claims as to how now() works
> within a single statement, and reading the code shows the intent is to work
> like sysdate() from MySQL rather than now(). One of the identified dangers
> of making cql similar to sql is that, while yes it aids adoption, users will
> find that SQL like things don't behave as expected. Of course, as a user,
> one shouldn't have to read the source code to determine correct behaviour.
>
> Given that a timeuuid is made up of deterministic and (pseudo)
> non-deterministic components, I can see why this issue has been largely
> ignored and hasn't had a chance for the behaviour to be formally defined
> (you would expect now() to return the same time in the one statement despite
> multiple calls, but you wouldn't expect the same behaviour for, say, a call
> to rand()).
>
> On Wed, 30 Nov 2016 at 19:54 Cody Yancey <yan...@uber.com> wrote:
>
> > This is not a bug, and in fact changing it would be a serious bug.
>
> False.
> Absolutely no consumer would be broken by a change to guarantee an
> identical time component that isn't broken already, for the simple reason
> that your code already has to handle that case, as it is in fact the
> majority case RIGHT NOW. Users can hit this bug in production because unit
> tests might not have experienced it! The time component should be the time
> that the command was processed by the coordinator node.
>
> > would one expect a java/py/bash script that loops
>
> Individual Cassandra writes (which is what OP is referring to specifically)
> are not loops. They are in almost every case atomic operations that either
> succeed completely or fail completely. Allowing a single atomic operation to
> witness multiple times in these corner cases is not only surprising, as this
> thread demonstrates, it also needlessly restricts what developers can use
> the database for, and provides NO BENEFIT.
>
> > Calling now PRIOR to initiating multiple inserts is in most cases exactly
> > what one does...the ONLY practice is to set the value before initiating
> > the sequence of calls
>
> Also false. Cassandra does not have a way of doing this on the coordinator
> node rather than the client device, and as I already showed, the client
> device is the wrong place to do it in situations where guaranteeing bounded
> clock-skew actually makes a difference one way or the other.
>
> Thanks,
> Cody
>
> On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
> This is not a bug, and in fact changing it would be a serious bug.
>
> What it is is a wonderful case of bad coding: would one expect a
> java/py/bash script that loops on a bunch of read/execute/update calls,
> where each iteration calls time, to return the same exact time for the
> duration of the execution of the code? Whether the code runs for 5 seconds
> or 5 hours?
>
> Every call to a system call is unique, including within C*.
> Calling now PRIOR to initiating multiple inserts is in most cases exactly
> what one does to assure unique time stamps FOR THE BATCH OF INSERTS. To get
> a nearly identical system time as would be the uuid of the row, one tries to
> call time as close to just before the insert as possible. Then repeat.
>
> You have a logic issue in your code. If you want the same value for a set of
> calls, the ONLY practice is to set the value before initiating the sequence
> of calls.
>
> *.......*
>
> *Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872*
>
> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey <yan...@uber.com> wrote:
>
> Getting the same TimeUUID values might be a major problem. Getting two
> different TimeUUIDs that at least have the same time component would not be
> a major problem, as this is the main case today. Getting different time
> components is actually the corner case, and it is a corner case that breaks
> Internet-of-Things applications. We can tightly control clock skew in our
> cluster. We most definitely CANNOT control clock skew on the thousands of
> sensors that write to our cluster.
>
> Thanks,
> Cody
>
> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille <rwi...@fold3.com> wrote:
>
> In my opinion, this is not broken and “fixing” it would break existing code.
> Consider a batch that includes multiple inserts, each of which inserts the
> value returned by now(). Getting the same UUID for each insert would be a
> major problem.
>
> Cheers
>
> Robert
>
> On Nov 30, 2016, at 4:46 PM, Todd Fast <t...@digitalexistence.com> wrote:
>
> FWIW I'd suggest opening a bug--this behavior is certainly quite unexpected
> and more than just a documentation issue. In general I can't imagine any
> desirable properties of the current implementation, and there are likely a
> bunch of latent bugs sitting out there, so it should be fixed.
>
> Todd
>
> On Wed, Nov 30, 2016 at 12:37 PM Terry Liu <t...@turnitin.com> wrote:
>
> Sorry for my typo. Obviously, I meant:
> "It appears that a single query that calls Cassandra's `now()` time function
> *multiple times* may actually cause a query to write or return different
> times."
>
> Less of a surprise now that I realize more about the implementation, but I
> agree that more explicit documentation around when exactly the "execution"
> of each now() statement happens and what implications it has for the
> resulting timestamps would be helpful when running into this.
>
> Thanks for the quick responses!
>
> -Terry
>
> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek <msval...@gmail.com> wrote:
>
> Every now() call in a statement is, under the hood, "replaced" with a newly
> generated uuid.
>
> It can happen that they belong to different milliseconds in time.
>
> If you need to have the same timestamps, you need to set them on the client
> side.
>
> @msvaljek <https://twitter.com/msvaljek>
>
> 2016-11-29 22:49 GMT+01:00 Terry Liu <t...@turnitin.com>:
>
> It appears that a single query that calls Cassandra's `now()` time function
> may actually cause a query to write or return different times.
>
> Is this the expected or defined behavior, and if so, why does it behave like
> this rather than evaluating `now()` once across an entire statement?
>
> This really affects UPDATE statements, but to test it more easily, you could
> try something like:
>
> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
> FROM keyspace.table
> LIMIT 100;
>
> If you run that a few times, you should eventually see that the timestamp
> returned moves onto the next millisecond mid-query.
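
The millisecond rollover described in the original question can be
illustrated outside Cassandra. The following is a minimal Python sketch, not
Cassandra's implementation: to_timestamp_ms and uuid1_at are hypothetical
helpers, and the chosen instant is arbitrary. It assumes only the version 1
UUID layout from RFC 4122 (a 60-bit count of 100-ns ticks since 1582-10-15)
and shows that two values generated one tick apart can map to different
milliseconds once the sub-millisecond part is discarded.

```python
import uuid

# Number of 100-ns ticks between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def to_timestamp_ms(u: uuid.UUID) -> int:
    """Hypothetical stand-in for a toTimestamp()-style conversion:
    Unix time in whole milliseconds taken from a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    return (u.time - UUID_EPOCH_OFFSET) // 10_000


def uuid1_at(ticks: int, clock_seq: int = 0, node: int = 0x010000000000) -> uuid.UUID:
    """Hypothetical helper: build a version 1 UUID for a given tick count.
    The default node has the multicast bit set, marking it as not a real MAC."""
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000       # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80   # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


# An arbitrary instant, expressed as Unix time in milliseconds.
base_ms = 1_480_456_140_000

# The last 100-ns tick of that millisecond, and the tick right after it.
last_tick = base_ms * 10_000 + 9_999 + UUID_EPOCH_OFFSET
a = uuid1_at(last_tick)
b = uuid1_at(last_tick + 1)

print(a, to_timestamp_ms(a))
print(b, to_timestamp_ms(b))
```

The two UUIDs are only 100 ns apart, yet their millisecond timestamps differ
by one, which is the effect the SELECT above eventually exposes.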
>
> --
> *Software Engineer*
> Turnitin - http://www.turnitin.com
> t...@turnitin.com

--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer
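
The idea raised in the thread, keeping the time component identical across
calls while varying the remaining bits, can be sketched in Python as follows.
This is an illustration under stated assumptions, not how Cassandra generates
timeuuids: timeuuids_sharing_time is a hypothetical helper, and drawing a
fresh random clock sequence and random node per value is one possible reading
of the RFC 4122 allowance for random node IDs, which is itself part of what
the thread debates.

```python
import secrets
import time
import uuid
from typing import List

# Number of 100-ns ticks between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuids_sharing_time(count: int, ticks: int) -> List[uuid.UUID]:
    """Hypothetical helper: return `count` distinct version 1 UUIDs that all
    carry the same 60-bit timestamp `ticks`."""
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # version 1

    result: List[uuid.UUID] = []
    seen = set()
    while len(result) < count:
        clock_seq = secrets.randbits(14)
        # Random node with the multicast bit set, so it cannot collide with
        # a real IEEE 802 address.
        node = secrets.randbits(48) | (1 << 40)
        candidate = uuid.UUID(fields=(
            time_low,
            time_mid,
            time_hi_version,
            ((clock_seq >> 8) & 0x3F) | 0x80,  # RFC 4122 variant bits
            clock_seq & 0xFF,
            node,
        ))
        if candidate not in seen:  # retry on the (very unlikely) duplicate
            seen.add(candidate)
            result.append(candidate)
    return result


# Capture the time once, as a coordinator could do per statement or batch.
statement_ticks = time.time_ns() // 100 + UUID_EPOCH_OFFSET
batch = timeuuids_sharing_time(3, statement_ticks)

for u in batch:
    print(u, u.time)
```

Every value in batch reports the same u.time, so a millisecond conversion
yields one timestamp for the whole statement, while the UUIDs themselves stay
distinct, which addresses the concern about inserting the same UUID several
times in a batch.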