Re: Why does `now()` produce different times within the same query?

Ben Bromhead Thu, 01 Dec 2016 14:25:07 -0800

> I will note that Ben seems to suggest keeping the return of now() unique
> across calls while keeping the time component equal, thus varying the rest
> of the uuid bytes. However:
>  - I'm starting to wonder what this would buy us. Why would someone be
>    super confused by the time changing across calls (in a single
>    statement/batch), but be totally not confused by the actual full return
>    values not being equal?
>
Given that a common way of interacting with timeuuids is via toTimestamp(),
I can see where the confusion and the assumptions about its behaviour come
from.
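
To make that concrete, here is a rough Python sketch (not Cassandra code) of
what a toTimestamp-style conversion does to a version 1 UUID. The helper name
to_timestamp_ms is made up for illustration; the offset constant is the
standard gap between the UUID epoch (1582-10-15) and the Unix epoch.

```python
import uuid

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def to_timestamp_ms(u: uuid.UUID) -> int:
    """Return the Unix timestamp in milliseconds embedded in a version 1 UUID.

    Hypothetical helper for illustration; it only mimics the idea behind
    CQL's toTimestamp(), it is not Cassandra's implementation.
    """
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # u.time is the 60-bit timestamp in 100 ns units; 10,000 of those per ms.
    return (u.time - UUID_EPOCH_OFFSET) // 10_000


# Two UUIDs generated back to back differ as values, while their embedded
# millisecond timestamps may or may not be equal.
a = uuid.uuid1()
b = uuid.uuid1()
print(a != b, to_timestamp_ms(a), to_timestamp_ms(b))
```

Once a user only ever looks at the converted value, two calls that usually
agree to the millisecond and occasionally do not are easy to mistake for a
single shared timestamp.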


>    And how is that actually useful: you're having different results anyway
>    and you're letting the server pick the timestamp in the first place, so
>    you're probably not caring about millisecond precision of that timestamp
>    in the first place.
>
If you want consistent timestamps within a query, as the OP did, I can see
how this is useful. The Postgres documentation describes now() not changing
within a transaction as a "feature".
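
The two behaviours being compared can be sketched in a few lines of Python.
This is only a toy model: the function names are invented for illustration,
and the injected clock stands in for whatever time source the server uses.

```python
import time
from typing import Callable, List


def eval_per_call(n_calls: int, clock: Callable[[], float]) -> List[float]:
    """sysdate()-style: read the clock every time the function appears."""
    return [clock() for _ in range(n_calls)]


def eval_per_statement(n_calls: int, clock: Callable[[], float]) -> List[float]:
    """now()-style in Postgres/MySQL: read the clock once and reuse it."""
    ts = clock()
    return [ts] * n_calls


# With a real clock the per-call values can straddle a millisecond boundary;
# the per-statement values cannot, by construction.
print(eval_per_call(2, time.time))
print(eval_per_statement(2, time.time))
```

Today's now() in Cassandra corresponds to the first function; the OP expected
the second.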

>  - This would basically be a violation of the timeuuid spec
>

Not quite... Type 1 UUIDs let you replace the node component with a randomly
generated value: 47 random bits plus the multicast bit set (RFC 4122,
section 4.5, https://www.ietf.org/rfc/rfc4122.txt)
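
As a sketch of what that allowance looks like (Python, purely illustrative
and not a proposal for how Cassandra should implement it): the hypothetical
helper below builds version 1 UUIDs that share one timestamp but carry a
freshly randomised node field with the multicast bit set.

```python
import secrets
import uuid


def timeuuid_with_random_node(timestamp_100ns: int, clock_seq: int) -> uuid.UUID:
    """Build a version 1 UUID from a given timestamp and a random node field.

    Hypothetical helper for illustration. timestamp_100ns is the 60-bit count
    of 100 ns intervals since 1582-10-15; clock_seq is a 14-bit value.
    """
    time_low = timestamp_100ns & 0xFFFFFFFF
    time_mid = (timestamp_100ns >> 32) & 0xFFFF
    time_hi_version = ((timestamp_100ns >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    # Random node with the multicast bit (least significant bit of the first
    # node octet) forced to 1, so it cannot collide with a real MAC address.
    node = secrets.randbits(48) | (1 << 40)
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


# Same time component, different overall values.
ts = uuid.uuid1().time
first = timeuuid_with_random_node(ts, clock_seq=0)
second = timeuuid_with_random_node(ts, clock_seq=0)
print(first.time == second.time, first, second)
```

Both values are still well-formed version 1 UUIDs, and anything that converts
them back to a timestamp sees the same instant.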

>  - This would be a big pain in the code and make now() a special case
>     among functions. I'm unconvinced special cases are making things easier
>     in general.
>

On reflection, I have to agree here: now() has been around forever and
this is the first anecdote I've seen of someone getting caught out.

However, with my user advocate hat on, I think it would be worth
investigating further, beyond a documentation update, if others found it a
sticking point in Cassandra adoption.


> So I'm all for improving the documentation if this confuses users due to
> expectations (mistakenly) carried from prior experiences, and please
> feel free to open a JIRA for that. I'm a lot less in agreement that there
> is something wrong with the way the function behaves in principle.
>


> > I can see why this issue has been largely ignored and hasn't had a
> > chance for the behaviour to be formally defined
>
> Don't make too many assumptions. The behavior is perfectly well defined:
> now() is a "normal" function and is evaluated whenever it's called,
> according to the timeuuid spec (or as close to it as we can make it).
>
Maybe formally defined is the wrong term... Formally documented?

>
> On Thu, Dec 1, 2016 at 7:25 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
> Great comment. +1
>
> Am 01.12.2016 06:29 schrieb "Ben Bromhead" <b...@instaclustr.com>:
>
> tl;dr +1 yup raise a jira to discuss how now() should behave in a single
> statement (and possibly extend to batch statements).
>
> The values of now should be the same if you assume that now() works like
> it does in relational databases such as postgres or mysql, however at the
> moment it instead works like sysdate() in mysql. Given that CQL is supposed
> to be SQL like, I think the assumption around the behaviour of now() was a
> fair one to make.
>
> I definitely agree that raising a jira ticket would be a great place to
> discuss what the behaviour of now() should be for Cassandra. Personally I
> would be in favour of seeing the deterministic component (the actual time
> part) being the same across multiple calls in the one statement or multiple
> statements in a batch.
>
> Cassandra documentation does not make any claims as to how now() works
> within a single statement, and reading the code shows the intent is to
> work like sysdate() from MySQL rather than now(). One of the identified
> dangers of making cql similar to sql is that, while yes it aids adoption,
> users will find that SQL like things don't behave as expected. Of course as
> a user, one shouldn't have to read the source code to determine correct
> behaviour.
>
> Given that a timeuuid is made up of deterministic and (pseudo)
> non-deterministic components I can see why this issue has been largely
> ignored and hasn't had a chance for the behaviour to be formally defined
> (you would expect now to return the same time in the one statement despite
> multiple calls, but you wouldn't expect the same behaviour for say a call
> to rand()).
>
>
>
>
>
>
>
> On Wed, 30 Nov 2016 at 19:54 Cody Yancey <yan...@uber.com> wrote:
>
>     This is not a bug, and in fact changing it would be a serious bug.
>
> False. Absolutely no consumer would be broken by a change to guarantee an
> identical time component that isn't broken already, for the simple reason
> your code already has to handle that case, as it is in fact the majority
> case RIGHT NOW. Users can hit this bug, in production, because unit tests
> might not have experienced it! The time component should be the time that the
> command was processed by the coordinator node.
>
>      would one expect a java/py/bash script that loops
>
> Individual Cassandra writes (which is what OP is referring to
> specifically) are not loops. They are in almost every case atomic
> operations that either succeed completely or fail completely. Allowing a
> single atomic operation to witness multiple times in these corner cases is
> not only surprising, as this thread demonstrates, it is also needlessly
> restricting to what developers can use the database for, and provides NO
> BENEFIT.
>
>     Calling now PRIOR to initiating multiple inserts is in most cases
> exactly what one does...the ONLY practice is to set the value before
> initiating the sequence of calls
>
> Also false. Cassandra does not have a way of doing this on the coordinator
> node rather than the client device, and as I already showed, the client
> device is the wrong place to do it in situations where guaranteeing bounded
> clock-skew actually makes a difference one way or the other.
>
> Thanks,
> Cody
>
>
>
> On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
> This is not a bug, and in fact changing it would be a serious bug.
>
> What it is is a wonderful case of bad coding: would one expect a
> java/py/bash script that loops on a bunch of read/execute/update calls where
> each iteration calls time to return the same exact time for the duration of
> the execution of the code? Whether the code runs for 5 seconds or 5 hours?
>
> Every call to a system call is unique, including within C*. Calling now
> PRIOR to initiating multiple inserts is in most cases exactly what one does
> to assure unique time stamps FOR THE BATCH OF INSERTS. To get a nearly
> identical system time as would be the uuid of the row, one tries to call
> time as close to just before the insert as possible. Then repeat.
>
> You have a logic issue in your code. If you want the same value for a set
> of calls, the ONLY practice is to set the value before initiating the
> sequence of calls.
>
>
>
> *.......*
>
>
>
> *Daemeon C.M. Reiydelle*
> *USA (+1) 415.501.0198*
> *London (+44) (0) 20 8144 9872*
>
> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey <yan...@uber.com> wrote:
>
> Getting the same TimeUUID values might be a major problem. Getting two
> different TimeUUIDs that at least have the same time component would not be
> a major problem, as this is the main case today. Getting different time components
> is actually the corner case, and it is a corner case that breaks
> Internet-of-Things applications. We can tightly control clock skew in our
> cluster. We most definitely CANNOT control clock skew on the thousands of
> sensors that write to our cluster.
>
> Thanks,
> Cody
>
> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille <rwi...@fold3.com> wrote:
>
> In my opinion, this is not broken and “fixing” it would break existing
> code. Consider a batch that includes multiple inserts, each of which
> inserts the value returned by now(). Getting the same UUID for each insert
> would be a major problem.
>
> Cheers
>
> Robert
>
>
> On Nov 30, 2016, at 4:46 PM, Todd Fast <t...@digitalexistence.com> wrote:
>
> FWIW I'd suggest opening a bug--this behavior is certainly quite
> unexpected and more than just a documentation issue. In general I can't
> imagine any desirable properties of the current implementation, and there
> are likely a bunch of latent bugs sitting out there, so it should be fixed.
>
> Todd
>
> On Wed, Nov 30, 2016 at 12:37 PM Terry Liu <t...@turnitin.com> wrote:
>
> Sorry for my typo. Obviously, I meant:
> "It appears that a single query that calls Cassandra's `now()` time
> function *multiple times* may actually cause a query to write or return
> different times."
>
> Less of a surprise now that I realize more about the implementation, but I
> agree that more explicit documentation around when exactly the "execution"
> of each now() statement happens and what implications it has for the
> resulting timestamps would be helpful when running into this.
>
> Thanks for the quick responses!
>
> -Terry
>
>
>
> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek <msval...@gmail.com> wrote:
>
> Every now() call in a statement is, under the hood, "replaced" with a newly
> generated uuid.
>
> It can happen that they belong to different milliseconds in time.
>
> If you need to have the same timestamps, you need to set them on the client
> side.
>
>
> @msvaljek <https://twitter.com/msvaljek>
>
> 2016-11-29 22:49 GMT+01:00 Terry Liu <t...@turnitin.com>:
>
> It appears that a single query that calls Cassandra's `now()` time
> function may actually cause a query to write or return different times.
>
> Is this the expected or defined behavior, and if so, why does it behave
> like this rather than evaluating `now()` once across an entire statement?
>
> This really affects UPDATE statements but to test it more easily, you
> could try something like:
>
> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
> FROM keyspace.table
> LIMIT 100;
>
> If you run that a few times, you should eventually see that the timestamp
> returned moves onto the next millisecond mid-query.
>
> --
> *Software Engineer*
> Turnitin - http://www.turnitin.com
> t...@turnitin.com
>
>
>
>
>
> --
> *Software Engineer*
> Turnitin - http://www.turnitin.com
> t...@turnitin.com
>
>
>
>
>
> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>
>
> --
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer
