That isn't what the original thread is about. The thread is about the
timestamp portion of the UUID being different.

Having UUID() return the same thing for all rows in a batch would be the
unexpected thing virtually every time.
On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo <edlinuxg...@gmail.com>
wrote:

>
>
> On Friday, December 2, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> This isn't about using the same UUID though. It's about the timestamp bits
> in the UUID.
>
> What the use case is for generating multiple UUIDs in a single row? Why do
> you need to extract the timestamp out of both?
> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>
> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <sylv...@datastax.com>
> wrote:
>
> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>
> I am not sure you saw my reply on thread but I believe everyone's needs
> can be met I will copy that here:
>
>
> I saw it, but the real problem that was raised initially was not that of
> UDF and of allowing both behavior. It's a matter of people being confused
> by the behavior of a non-UDF function, now(), and suggesting it should be
> changed.
>
> The Hive idea is interesting I guess, and we can switch to discussing
> that, but it's a different problem really and I'm not a fond of derailing
> threads. I will just note though that if we're not talking about a
> confusion issue but rather how to get a timeuuid to be fixed within a
> statement, then there is much much more trivial solution: generate it
> client side. The `now()` function is a small convenience but there is
> nothing you cannot do without it client side, and that actually basically
> stands for almost any use of (non aggregate) function in Cassandra
> currently.
>
>
>
>
> "Food for thought: Hive's UDFs introduced an annotation  
> @UDFType(deterministic
> = false)
>
>
> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>
> The effect is the query planner can see when such a UDF is in use and
> determine the value once at the start of a very long query."
>
> Essentially hive had a similar if not identical problem, during a long
> running distributed process like map/reduce some users wanted the semantics
> of:
>
> 1) Each call should have a new timestamps
>
> While other users wanted the semantics of:
>
> 2) Each call should generate the same timestamp
>
> The solution implemented was to add an annotation to udf such that the
> query planner would pick up the annotation and act accordingly.
>
> (Here is a related issue https://issues.apache.org/jira/browse/HIVE-1986
>
> As a result you can essentially implement two UDFS
>
> @UDFType(deterministic = false)
> public class UDFNow
>
> and for the other people
>
> @UDFType(deterministic = true)
> public class UDFNowOnce extends UDFNow
>
> Both user cases are met in a sensible way.
>
>
>
> The `now()` function is a small convenience but there is nothing you
> cannot do without it client side, and that actually basically stands for
> almost any use of (non aggregate) function in Cassandra currently.
>
> Casandra's changing philosophy over which entity should create such
> information client/server/driver does not make this problem easy.
>
> If you take into account that you have users who do not understand all the
> intricacy of uuid the problem is compounded. IE How does one generate a
> UUID each c#, python, java etc? with the 47 random bits of bla bla. That is
> not super easy information to find. Maybe you find a stack overflow post
> that actually gives bad advice etc.
>
> Many times in Cassandra you are using a uuid because you do not have a
> unique key in the insert and you wish to create one. If you are inserting
> more then a single record using that same UUID and you do not want the
> burden of wanting to do it yourself you would have to do write>>read>>write
> which is an anti-pattern.
>
>
> Not multiple ids for a single row. The same id for multiple inserts in a
> batch.
>
> For example lets say I have an application where my data has no unique
> key.
>
> Table poke
> Poker, pokee, time
>
> Suppose i consume pokes from kafka build a batch of 30k and insert them.
> You probably want to denormalize into two tables:
> Primary key (poker, time)
> Primary key (pokee,time)
>
> It makes sense that they all have the same uuid if you want it to be the
> uuid of the batch. This would make it easy to correlate all the events.
> Easy to delete them all as well.
>
> The do it client side argument is totally valid, but has been a
> justification for not adding features many of which are eventually added
> anyway.
>
>
>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>

Reply via email to