That isn't what the original thread is about. The thread is about the timestamp portion of the UUID being different.
Having UUID() return the same thing for all rows in a batch would be the unexpected behavior virtually every time.

On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Friday, December 2, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> This isn't about using the same UUID, though. It's about the timestamp
> bits in the UUID.
>
> What is the use case for generating multiple UUIDs in a single row? Why
> do you need to extract the timestamp out of both?
>
> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <sylv...@datastax.com>
> wrote:
>
> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
> I am not sure you saw my reply on the thread, but I believe everyone's
> needs can be met. I will copy that here:
>
> I saw it, but the real problem that was raised initially was not that of
> UDFs and of allowing both behaviors. It's a matter of people being
> confused by the behavior of a non-UDF function, now(), and suggesting it
> should be changed.
>
> The Hive idea is interesting, I guess, and we can switch to discussing
> that, but it's a different problem really and I'm not fond of derailing
> threads. I will just note, though, that if we're not talking about a
> confusion issue but rather about how to get a timeuuid to be fixed within
> a statement, then there is a much more trivial solution: generate it
> client side. The now() function is a small convenience, but there is
> nothing you cannot do without it client side, and that basically stands
> for almost any use of a (non-aggregate) function in Cassandra currently.
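Sylvain's "generate it client side" suggestion needs nothing beyond the JDK. Below is a minimal sketch of building a version-1 (time-based) UUID by hand, following the RFC 4122 bit layout; the class and method names are hypothetical, and a production application would more likely use a driver utility such as the DataStax driver's `UUIDs.timeBased()`:

```java
import java.security.SecureRandom;
import java.util.UUID;

// Minimal sketch of client-side version-1 (timeuuid) generation, so every
// statement in a batch can share one fixed timeuuid instead of relying on
// server-side now(). Names are hypothetical; layout follows RFC 4122.
public class ClientTimeUuid {
    // Offset between the UUID epoch (1582-10-15) and the Unix epoch,
    // in 100-nanosecond units, as defined by RFC 4122.
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    private static final SecureRandom RANDOM = new SecureRandom();

    public static UUID now() {
        // 60-bit timestamp in 100 ns units since the UUID epoch
        // (millisecond precision only, for brevity).
        long ts = System.currentTimeMillis() * 10_000 + UUID_EPOCH_OFFSET;
        long msb = (ts << 32)                  // time_low  -> bits 63..32
                | ((ts >>> 16) & 0xFFFF0000L)  // time_mid  -> bits 31..16
                | ((ts >>> 48) & 0x0FFFL)      // time_hi   -> bits 11..0
                | 0x1000L;                     // version 1 -> bits 15..12
        // Random clock sequence and node: these are the "47 random bits",
        // plus the multicast bit set so the node cannot collide with a
        // real MAC address.
        long lsb = (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL) // clear variant bits
                | 0x8000000000000000L                        // IETF variant (10)
                | 0x0000010000000000L;                       // multicast bit
        return new UUID(msb, lsb);
    }
}
```

`UUID.timestamp()` on the result round-trips back to the original 100 ns count, which is the property the thread cares about extracting.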
> "Food for thought: Hive's UDFs introduced an annotation
> @UDFType(deterministic = false)
>
> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>
> The effect is the query planner can see when such a UDF is in use and
> determine the value once at the start of a very long query."
>
> Essentially, Hive had a similar if not identical problem: during a long
> running distributed process like map/reduce, some users wanted the
> semantics of:
>
> 1) Each call should produce a new timestamp
>
> while other users wanted the semantics of:
>
> 2) Each call should produce the same timestamp
>
> The solution implemented was to add an annotation to the UDF such that
> the query planner would pick up the annotation and act accordingly.
>
> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>
> As a result you can essentially implement two UDFs:
>
> @UDFType(deterministic = false)
> public class UDFNow
>
> and for the other people
>
> @UDFType(deterministic = true)
> public class UDFNowOnce extends UDFNow
>
> Both use cases are met in a sensible way.
>
> > The now() function is a small convenience, but there is nothing you
> > cannot do without it client side, and that basically stands for almost
> > any use of a (non-aggregate) function in Cassandra currently.
>
> Cassandra's changing philosophy over which entity should create such
> information (client, server, or driver) does not make this problem easy.
>
> If you take into account that you have users who do not understand all
> the intricacies of UUIDs, the problem is compounded. I.e., how does one
> generate a version-1 UUID in C#, Python, Java, etc., with the 47 random
> bits and so on? That is not super easy information to find. Maybe you
> find a Stack Overflow post that actually gives bad advice, etc.
>
> Many times in Cassandra you are using a UUID because you do not have a
> unique key in the insert and you wish to create one.
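The two semantics Edward contrasts can be illustrated without any Hive dependency. A hedged sketch in plain Java (these names are hypothetical, not Hive's API): a non-deterministic source yields a fresh value on every call, while the deterministic-per-statement source is evaluated once up front, as a planner would do for `@UDFType(deterministic = true)`:

```java
import java.util.function.LongSupplier;

// Illustration of the two timestamp semantics discussed above, with no
// Hive dependency. Class and method names are hypothetical.
public class TimestampSemantics {
    // Semantics 1: non-deterministic -- every call yields a fresh
    // timestamp, like @UDFType(deterministic = false).
    public static LongSupplier freshEachCall() {
        return System::currentTimeMillis;
    }

    // Semantics 2: deterministic per statement -- the value is fixed once
    // at the start, and every later call returns that same timestamp, as
    // a planner would do for @UDFType(deterministic = true).
    public static LongSupplier fixedAtStart() {
        final long fixed = System.currentTimeMillis();
        return () -> fixed;
    }
}
```

The point of the annotation is exactly this distinction: the function body is identical, and only the evaluation policy (once per statement versus once per call) changes.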
> If you are inserting more than a single record using that same UUID and
> you do not want the burden of doing it yourself, you would have to do
> write -> read -> write, which is an anti-pattern.
>
> Not multiple IDs for a single row. The same ID for multiple inserts in a
> batch.
>
> For example, let's say I have an application where my data has no unique
> key:
>
> Table poke
> poker, pokee, time
>
> Suppose I consume pokes from Kafka, build a batch of 30k, and insert
> them. You probably want to denormalize into two tables:
>
> Primary key (poker, time)
> Primary key (pokee, time)
>
> It makes sense that they all have the same UUID if you want it to be the
> UUID of the batch. This would make it easy to correlate all the events.
> Easy to delete them all as well.
>
> The "do it client side" argument is totally valid, but it has been a
> justification for not adding features, many of which are eventually
> added anyway.
>
> --
> Sorry, this was sent from mobile. Will do less grammar and spell check
> than usual.
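The pattern Edward describes — one client-generated id shared by every denormalized insert in a batch — can be sketched with no driver dependency. The table and column names below extend the poke example and are assumptions; the CQL strings are illustrative only (a real application would bind a `BatchStatement` through a driver, and could substitute a client-generated timeuuid for `randomUUID()`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Sketch of "the same id for multiple inserts in a batch": one UUID is
// generated client side and shared by both denormalized tables, so the
// events can later be correlated or deleted together. The CQL strings
// are illustrative; table/column names are hypothetical.
public class PokeBatch {
    public static List<String> buildBatch(String poker, String pokee, long time) {
        UUID batchId = UUID.randomUUID(); // generated once, client side
        List<String> statements = new ArrayList<>();
        statements.add(String.format(
            "INSERT INTO pokes_by_poker (poker, time, pokee, batch_id)"
                + " VALUES ('%s', %d, '%s', %s)",
            poker, time, pokee, batchId));
        statements.add(String.format(
            "INSERT INTO pokes_by_pokee (pokee, time, poker, batch_id)"
                + " VALUES ('%s', %d, '%s', %s)",
            pokee, time, poker, batchId));
        return statements;
    }
}
```

Because the id is fixed before any statement is built, there is no write -> read -> write round trip: both tables receive the same `batch_id` in a single batch.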