Should be pretty much the same code for Scala:

    import java.util.UUID
    UUID.randomUUID

If you need it as a UDF, just wrap it accordingly.

Mike

On Fri, Aug 5, 2016 at 11:38 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> On the same token, can one generate a UUID like the below in Hive?
>
>     hive> select reflect("java.util.UUID", "randomUUID");
>     OK
>     587b1665-b578-4124-8bf9-8b17ccb01fe7
>
> thx
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On 5 August 2016 at 17:34, Mike Metzger <m...@flexiblecreations.com> wrote:
>
>> Tony -
>>
>> From my testing, this is built with performance in mind. It's a 64-bit
>> value split between the partition id (upper 31 bits, ~2 billion) and the
>> id counter within a partition (lower 33 bits, ~8 billion). There shouldn't
>> be any added communication between the executors and the driver for that.
>>
>> I've been toying with an implementation that lets you specify the split
>> for better control, along with a start value.
>>
>> Thanks
>>
>> Mike
>>
>> On Aug 5, 2016, at 11:07 AM, Tony Lane <tonylane....@gmail.com> wrote:
>>
>> Mike,
>>
>> I have figured out how to do this. Thanks for the suggestion. It works
>> great. I am trying to figure out the performance impact of this.
>>
>> thanks again
>>
>> On Fri, Aug 5, 2016 at 9:25 PM, Tony Lane <tonylane....@gmail.com> wrote:
>>
>>> @mike - this looks great. How can I do this in Java? What is the
>>> performance implication on a large dataset?
>>>
>>> @sonal - I can't have a collision in the values.
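The UUID one-liner at the top of the thread can be sketched as a self-contained Java program; it is the same `UUID.randomUUID` call Mike suggests for Scala and Mich invokes via Hive's `reflect`. Wrapping it as a Spark UDF is left out here, since the exact registration API varies across Spark versions.

```java
import java.util.UUID;

public class RandomUuid {
    public static void main(String[] args) {
        // Same call as the Scala snippet and Hive's
        // reflect("java.util.UUID", "randomUUID"): a random, version-4 UUID.
        String id = UUID.randomUUID().toString();
        System.out.println(id);
        // The textual form is always 36 chars: 8-4-4-4-12 hex groups.
        System.out.println(id.length());
    }
}
```

Note that a UUID is 128 bits wide, so it does not fit a single long column; that is one reason the rest of the thread turns to monotonically_increasing_id for a numeric id.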
>>> On Fri, Aug 5, 2016 at 9:15 PM, Mike Metzger <m...@flexiblecreations.com> wrote:
>>>
>>>> You can use the monotonically_increasing_id method to generate
>>>> guaranteed-unique (but not necessarily consecutive) IDs, by calling
>>>> something like:
>>>>
>>>>     df.withColumn("id", monotonically_increasing_id())
>>>>
>>>> You don't mention which language you're using, but you'll need to pull
>>>> in the sql.functions library.
>>>>
>>>> Mike
>>>>
>>>> On Aug 5, 2016, at 9:11 AM, Tony Lane <tonylane....@gmail.com> wrote:
>>>>
>>>> Ayan - basically I have a dataset with the structure below, where the
>>>> bids are unique string values:
>>>>
>>>>     bid: String
>>>>     val: Integer
>>>>
>>>> I need unique int values for these string bids to do some processing in
>>>> the dataset, like:
>>>>
>>>>     id: Int (unique integer id for each bid)
>>>>     bid: String
>>>>     val: Integer
>>>>
>>>> -Tony
>>>>
>>>> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha <guha.a...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Can you explain a little further?
>>>>>
>>>>> best
>>>>> Ayan
>>>>>
>>>>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane <tonylane....@gmail.com> wrote:
>>>>>
>>>>>> I have a row with a structure like:
>>>>>>
>>>>>>     identifier: String
>>>>>>     value: Int
>>>>>>
>>>>>> All identifiers are unique, and I want to generate a unique long id
>>>>>> for the data and get a Row object back for further processing.
>>>>>>
>>>>>> I understand using the zipWithUniqueId function on an RDD, but that
>>>>>> would mean first converting to an RDD and then joining the RDD and the
>>>>>> Dataset back together.
>>>>>>
>>>>>> What is the best way to do this?
>>>>>>
>>>>>> -Tony
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Ayan Guha
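The bit split Mike describes for monotonically_increasing_id can be checked with plain arithmetic. The sketch below assumes the layout he gives (partition id in the upper 31 bits, per-partition counter in the lower 33 bits); `makeId` is an illustration of that layout, not Spark's own code.

```java
public class MonotonicIdLayout {
    // Pack a partition id and a per-partition record counter into one
    // 64-bit value: partition id in the upper 31 bits, counter in the
    // lower 33 bits -- the layout described in the thread.
    static long makeId(long partitionId, long recordNumber) {
        return (partitionId << 33) | recordNumber;
    }

    public static void main(String[] args) {
        long id = makeId(5, 42);                    // partition 5, record 42
        System.out.println(id);                     // 42949673002
        System.out.println(id >>> 33);              // 5  (partition id back)
        System.out.println(id & ((1L << 33) - 1));  // 42 (counter back)
    }
}
```

Ids built this way are unique across partitions but not consecutive: partition 0's ids start at 0, partition 1's at 2^33, and so on, which matches Mike's "guaranteed unique but not necessarily consecutive" caveat.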