@mike - this looks great. How can I do this in Java? What are the performance implications on a large dataset?
@sonal - I can't have a collision in the values.

On Fri, Aug 5, 2016 at 9:15 PM, Mike Metzger <m...@flexiblecreations.com> wrote:

> You can use the monotonically_increasing_id method to generate guaranteed
> unique (but not necessarily consecutive) IDs. Calling something like:
>
> df.withColumn("id", monotonically_increasing_id())
>
> You don't mention which language you're using but you'll need to pull in
> the sql.functions library.
>
> Mike
>
> On Aug 5, 2016, at 9:11 AM, Tony Lane <tonylane....@gmail.com> wrote:
>
> Ayan - basically i have a dataset with structure, where bid are unique
> string values
>
> bid: String
> val : integer
>
> I need unique int values for these string bid's to do some processing in
> the dataset
>
> like
>
> id:int (unique integer id for each bid)
> bid:String
> val:integer
>
> -Tony
>
> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> Can you explain a little further?
>>
>> best
>> Ayan
>>
>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane <tonylane....@gmail.com>
>> wrote:
>>
>>> I have a row with structure like
>>>
>>> identifier: String
>>> value: int
>>>
>>> All identifier are unique and I want to generate a unique long id for
>>> the data and get a row object back for further processing.
>>>
>>> I understand using the zipWithUniqueId function on RDD, but that would
>>> mean first converting to RDD and then joining back the RDD and dataset
>>>
>>> What is the best way to do this ?
>>>
>>> -Tony
>>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
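
For reference on the "unique but not necessarily consecutive" point in the quoted reply: as far as I can tell from the Spark documentation, monotonically_increasing_id returns a 64-bit long, and the current implementation puts the partition index in the upper 31 bits and the record number within the partition in the lower 33 bits. Below is a minimal plain-Java sketch of that layout only, not Spark code; the class and method names (MonotonicIdSketch, idFor) are hypothetical and made up for illustration.

```java
// Plain-Java sketch of the ID layout documented for Spark's
// monotonically_increasing_id(): partition index in the upper bits,
// per-partition record number in the lower 33 bits.
// MonotonicIdSketch and idFor are hypothetical names, not Spark APIs.
final class MonotonicIdSketch {
    // Number of low-order bits reserved for the record number.
    private static final int RECORD_BITS = 33;

    // Combine a partition index and a record number into one long ID.
    static long idFor(int partitionIndex, long recordNumber) {
        if (partitionIndex < 0) {
            throw new IllegalArgumentException("partitionIndex must be >= 0");
        }
        if (recordNumber < 0 || recordNumber >= (1L << RECORD_BITS)) {
            throw new IllegalArgumentException("recordNumber must fit in 33 bits");
        }
        return ((long) partitionIndex << RECORD_BITS) | recordNumber;
    }

    public static void main(String[] args) {
        System.out.println(idFor(0, 0)); // first record of partition 0: 0
        System.out.println(idFor(0, 1)); // second record of partition 0: 1
        System.out.println(idFor(1, 0)); // first record of partition 1: 8589934592
    }
}
```

The jump between partitions is why the IDs are unique without being consecutive, and why they are longs rather than ints. In Java with Spark 2.0, the call itself should be the static monotonically_increasing_id() from org.apache.spark.sql.functions, passed to withColumn as in the quoted snippet; I have not verified that against older releases.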