You can use the monotonically_increasing_id function to generate IDs that are guaranteed to be unique (but not necessarily consecutive). Calling something like:

    df.withColumn("id", monotonically_increasing_id())

You don't mention which language you're using, but you'll need to import the function from the sql.functions package (org.apache.spark.sql.functions in Scala/Java, pyspark.sql.functions in Python).

Mike

> On Aug 5, 2016, at 9:11 AM, Tony Lane <tonylane....@gmail.com> wrote:
>
> Ayan - basically I have a dataset with this structure, where the bid values
> are unique strings
>
> bid: String
> val: integer
>
> I need unique int values for these string bids to do some processing in the
> dataset
>
> like
>
> id: int (unique integer id for each bid)
> bid: String
> val: integer
>
> -Tony
>
>> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha <guha.a...@gmail.com> wrote:
>> Hi
>>
>> Can you explain a little further?
>>
>> best
>> Ayan
>>
>>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane <tonylane....@gmail.com> wrote:
>>> I have a row with structure like
>>>
>>> identifier: String
>>> value: int
>>>
>>> All identifiers are unique and I want to generate a unique long id for the
>>> data and get a row object back for further processing.
>>>
>>> I understand using the zipWithUniqueId function on RDD, but that would mean
>>> first converting to RDD and then joining back the RDD and dataset
>>>
>>> What is the best way to do this?
>>>
>>> -Tony
>>
>> --
>> Best Regards,
>> Ayan Guha
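
A footnote on why the IDs from monotonically_increasing_id are unique but not consecutive: as the Spark documentation describes the current implementation, the partition index goes in the upper 31 bits and the record number within the partition in the lower 33 bits. The snippet below is a plain-Python sketch of that layout, not Spark code. The function name sketch_ids and the partition sizes are made up for illustration.

```python
# Illustrative only: mimics the documented bit layout of Spark's
# monotonically_increasing_id (partition index in the upper 31 bits,
# per-partition record number in the lower 33 bits). This is not Spark code.

def sketch_ids(partition_sizes):
    """Return the IDs that would be assigned, given the row count of each partition."""
    ids = []
    for partition_index, size in enumerate(partition_sizes):
        base = partition_index << 33  # first ID in this partition's range
        ids.extend(base + record for record in range(size))
    return ids


# Hypothetical layout: 3 rows in partition 0, 2 rows in partition 1.
ids = sketch_ids([3, 2])
print(ids)  # [0, 1, 2, 8589934592, 8589934593]
```

Note the jump between partitions: the generated column is a 64-bit long, and with more than one partition the values will not fit in a 32-bit int.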