Re: Generating unique id for a column in Row without breaking into RDD and joining back

Mike Metzger Fri, 05 Aug 2016 08:45:31 -0700

You can use the monotonically_increasing_id function to generate IDs that are 
guaranteed to be unique (but not necessarily consecutive). Call something like:


df.withColumn("id", monotonically_increasing_id())

You don't mention which language you're using, but either way you'll need to 
import it from the SQL functions package (org.apache.spark.sql.functions in 
Scala, pyspark.sql.functions in Python).
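To see why the IDs are unique but not consecutive, here is a minimal pure-Python sketch (not Spark code) of the layout the Spark API docs describe for monotonically_increasing_id: the partition index goes in the upper 31 bits and the record number within the partition in the lower 33 bits, giving a 64-bit long. The partition lists and the helper name simulate_monotonic_ids are hypothetical stand-ins for a real DataFrame.

```python
# Pure-Python model of the documented monotonically_increasing_id() layout:
# partition index in the upper 31 bits, record number within the partition
# in the lower 33 bits. Plain lists stand in for Spark partitions.

INT32_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def simulate_monotonic_ids(partitions):
    """Return (id, value) pairs numbered the way each partition would number its rows."""
    result = []
    for partition_index, rows in enumerate(partitions):
        for record_number, value in enumerate(rows):
            row_id = (partition_index << 33) + record_number
            result.append((row_id, value))
    return result


# Hypothetical data: six bids spread over three partitions.
partitions = [["a", "b", "c"], ["d", "e"], ["f"]]
ids = simulate_monotonic_ids(partitions)
for row_id, bid in ids:
    print(bid, row_id, "fits in int" if row_id <= INT32_MAX else "needs long")
```

The first row of partition 1 already gets 2^33 (8589934592), so if the downstream processing strictly needs a 32-bit int rather than a long, only IDs from the first partition would fit.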

Mike

> On Aug 5, 2016, at 9:11 AM, Tony Lane <tonylane....@gmail.com> wrote:
> 
> Ayan - basically I have a dataset with the following structure, where the bid 
> values are unique strings:
> 
> bid: String
> val : integer
> 
> I need unique int values for these string bids to do some processing on the 
> dataset, like this:
> 
> id:int   (unique integer id for each bid)
> bid:String
> val:integer
> 
> 
> 
> -Tony
> 
>> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha <guha.a...@gmail.com> wrote:
>> Hi
>> 
>> Can you explain a little further? 
>> 
>> best
>> Ayan
>> 
>>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane <tonylane....@gmail.com> wrote:
>>> I have a row with structure like
>>> 
>>> identifier: String
>>> value: int
>>> 
>>> All identifiers are unique, and I want to generate a unique long id for each 
>>> row and get a Row object back for further processing. 
>>> 
>>> I know I could use the zipWithUniqueId function on an RDD, but that would 
>>> mean first converting to an RDD and then joining the RDD back to the dataset.
>>> 
>>> What is the best way to do this?
>>> 
>>> -Tony 
>> 
>> 
>> 
>> -- 
>> Best Regards,
>> Ayan Guha
>
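For comparison with the zipWithUniqueId approach mentioned in the original question, here is a minimal pure-Python sketch (again not Spark code) of the ID formulas the RDD API docs give for zipWithUniqueId and zipWithIndex. Plain lists stand in for partitions, and the function names are hypothetical.

```python
# Pure-Python models of two RDD methods; lists stand in for partitions.
# zipWithUniqueId: item i in partition k gets id i * n + k (n = number of partitions),
#   so ids are unique but can have gaps when partitions differ in size.
# zipWithIndex: consecutive indices, ordered by partition index, then position.


def zip_with_unique_id(partitions):
    """Pair each item with i * n + k, mirroring RDD.zipWithUniqueId."""
    n = len(partitions)
    return [(item, i * n + k)
            for k, rows in enumerate(partitions)
            for i, item in enumerate(rows)]


def zip_with_index(partitions):
    """Pair each item with its global position, mirroring RDD.zipWithIndex."""
    flat = [item for rows in partitions for item in rows]
    return [(item, idx) for idx, item in enumerate(flat)]


# Hypothetical data: six bids spread over three partitions.
partitions = [["a", "b", "c"], ["d", "e"], ["f"]]
print(zip_with_unique_id(partitions))
print(zip_with_index(partitions))
```

The trade-off, per the Spark docs: zipWithUniqueId does not trigger a Spark job, while zipWithIndex must run one (when there is more than one partition) to count partition sizes before it can hand out consecutive indices.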
