@mike - this looks great. How can I do this in Java? What is the
performance implication on a large dataset?

@sonal - I can't have a collision in the values.
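
On the Java question: as far as I can tell, the same function is exposed as a
static method on org.apache.spark.sql.functions, so on a Dataset<Row> the call
would look something like
df.withColumn("id", functions.monotonically_increasing_id()). My understanding
is that it needs no shuffle, because each partition numbers its own rows, so it
should stay cheap on a large dataset. The catch is that the result is a 64-bit
long, not an int: the Spark docs describe the current layout as the partition
ID in the upper 31 bits and the record number within the partition in the
lower 33 bits. The plain-Java sketch below needs no Spark at all; it only
reproduces that arithmetic to show why the IDs are unique but not consecutive,
and why they stop fitting in an int once there is a second partition. The
class and method names are made up for illustration and are not Spark APIs.

```java
// Illustration only: plain Java, no Spark dependency.
// MonotonicIdLayout and idFor are hypothetical names, not Spark APIs.
class MonotonicIdLayout {

    // Assumed layout (as documented for monotonically_increasing_id):
    // partition index in the upper 31 bits, record number in the lower 33 bits.
    static long idFor(int partitionIndex, long recordNumber) {
        return ((long) partitionIndex << 33) + recordNumber;
    }

    public static void main(String[] args) {
        // Two partitions with three rows each.
        for (int partition = 0; partition < 2; partition++) {
            for (long row = 0; row < 3; row++) {
                long id = idFor(partition, row);
                boolean fitsInInt = id <= Integer.MAX_VALUE;
                System.out.println("partition=" + partition + " row=" + row
                        + " id=" + id + " fitsInInt=" + fitsInInt);
            }
        }
    }
}
```

Under that assumed layout, the first partition yields 0, 1, 2 and the second
starts at 8589934592, so the values are unique but cannot be cast to int
safely. If dense int-sized ids are really required, zipWithIndex on the RDD is
one alternative to look at, at the cost of the round trip through an RDD.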

On Fri, Aug 5, 2016 at 9:15 PM, Mike Metzger <m...@flexiblecreations.com>
wrote:

> You can use the monotonically_increasing_id method to generate guaranteed
> unique (but not necessarily consecutive) IDs. Call something like:
>
> df.withColumn("id", monotonically_increasing_id())
>
> You don't mention which language you're using but you'll need to pull in
> the sql.functions library.
>
> Mike
>
> On Aug 5, 2016, at 9:11 AM, Tony Lane <tonylane....@gmail.com> wrote:
>
> Ayan - basically I have a dataset with the structure below, where the bid
> values are unique strings:
>
> bid: String
> val: integer
>
> I need unique int values for these string bids to do some processing in
> the dataset,
>
> like this:
>
> id:int   (unique integer id for each bid)
> bid:String
> val:integer
>
>
>
> -Tony
>
> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> Can you explain a little further?
>>
>> best
>> Ayan
>>
>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane <tonylane....@gmail.com>
>> wrote:
>>
>>> I have a row with structure like
>>>
>>> identifier: String
>>> value: int
>>>
>>> All identifiers are unique, and I want to generate a unique long id for
>>> the data and get a row object back for further processing.
>>>
>>> I understand I could use the zipWithUniqueId function on an RDD, but that
>>> would mean first converting to an RDD and then joining the RDD back to
>>> the dataset.
>>>
>>> What is the best way to do this?
>>>
>>> -Tony
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>
