Mike,

Any suggestions on doing it for consecutive IDs?
On Aug 5, 2016 9:08 AM, "Tony Lane" <tonylane....@gmail.com> wrote:

> Mike.
>
> I have figured out how to do this. Thanks for the suggestion. It works
> great. I am trying to figure out the performance impact of this.
>
> Thanks again.
>
>
> On Fri, Aug 5, 2016 at 9:25 PM, Tony Lane <tonylane....@gmail.com> wrote:
>
>> @mike - this looks great. How can I do this in Java? What is the
>> performance implication on a large dataset?
>>
>> @sonal  - I can't have a collision in the values.
>>
>> On Fri, Aug 5, 2016 at 9:15 PM, Mike Metzger <m...@flexiblecreations.com>
>> wrote:
>>
>>> You can use the monotonically_increasing_id method to generate
>>> guaranteed unique (but not necessarily consecutive) IDs. Call something
>>> like:
>>>
>>> df.withColumn("id", monotonically_increasing_id())
>>>
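>>> You don't mention which language you're using, but you'll need to import
>>> the function from the sql.functions package (org.apache.spark.sql.functions
>>> in Scala or Java, pyspark.sql.functions in Python).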
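
[Inline note] On why these IDs are unique but not consecutive: the Spark
documentation describes the current implementation as placing the partition
ID in the upper 31 bits and the record number within each partition in the
lower 33 bits. The sketch below is a pure-Python model of that layout only.
It does not call Spark, and `model_id` is a hypothetical helper name used
for illustration.

```python
# Pure-Python model of the ID layout documented for Spark's
# monotonically_increasing_id(): partition index in the upper 31 bits,
# record number within the partition in the lower 33 bits.
# `model_id` is a hypothetical helper, not part of any Spark API.

RECORD_BITS = 33
PARTITION_BITS = 31


def model_id(partition_index: int, record_number: int) -> int:
    """Combine a partition index and a per-partition record number."""
    if not 0 <= partition_index < (1 << PARTITION_BITS):
        raise ValueError("partition index does not fit in 31 bits")
    if not 0 <= record_number < (1 << RECORD_BITS):
        raise ValueError("record number does not fit in 33 bits")
    return (partition_index << RECORD_BITS) | record_number


# Two partitions with three rows each: unique and increasing, but with a
# large gap between the last ID of partition 0 and the first of partition 1.
ids = [model_id(p, r) for p in range(2) for r in range(3)]
print(ids)  # [0, 1, 2, 8589934592, 8589934593, 8589934594]
```

With more than one partition the values exceed the 32-bit int range, so the
column has to be handled as a long rather than an int.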
>>>
>>> Mike
>>>
>>> On Aug 5, 2016, at 9:11 AM, Tony Lane <tonylane....@gmail.com> wrote:
>>>
>>> Ayan - basically I have a dataset with the structure below, where the
>>> bid values are unique strings:
>>>
>>> bid: String
>>> val: integer
>>>
>>> I need a unique int value for each of these string bids to do some
>>> processing on the dataset, like this:
>>>
>>> id: int   (unique integer id for each bid)
>>> bid: String
>>> val: integer
>>>
>>>
>>>
>>> -Tony
>>>
>>> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> Can you explain a little further?
>>>>
>>>> best
>>>> Ayan
>>>>
>>>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane <tonylane....@gmail.com>
>>>> wrote:
>>>>
>>>>> I have rows with a structure like:
>>>>>
>>>>> identifier: String
>>>>> value: int
>>>>>
>>>>> All identifiers are unique, and I want to generate a unique long id for
>>>>> the data and get a row object back for further processing.
>>>>>
>>>>> I understand I could use the zipWithUniqueId function on an RDD, but that
>>>>> would mean first converting to an RDD and then joining the RDD back to
>>>>> the dataset.
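
[Inline note] On the RDD route: as documented, zipWithUniqueId gives the
items in the kth of n partitions the ids k, n+k, 2n+k, and so on, so the ids
are unique but can have gaps. The related zipWithIndex yields consecutive
indices, at the cost of first counting the rows in each partition when there
is more than one. The sketch below models that offset idea in plain Python,
with lists standing in for partitions. It is not Spark code, and
`consecutive_ids` is a hypothetical name.

```python
# Plain-Python model of assigning consecutive ids across partitions:
# count each partition, turn the counts into starting offsets, then number
# rows locally. Lists stand in for partitions; nothing here calls Spark.
from itertools import accumulate


def consecutive_ids(partitions):
    """Return the partitions with (id, row) pairs, ids running 0..N-1."""
    counts = [len(part) for part in partitions]
    # Exclusive prefix sum: partition i starts after all rows before it.
    offsets = [0] + list(accumulate(counts))[:-1]
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]


partitions = [["a", "b"], ["c"], ["d", "e", "f"]]
result = consecutive_ids(partitions)
print(result)
# [[(0, 'a'), (1, 'b')], [(2, 'c')], [(3, 'd'), (4, 'e'), (5, 'f')]]
```

The counting pass is the extra work that buys consecutive numbering, which
is worth weighing against the gap-tolerant alternatives on a large dataset.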
>>>>>
>>>>> What is the best way to do this?
>>>>>
>>>>> -Tony
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Ayan Guha
>>>>
>>>
>>>
>>
>
