Thanks, I tried a left outer join. My dataset has around 400M records, and a
lot of shuffling is happening. Is there any other workaround apart from a
join? I tried using a window function, but I could not get a proper solution.
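
For reference, this window-based sketch is the kind of thing I was trying
(Spark 2.x Scala; I am not sure it is correct, or efficient at 400M rows):

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions._

  // Per id, most recent date first; look only at rows strictly after the
  // current one in that ordering, i.e. at older dates (modulo date ties).
  val w = Window.partitionBy("id").orderBy(col("date").desc)
    .rowsBetween(1, Long.MaxValue)

  // For flag=0 rows, take the first flag=1 price among the older rows;
  // flag=1 rows fall through the outer when() and stay null.
  val result = df.withColumn("new_column",
    when(col("flag") === 0,
      first(when(col("flag") === 1, col("price")), ignoreNulls = true).over(w)))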


Thanks

On Sat, Dec 17, 2016 at 4:55 AM, Michael Armbrust <mich...@databricks.com>
wrote:

> Oh and to get the null for missing years, you'd need to do an outer join
> with a table containing all of the years you are interested in.
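>
> For example, a rough sketch of what I mean (assuming df is the data frame
> from the original mail, and a hypothetical range of years 2010-2016):
>
>   import org.apache.spark.sql.SparkSession
>
>   val spark = SparkSession.builder.getOrCreate()
>   // one row per year of interest; years missing from df end up as nulls
>   val years = spark.range(2010, 2017).toDF("date")
>   val filled = years.join(df, Seq("date"), "left_outer")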
>
> On Fri, Dec 16, 2016 at 3:24 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Are you looking for argmax? Here is an example
>> <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/3170497669323442/2840265927289860/latest.html>
>> .
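>>
>> The usual trick is max over a struct, roughly like this (a sketch, not the
>> exact notebook code; structs compare field by field, so the max is taken
>> by date first):
>>
>>   import org.apache.spark.sql.functions._
>>
>>   // latest flag=1 price per id, via argmax on (date, price)
>>   val latest = df.filter(col("flag") === 1)
>>     .groupBy(col("id"))
>>     .agg(max(struct(col("date"), col("price"))).as("latest"))
>>     .select(col("id"), col("latest.price").as("new_column"))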
>>
>> On Wed, Dec 14, 2016 at 8:49 PM, Milin korath <milin.kor...@impelsys.com>
>> wrote:
>>
>>> Hi
>>>
>>> I have a Spark data frame with the following structure:
>>>
>>>  id  flag price date
>>>   a   0    100  2015
>>>   a   0    50   2015
>>>   a   1    200  2014
>>>   a   1    300  2013
>>>   a   0    400  2012
>>>
>>> I need to add a new column that, for each flag=0 row, holds the most
>>> recent earlier flag=1 price; flag=1 rows should get null.
>>>
>>>       id  flag price date new_column
>>>       a   0    100  2015    200
>>>       a   0    50   2015    200
>>>       a   1    200  2014    null
>>>       a   1    300  2013    null
>>>       a   0    400  2012    null
>>>
>>> There are two flag=1 rows (200 from 2014 and 300 from 2013). For the
>>> first flag=0 row (2015), both are earlier, so I have two candidate
>>> values (200 and 300) and I take the most recent one: 200 (from 2014).
>>> The last flag=0 row (2012) has no earlier flag=1 value, so it gets null.
>>>
>>> I am looking for a solution in Scala. Any help would be appreciated.
>>>
>>> Thanks
>>> Milin
>>>
>>
>>
>
