Re: How to get recent value in spark dataframe

2016-12-20 Thread Divya Gehlot
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-windows.html Hope this helps. Thanks, Divya On 15 December 2016 at 12:49, Milin korath wrote: > Hi > > I have a spark data frame with following structure > > id flag price date > a 0100 2015 > a 050

Re: How to get recent value in spark dataframe

2016-12-19 Thread ayan guha
You have 2 parts to it:
1. Do a subquery where, for each primary key, you derive the latest of the flag=1 records. Ensure you get exactly 1 record per primary key value. Here you can use rank() over (partition by primary key order by year desc).
2. Join your original dataset with the above on the primary key.
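The two steps above could be sketched in Scala roughly as follows. This is only a sketch under assumptions: the sample rows are my reading of the table in the original question (fused values such as "0100" taken as flag 0, price 100), the column names latest_price and latest_date are made up for illustration, and row_number() is used in place of rank() because rank() can return several rows when dates tie, while the advice asks for exactly 1 record per key. It can be pasted into spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// In spark-shell, getOrCreate() simply returns the session that already exists.
val spark = SparkSession.builder().master("local[*]").appName("latest-flag-one-join").getOrCreate()
import spark.implicits._

// Sample rows as interpreted from the question (an assumption, see above).
val df = Seq(
  ("a", 0, 100, 2015),
  ("a", 0, 50, 2015),
  ("a", 1, 200, 2014),
  ("a", 1, 300, 2013),
  ("a", 0, 400, 2012)
).toDF("id", "flag", "price", "date")

// Step 1: keep exactly one row per id, the flag = 1 record with the latest date.
val latestFirst = Window.partitionBy("id").orderBy(col("date").desc)
val latest = df
  .filter(col("flag") === 1)
  .withColumn("rn", row_number().over(latestFirst))
  .filter(col("rn") === 1)
  .select(col("id"), col("price").as("latest_price"), col("date").as("latest_date"))

// Step 2: join the original dataset with that one-row-per-id table on the key.
val result = df.join(latest, Seq("id"), "left")
result.show()
```

With a left join, ids that have no flag = 1 record at all keep their rows and get null in the two added columns.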

Re: How to get recent value in spark dataframe

2016-12-18 Thread Richard Xin
I am not sure I understood your logic, but it seems to me that you could take a look at Hive's Lead/Lag functions. On Monday, December 19, 2016 1:41 AM, Milin korath wrote: thanks, I tried with left outer join. My dataset having around 400M records and lot of shuffling is happening.Is

Re: How to get recent value in spark dataframe

2016-12-18 Thread Milin korath
Thanks, I tried a left outer join. My dataset has around 400M records and a lot of shuffling is happening. Is there any other workaround apart from a join? I tried to use a window function but I am not getting a proper solution. Thanks On Sat, Dec 17, 2016 at 4:55 AM, Michael Armbrust wrote: > Oh a
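For the question of doing this without a join, one possible window-only sketch in Scala is below. It rests on assumptions the thread does not confirm: that "recent value" means the latest flag = 1 price at or before each row's date, that flag = 1 rows should win over flag 0 rows sharing the same date, and that the sample rows are as I read them from the original question. The column name recent_price is made up for illustration. A window still shuffles once by id, so whether this helps at 400M records would need to be measured.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, last, when}

// In spark-shell, getOrCreate() simply returns the session that already exists.
val spark = SparkSession.builder().master("local[*]").appName("recent-flag-one-window").getOrCreate()
import spark.implicits._

// Sample rows as interpreted from the question (an assumption).
val df = Seq(
  ("a", 0, 100, 2015),
  ("a", 0, 50, 2015),
  ("a", 1, 200, 2014),
  ("a", 1, 300, 2013),
  ("a", 0, 400, 2012)
).toDF("id", "flag", "price", "date")

// The price where flag = 1, null elsewhere (when() without otherwise() yields null).
val flagOnePrice = when(col("flag") === 1, col("price"))

// All rows of the same id from the earliest date up to the current row.
// Assumption: on equal dates, flag = 1 rows sort first so flag 0 rows can see them.
val upToCurrentRow = Window
  .partitionBy("id")
  .orderBy(col("date"), col("flag").desc)
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

// The last non-null flag = 1 price seen so far; null if there is none yet.
val filled = df.withColumn("recent_price", last(flagOnePrice, ignoreNulls = true).over(upToCurrentRow))
filled.show()
```

Rows dated before the first flag = 1 record of their id get null in recent_price.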

How to get recent value in spark dataframe

2016-12-18 Thread milinkorath
using scala. Any help would be appreciated.

Re: How to get recent value in spark dataframe

2016-12-16 Thread Michael Armbrust
Oh and to get the null for missing years, you'd need to do an outer join with a table containing all of the years you are interested in. On Fri, Dec 16, 2016 at 3:24 PM, Michael Armbrust wrote: > Are you looking for argmax? Here is an example >

Re: How to get recent value in spark dataframe

2016-12-16 Thread Michael Armbrust
Are you looking for argmax? Here is an example . On Wed, Dec 14, 2016 at 8:49 PM, Milin korath wrote: > Hi > > I have a spark data fram
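The example linked in this message did not survive in the archive, so the following is not that example. It is only a sketch of one common way to express argmax in Spark with Scala: take max() over a struct whose first field is the ordering column, so that the other fields of the winning row come along with it. The sample rows are my reading of the original question, and the names latest, latest_date and latest_price are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max, struct}

// In spark-shell, getOrCreate() simply returns the session that already exists.
val spark = SparkSession.builder().master("local[*]").appName("argmax-sketch").getOrCreate()
import spark.implicits._

// Sample rows as interpreted from the question (an assumption).
val df = Seq(
  ("a", 0, 100, 2015),
  ("a", 0, 50, 2015),
  ("a", 1, 200, 2014),
  ("a", 1, 300, 2013),
  ("a", 0, 400, 2012)
).toDF("id", "flag", "price", "date")

// Structs compare field by field, so max(struct(date, price)) picks the row
// with the greatest date (ties broken by the greater price) for each id.
val latestFlagOne = df
  .filter(col("flag") === 1)
  .groupBy("id")
  .agg(max(struct(col("date"), col("price"))).as("latest"))
  .select(
    col("id"),
    col("latest.date").as("latest_date"),
    col("latest.price").as("latest_price")
  )
latestFlagOne.show()
```

This yields one row per id and could stand in for the ranking subquery before joining back to the original rows.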

Re: How to get recent value in spark dataframe

2016-12-16 Thread vaquar khan
Not sure about your flag 0 and 1 logic, but you can orderBy the data according to time and take the first value. Regards, Vaquar khan On Wed, Dec 14, 2016 at 10:49 PM, Milin korath wrote: > Hi > > I have a spark data frame with following structure > > id flag price date > a 0100 2015 >

How to get recent value in spark dataframe

2016-12-14 Thread Milin korath
Hi

I have a Spark data frame with the following structure:

id  flag  price  date
a   0     100    2015
a   0     50     2015
a   1     200    2014
a   1     300    2013
a   0     400    2012

I need to create a data frame in which the flag 0 rows are updated with the most recent flag 1 value.

id  flag  price  date