I have a Spark DataFrame with the following structure:

id  flag  price  date
a   0     100    2015
a   0     50     2015
a   1     200    2014
a   1     300    2013
a   0     400    2012

I need to create a new column that carries the most recent flag=1 price into each flag=0 row.
Expected output:

id  flag  price  date  new_column
a   0     100    2015  200
a   0     50     2015  200
a   1     200    2014  null
a   1     300    2013  null
a   0     400    2012  null

There are three rows with flag=0. For the first of them (date 2015), there are two earlier flag=1 prices (200 in 2014 and 300 in 2013), and I take the most recent one, 200 (from 2014). The last flag=0 row (2012) has no earlier flag=1 row, so it is filled with null.

I found a solution with a left join, but my dataset has around 400M records and the join causes a lot of shuffling. Is there a better way to find the recent value? I am looking for a solution in Scala. Any help would be appreciated.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-recent-value-in-spark-dataframe-tp28230.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
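As an illustrative sketch (not from the original thread) of one join-free alternative: a window function partitioned by id and ordered by date can pick up the most recent flag=1 price with a single shuffle. Column names follow the question; the toy data and the Spark 2.x window API (`Window.unboundedPreceding`, `last(..., ignoreNulls = true)`) are assumptions of this sketch.

```scala
// Sketch: replace the left join with a window function (assumes Spark 2.x).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, last, when}

val spark = SparkSession.builder.appName("recent-flag-value").getOrCreate()
import spark.implicits._

// Toy data matching the question's table.
val df = Seq(
  ("a", 0, 100, 2015),
  ("a", 0, 50,  2015),
  ("a", 1, 200, 2014),
  ("a", 1, 300, 2013),
  ("a", 0, 400, 2012)
).toDF("id", "flag", "price", "date")

// Frame over strictly earlier rows (date ascending, up to one row before
// the current row) within each id.
val w = Window
  .partitionBy("id")
  .orderBy(col("date"))
  .rowsBetween(Window.unboundedPreceding, -1)

// For flag=0 rows, take the last non-null flag=1 price seen in the frame,
// i.e. the most recent one; flag=1 rows fall through `when` to null.
val result = df.withColumn(
  "new_column",
  when(col("flag") === 0,
    last(when(col("flag") === 1, col("price")), ignoreNulls = true).over(w)))

result.show()
```

Note that `when` without an `otherwise` yields null, which handles the flag=1 rows, and the 2012 row gets null because its frame of earlier rows is empty. The window still shuffles once by id, but avoids the extra shuffle and sort of a self-join, which may matter at 400M rows.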