Hi All,

  I want to use a DataFrame operation to find the row with the latest
timestamp in each group, using the operation below:

df.orderBy(desc("transaction_date"))
  .groupBy("mobileno")
  .agg(first("customername").as("customername"), first("service_type").as("service_type"), first("cust_addr").as("cust_addr"))
  .select("customername", "service_type", "mobileno", "cust_addr")


*Spark Version :: 1.6.x*

My question is: *"Will Spark 1.6.x preserve the ordering during the
groupBy if the DataFrame was previously sorted with orderBy?"*


*I referred to a blog post here:*
https://bzhangusc.wordpress.com/2015/05/28/groupby-on-dataframe-is-not-the-groupby-on-rdd/

*It claims this will work, except in Spark 1.5.1 and 1.5.2.*


*I would appreciate some elaboration on how Spark handles this internally.
Also, is it more efficient than using a Window function?*
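
For comparison, the Window-function alternative I have in mind would look roughly like the sketch below. It is only a sketch: the helper name latestPerMobile and the temporary column name rn are made up for illustration, and as far as I understand, window functions in Spark 1.x need a HiveContext rather than a plain SQLContext.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Hypothetical helper (the name is illustrative): keep only the newest row per mobileno.
def latestPerMobile(df: DataFrame): DataFrame = {
  // Number the rows within each mobileno, newest transaction_date first.
  val byMobile = Window
    .partitionBy("mobileno")
    .orderBy(col("transaction_date").desc)

  df.withColumn("rn", row_number().over(byMobile)) // "rn" is a temporary, made-up column name
    .where(col("rn") === 1)                        // row number 1 is the latest row in its group
    .select("customername", "service_type", "mobileno", "cust_addr")
}
```

Here the ordering is stated explicitly inside the window specification, so the result would not depend on whether groupBy keeps the order produced by an earlier orderBy.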


*Thanks in advance,*

*Rabin Banerjee*
