I would go for the partition by option. It seems simple and, yes, SQL-inspired :)

On 4 Nov 2016 00:59, "Rabin Banerjee" <dev.rabin.baner...@gmail.com> wrote:
> Hi Koert & Robin,
>
> *Thanks!* But if you go through the blog
> https://bzhangusc.wordpress.com/2015/05/28/groupby-on-dataframe-is-not-the-groupby-on-rdd/
> and check the comments under it, it's actually working, although I am not
> sure how. And yes, I agree a custom aggregate UDAF is a good option.
>
> Can anyone share the best way to implement this in Spark?
>
> Regards,
> Rabin Banerjee
>
> On Thu, Nov 3, 2016 at 6:59 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> Just realized you only want to keep the first element. You can do this
>> without sorting by doing something similar to a min or max operation, using
>> a custom aggregator/UDAF or reduceGroups on Dataset. This is also more
>> efficient.
>>
>> On Nov 3, 2016 7:53 AM, "Rabin Banerjee" <dev.rabin.baner...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I want to do a DataFrame operation to find the rows having the latest
>>> timestamp in each group, using the operation below:
>>>
>>> import org.apache.spark.sql.functions.{desc, first}
>>>
>>> df.orderBy(desc("transaction_date"))
>>>   .groupBy("mobileno")
>>>   .agg(first("customername").as("customername"),
>>>        first("service_type").as("service_type"),
>>>        first("cust_addr").as("cust_addr"))
>>>   .select("customername", "service_type", "mobileno", "cust_addr")
>>>
>>> *Spark Version :: 1.6.x*
>>>
>>> My question is: *"Will Spark guarantee the order while doing the groupBy
>>> if the DF was previously ordered using orderBy in Spark 1.6.x?"*
>>>
>>> *I referred to a blog here:
>>> https://bzhangusc.wordpress.com/2015/05/28/groupby-on-dataframe-is-not-the-groupby-on-rdd/*
>>>
>>> *It claims this will work except in Spark 1.5.1 and 1.5.2.*
>>>
>>> *I need a bit of elaboration on how Spark handles this internally. Also,
>>> is it more efficient than using a Window function?*
>>>
>>> *Thanks in advance,*
>>>
>>> *Rabin Banerjee*
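A minimal sketch of the two order-independent options discussed in this thread, written against the Spark 2.x Scala API rather than 1.6 (`SparkSession`, `groupByKey` and `reduceGroups` arrived in 2.0, and in 1.6 window functions also needed a HiveContext). The `Txn` case class, the `LatestPerGroup` object, its method names and the sample values are hypothetical; `transaction_date` is assumed to be a string in a lexicographically sortable format such as `yyyy-MM-dd HH:mm:ss`. `viaWindow` is the "partition by" option: `row_number()` over a window partitioned by `mobileno` and ordered by descending `transaction_date`, keeping row 1. `viaReduce` is the min/max-style reduction Koert suggests: `groupByKey` plus `reduceGroups`, keeping the later row of each pair. Neither relies on the order from an earlier `orderBy` surviving the `groupBy` shuffle; Spark's own documentation for `first` describes it as non-deterministic for exactly that reason.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc, row_number}

// Hypothetical record type; field names follow the columns used in the thread.
// Assumption: transaction_date is a sortable string such as "2016-11-03 18:30:00".
case class Txn(mobileno: String, customername: String, service_type: String,
               cust_addr: String, transaction_date: String)

object LatestPerGroup {

  // "partition by" option: number the rows of each mobileno partition by descending
  // transaction_date and keep row 1. Ties on the timestamp are broken arbitrarily.
  def viaWindow(spark: SparkSession, ds: Dataset[Txn]): Dataset[Txn] = {
    import spark.implicits._
    val byLatest = Window.partitionBy("mobileno").orderBy(desc("transaction_date"))
    ds.withColumn("rn", row_number().over(byLatest))
      .filter(col("rn") === 1)
      .drop("rn")
      .as[Txn]
  }

  // max-style option: no sort, just a pairwise "keep the later row" reduction per key.
  // Typed vals (not inline lambdas) avoid overload ambiguity with the Java-friendly variants.
  def viaReduce(spark: SparkSession, ds: Dataset[Txn]): Dataset[Txn] = {
    import spark.implicits._
    val key: Txn => String = _.mobileno
    val later: (Txn, Txn) => Txn =
      (a, b) => if (a.transaction_date.compareTo(b.transaction_date) >= 0) a else b
    ds.groupByKey(key)
      .reduceGroups(later) // Dataset[(String, Txn)]
      .map(_._2)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("latest-per-group").getOrCreate()
    import spark.implicits._
    // Hypothetical sample data: mobileno 111 has two transactions, 222 has one.
    val txns = Seq(
      Txn("111", "alice", "prepaid", "addr-1", "2016-11-01 09:00:00"),
      Txn("111", "alice", "postpaid", "addr-2", "2016-11-03 18:30:00"),
      Txn("222", "bob", "prepaid", "addr-3", "2016-11-02 12:00:00")
    ).toDS()
    // Both calls should keep the 2016-11-03 row for 111 and the only row for 222.
    viaWindow(spark, txns).show(false)
    viaReduce(spark, txns).show(false)
    spark.stop()
  }
}
```

The window version sorts each `mobileno` partition, while the reduce version only has to carry one candidate row per key, which is the efficiency argument made earlier in the thread. How much that matters depends on group sizes, so it is worth measuring on real data.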