You need to use window functions to get this kind of behavior, or take the
max of a struct.
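
To make the two suggestions concrete, here is a minimal sketch of both. It
assumes Spark 2.x or later (SparkSession, row_number) and uses made-up sample
data; the object name, the sample rows and the helper column names "rn" and
"best" are illustrative only and do not come from the original thread.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, max, row_number, struct}

// Hypothetical example object; not part of the original thread.
object Col3AtMaxCol2 {
  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession (Spark 2.x+ API).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("col3-at-max-col2")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data using the column names from the question.
    val df = Seq(
      ("a", 1, "x"),
      ("a", 3, "y"),
      ("b", 2, "z"),
      ("b", 1, "w")
    ).toDF("col1", "col2", "col3")

    // Option 1: window function. Number the rows of each col1 group by
    // descending col2 and keep only the first row of each group.
    val byCol2Desc = Window.partitionBy(col("col1")).orderBy(col("col2").desc)
    val viaWindow = df
      .withColumn("rn", row_number().over(byCol2Desc))
      .where(col("rn") === 1)
      .select("col1", "col3")

    // Option 2: max of a struct. Structs are compared field by field from
    // left to right, so putting col2 first makes max() pick the row with
    // the highest col2; col3 is then read back out of the winning struct.
    val viaStruct = df
      .groupBy(col("col1"))
      .agg(max(struct(col("col2"), col("col3"))).as("best"))
      .select(col("col1"), col("best.col3").as("col3"))

    viaWindow.show()
    viaStruct.show()

    spark.stop()
  }
}
```

Neither variant depends on the row order surviving the groupBy. The two can
differ when several rows share the highest col2 within a group: row_number
keeps an arbitrary one of them, while the struct version falls back to the
largest col3 among the tied rows.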

On Thu, Dec 17, 2015 at 11:55 PM, Timothée Carayol wrote:

> Hi all,
> I tried to do something like the following in Spark:
> df.orderBy('col1, 'col2).groupBy('col1).agg(first('col3))
> I was hoping to get, within each col1 value, the value for col3 that
> corresponds to the highest value for col2 within that col1 group. This only
> works if the order on col2 is preserved after the groupBy step.
> suggests that it is (unlike RDD.groupBy, DataFrame.groupBy is described as
> preserving the order).
> Yet in my experiments, I find that in some cases the order is not
> preserved. Running the same code multiple times gives me different results.
> If this is a bug, I'll happily work on a reproducible example and post to
> JIRA but I thought I'd check with the mailing list first in case that is,
> in fact, the expected behaviour?
> Thanks
> Timothée
