import org.apache.spark.sql.functions._

df.groupBy("category")
  .agg(callUDF("collect_set", df("id")).as("id_list"))
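For reference, here is a minimal self-contained sketch of the same approach, assuming Spark 1.5 with a HiveContext (collect_set is a Hive UDAF there, so it is reached through callUDF; the app name and local master are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._

val sc = new SparkContext(
  new SparkConf().setAppName("GroupToList").setMaster("local[*]"))
// HiveContext is needed because collect_set is a Hive UDAF on 1.5.
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._

// Sample data matching the table in the original question.
val df = sc.parallelize(Seq(("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5)))
  .toDF("category", "id")

val grouped = df.groupBy("category")
  .agg(callUDF("collect_set", df("id")).as("id_list"))

grouped.show()

Note that collect_set de-duplicates within each group; if repeated occurrences should be kept, swap in collect_list the same way. From Spark 1.6 onward both are exposed directly in org.apache.spark.sql.functions, so callUDF is no longer needed.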
On Mon, Oct 12, 2015 at 11:08 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:
> Hey Spark users,
>
> I'm trying to group a dataframe, appending occurrences into a list
> instead of counting them.
>
> Let's say we have a dataframe as shown below:
>
> | category | id |
> | -------- |:--:|
> | A        | 1  |
> | A        | 2  |
> | B        | 3  |
> | B        | 4  |
> | C        | 5  |
>
> ideally, after some magic group by (reverse explode?):
>
> | category | id_list |
> | -------- | ------- |
> | A        | 1,2     |
> | B        | 3,4     |
> | C        | 5       |
>
> any tricks to achieve that? Scala Spark API is preferred. =D
>
> BR,
> Todd Leo