import org.apache.spark.sql.functions._

// collect_set is a Hive UDAF in Spark 1.5, so this needs a HiveContext;
// it gathers the ids of each group into an array (de-duplicated).
df.groupBy("category")
  .agg(callUDF("collect_set", df("id")).as("id_list"))
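
For reference, here is a minimal self-contained sketch using the sample data
from your mail. It assumes Spark 1.5 with a HiveContext (collect_set is a Hive
UDAF there; on Spark 1.6+ you can import collect_set directly from
org.apache.spark.sql.functions and skip callUDF):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.hive.HiveContext

// Assumes an existing SparkContext named `sc`.
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._

// Sample data from the original mail.
val df = Seq(("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5))
  .toDF("category", "id")

// Group by category and collect each group's ids into an array.
val grouped = df.groupBy("category")
  .agg(callUDF("collect_set", df("id")).as("id_list"))

grouped.show()
// +--------+-------+
// |category|id_list|
// +--------+-------+
// |       A| [1, 2]|
// |       B| [3, 4]|
// |       C|    [5]|
// +--------+-------+
// (row order may vary)

Note that collect_set de-duplicates within each group; if you need to keep
duplicates, the collect_list UDAF can be called the same way.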

On Mon, Oct 12, 2015 at 11:08 PM, SLiZn Liu <sliznmail...@gmail.com> wrote:

> Hey Spark users,
>
> I'm trying to group a DataFrame by one column and collect the matching
> rows' ids into a list, instead of just counting them.
>
> Let's say we have a dataframe as shown below:
>
> | category | id |
> | -------- |:--:|
> | A        | 1  |
> | A        | 2  |
> | B        | 3  |
> | B        | 4  |
> | C        | 5  |
>
> Ideally, after some magic group-by (a reverse explode?), it would look like:
>
> | category | id_list  |
> | -------- | -------- |
> | A        | 1,2      |
> | B        | 3,4      |
> | C        | 5        |
>
> Any tricks to achieve that? The Scala Spark API is preferred. =D
>
> BR,
> Todd Leo
