Hi Michael,

Can you be more specific on `collect_set`? Is it a built-in function, or, if
it is a UDF, how is it defined?

BR,
Todd Leo

On Wed, Oct 14, 2015 at 2:12 AM Michael Armbrust <mich...@databricks.com>
wrote:

> import org.apache.spark.sql.functions._
>
> df.groupBy("category")
>   .agg(callUDF("collect_set", df("id")).as("id_list"))
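[Editor's note: for readers outside the thread, the per-group behavior that `collect_set` provides can be mimicked with plain Scala collections. The sketch below is only an illustration of the set semantics using the sample table from the question; it is not the Spark API itself, and it runs on a local `Seq` rather than a DataFrame.]

```scala
// Sketch: mimic collect_set's per-group behavior on plain Scala collections.
// Sample (category, id) rows from the question below.
val rows = Seq(("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5))

// groupBy buckets rows by category; within each bucket we keep the
// distinct ids, mirroring collect_set's set semantics (duplicates dropped).
val idLists: Map[String, Seq[Int]] =
  rows.groupBy(_._1).map { case (cat, rs) =>
    cat -> rs.map(_._2).distinct
  }

// Print one line per category, e.g. "A -> 1,2"
idLists.toSeq.sortBy(_._1).foreach { case (cat, ids) =>
  println(s"$cat -> ${ids.mkString(",")}")
}
```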
>
> On Mon, Oct 12, 2015 at 11:08 PM, SLiZn Liu <sliznmail...@gmail.com>
> wrote:
>
>> Hey Spark users,
>>
>> I'm trying to group a DataFrame by a column, collecting the occurrences
>> into a list instead of counting them.
>>
>> Let's say we have a dataframe as shown below:
>>
>> | category | id |
>> | -------- |:--:|
>> | A        | 1  |
>> | A        | 2  |
>> | B        | 3  |
>> | B        | 4  |
>> | C        | 5  |
>>
>> ideally, after some magic group by (reverse explode?):
>>
>> | category | id_list  |
>> | -------- | -------- |
>> | A        | 1,2      |
>> | B        | 3,4      |
>> | C        | 5        |
>>
>> Any tricks to achieve that? The Scala Spark API is preferred. =D
>>
>> BR,
>> Todd Leo
>>
>
