reduceByKey would be a perfect fit here: key each row by ID and merge the
STATE values with a function that joins two strings with a comma.
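
A rough sketch of the idea, in case it helps. The runnable part below is only a
local, plain-Python stand-in for what reduceByKey does per key; `reduce_by_key`
and `join_states` are names made up for illustration, and the sample rows are
the ones from your mail. The PySpark call itself appears only in a comment and
assumes your RDD already holds (ID, STATE) pairs.

```python
# Sample rows from the question, as (ID, STATE) pairs.
rows = [(1, "TX"), (1, "NY"), (1, "FL"), (2, "CA"), (2, "OH")]


def join_states(a, b):
    """Merge function: combine two STATE strings into one CSV string."""
    return a + "," + b


def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey.

    Folds all values that share a key into one value using func.
    """
    merged = {}
    for key, value in pairs:
        if key in merged:
            merged[key] = func(merged[key], value)
        else:
            merged[key] = value
    return sorted(merged.items())


result = reduce_by_key(rows, join_states)

# On a pair RDD, the same merge function would be passed as:
#   csv_rdd = rdd.reduceByKey(join_states)
for key, csv_state in result:
    print(key, csv_state)
```

One caveat: in Spark the values for a key are merged across partitions, so the
order of states inside each CSV string is not guaranteed to match the input
order the way it does in this local sketch.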

On Wed, Dec 9, 2015 at 4:47 AM, Krishna <research...@gmail.com> wrote:

> Hi,
>
> What is the most efficient way to perform a group-by operation in Spark
> and merge rows into CSV?
>
> Here is the current RDD:
> -----------------
> ID    STATE
> -----------------
> 1     TX
> 1     NY
> 1     FL
> 2     CA
> 2     OH
> -----------------
>
> This is the required output:
> -------------------------
> ID    CSV_STATE
> -------------------------
> 1     TX,NY,FL
> 2     CA,OH
> -------------------------
>



-- 
Best Regards,
Ayan Guha
