reduceByKey would be a perfect fit for you.

On Wed, Dec 9, 2015 at 4:47 AM, Krishna <research...@gmail.com> wrote:
> Hi,
>
> what is the most efficient way to perform a group-by operation in Spark
> and merge rows into csv?
>
> Here is the current RDD
> -----------------
> ID    STATE
> -----------------
> 1     TX
> 1     NY
> 1     FL
> 2     CA
> 2     OH
> -----------------
>
> This is the required output:
> -------------------------
> ID    CSV_STATE
> -------------------------
> 1     TX,NY,FL
> 2     CA,OH
> -------------------------
>

--
Best Regards,
Ayan Guha
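
For reference, the reduceByKey suggestion can be sketched in PySpark as below. This is only a minimal sketch under a few assumptions: the RDD holds (ID, STATE) pairs, a local SparkContext is acceptable for trying it out, and the helper names `merge_states`, `states_csv_by_id`, and `run_example` are made up for illustration. Note that reduceByKey does not promise any particular order when merging values across partitions, so the states within each CSV string may come out in a different order than in the input.

```python
def merge_states(left, right):
    """Combine two partial CSV strings that belong to the same ID."""
    return left + "," + right


def states_csv_by_id(pair_rdd):
    """Turn a pair RDD of (id, state) into a pair RDD of (id, "S1,S2,...")."""
    # reduceByKey applies merge_states pairwise to all values sharing a key.
    return pair_rdd.reduceByKey(merge_states)


def run_example():
    """Hypothetical driver; requires PySpark to be installed."""
    # Imported here so the helpers above can be loaded without Spark.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "states-csv")
    try:
        rows = [(1, "TX"), (1, "NY"), (1, "FL"), (2, "CA"), (2, "OH")]
        result = states_csv_by_id(sc.parallelize(rows)).collect()
        for row_id, csv_state in sorted(result):
            print(row_id, csv_state)
    finally:
        sc.stop()
```

Calling `run_example()` in an environment with PySpark available should print one line per ID with its states joined by commas.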