It's actually a bit tougher, as you'll first need all the years. I'm also not sure how you would represent your "columns" given that they are dynamic, based on the input data.
Depending on your downstream processing, I'd probably try to emulate it with a hash map with years as keys instead of the columns. There is probably a nicer solution using the DataFrames API, but I'm not familiar with it. If you actually need vectors, I think this article I saw recently on the Databricks blog will highlight some options (look for the gather encoder): https://databricks.com/blog/2015/10/20/audience-modeling-with-spark-ml-pipelines.html

-adrian

From: Deng Ching-Mallete
Date: Friday, October 30, 2015 at 4:35 AM
To: Ascot Moss
Cc: User
Subject: Re: Pivot Data in Spark and Scala

Hi,

You could transform it into a pair RDD, then use the combineByKey function.

HTH,
Deng

On Thu, Oct 29, 2015 at 7:29 PM, Ascot Moss <ascot.m...@gmail.com> wrote:

Hi,

I have data as follows:

A, 2015, 4
A, 2014, 12
A, 2013, 1
B, 2015, 24
B, 2013, 4

I need to convert the data to a new format:

A, 4, 12, 1
B, 24, , 4

Any idea how to make it in Spark Scala?

Thanks
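Putting the two suggestions above together (collect the distinct years first, then build a per-key map of year to value with combineByKey), a minimal sketch might look like the following. This is illustrative only, assuming an existing SparkContext `sc` and input already parsed into (name, year, value) triples; the variable names are not from the thread:

```scala
// Sample input, parsed into (name, year, value) triples.
val rows = sc.parallelize(Seq(
  ("A", 2015, 4), ("A", 2014, 12), ("A", 2013, 1),
  ("B", 2015, 24), ("B", 2013, 4)
))

// Key by name, carrying (year, value); combineByKey builds a
// Map[year -> value] per key (the "hash map with years as keys" idea).
val pivoted = rows
  .map { case (name, year, value) => (name, (year, value)) }
  .combineByKey(
    (yv: (Int, Int)) => Map(yv),                          // createCombiner
    (m: Map[Int, Int], yv: (Int, Int)) => m + yv,         // mergeValue
    (m1: Map[Int, Int], m2: Map[Int, Int]) => m1 ++ m2    // mergeCombiners
  )

// To emulate fixed "columns" you first need all the years, as noted above.
val years = rows.map(_._2).distinct().collect().sorted.reverse

// Render each key as a row, leaving a blank where a year is missing.
val lines = pivoted.map { case (name, byYear) =>
  (name +: years.map(y => byYear.get(y).map(_.toString).getOrElse("")))
    .mkString(", ")
}
```

With the sample data this yields rows in the requested shape ("A, 4, 12, 1" and "B, 24, , 4", with years ordered 2015, 2014, 2013). Note that `combineByKey` shuffles all (year, value) pairs; for small numbers of distinct years this is fine, but the maps grow with the number of years per key.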