Hey Amit,

You might want to check out PairRDDFunctions
<http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions>.
For your use case in particular, you can load the file as an RDD[(String,
String)] and then use the groupByKey() function in PairRDDFunctions to get
an RDD[(String, Iterable[String])].
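As a rough sketch of what I mean (assuming an existing SparkContext `sc`, a
`path` variable pointing at your file, and that you want to trim the space
after the comma; `collectAsMap` is also in PairRDDFunctions if you want a
driver-side Map at the end):

```scala
import org.apache.spark.rdd.RDD

// Parse each line "1, one" into a (key, value) pair, trimming whitespace.
val pairs: RDD[(String, String)] = sc.textFile(path, 2).map { line =>
  val Array(k, v) = line.split(",", 2)
  (k.trim, v.trim)
}

// groupByKey (available on RDDs of pairs via implicit conversion to
// PairRDDFunctions) gives RDD[(String, Iterable[String])].
val grouped: RDD[(String, Iterable[String])] = pairs.groupByKey()

// To materialize a Map[String, List[String]] on the driver:
val multiMap: Map[String, List[String]] =
  grouped.mapValues(_.toList).collectAsMap().toMap
```

Note that collecting to the driver is only appropriate when the result is
small enough to fit in memory.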

Doris


On Tue, Jun 3, 2014 at 2:56 PM, Amit Kumar <kumarami...@gmail.com> wrote:

> Hi Folks,
>
> I am new to Spark, and this is probably a basic question.
>
> I have a file on the hdfs
>
> 1, one
> 1, uno
> 2, two
> 2, dos
>
> I want to create a multimap RDD: RDD[Map[String, List[String]]]
>
> {"1"->["one","uno"], "2"->["two","dos"]}
>
>
> First I read the file
> val identityData:RDD[String] = sc.textFile($path_to_the_file, 2).cache()
>
> val identityDataList:RDD[List[String]]=
>       identityData.map{ line =>
>         val splits= line.split(",")
>         splits.toList
>     }
>
> Then I group them by the first element
>
>  val grouped:RDD[(String,Iterable[List[String]])]=
>     identityDataList.groupBy{
>       element =>{
>         element(0)
>       }
>     }
>
> Then I do the equivalent of mapValues of scala collections to get rid of
> the first element
>
>  val groupedWithValues:RDD[(String,List[String])] =
>     grouped.flatMap[(String,List[String])]{ case (key,list)=>{
>       List((key,list.map{element => {
>         element(1)
>       }}.toList))
>     }
>     }
>
> for this to actually materialize I do collect
>
>  val groupedAndCollected=groupedWithValues.collect()
>
> I get an Array[(String, List[String])].
>
> I am trying to figure out if there is a way for me to get
> Map[String,List[String]] (a multimap), or to create an
> RDD[Map[String,List[String]] ]
>
>
> I am sure there is something simpler, I would appreciate advice.
>
> Many thanks,
> Amit
>
