Hey Amit,

You might want to check out PairRDDFunctions
<http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions>.
For your use case in particular, you can load the file as an
RDD[(String, String)] and then use the groupByKey() function in
PairRDDFunctions to get an RDD[(String, Iterable[String])].
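For example, something along these lines (a rough, untested sketch: I'm
assuming Spark 1.0, where the SparkContext._ import provides the implicit
conversion to PairRDDFunctions, reusing your $path_to_the_file placeholder,
and assuming every line has exactly two comma-separated fields):

import org.apache.spark.SparkContext._  // implicit conversion to PairRDDFunctions
import org.apache.spark.rdd.RDD

// Parse each "key, value" line of the file into a (key, value) pair.
val pairs: RDD[(String, String)] =
  sc.textFile($path_to_the_file, 2).map { line =>
    val splits = line.split(",")
    (splits(0).trim, splits(1).trim)
  }

// Group all values that share a key. groupByKey does the shuffle for you,
// so you can skip the intermediate List and the groupBy/flatMap steps.
val grouped: RDD[(String, Iterable[String])] = pairs.groupByKey()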
Doris

On Tue, Jun 3, 2014 at 2:56 PM, Amit Kumar <kumarami...@gmail.com> wrote:

> Hi Folks,
>
> I am new to Spark, and this is probably a basic question.
>
> I have a file on HDFS:
>
> 1, one
> 1, uno
> 2, two
> 2, dos
>
> I want to create a multimap RDD, RDD[Map[String, List[String]]]:
>
> {"1" -> ["one", "uno"], "2" -> ["two", "dos"]}
>
> First I read the file:
>
> val identityData: RDD[String] = sc.textFile($path_to_the_file, 2).cache()
>
> val identityDataList: RDD[List[String]] =
>   identityData.map { line =>
>     val splits = line.split(",")
>     splits.toList
>   }
>
> Then I group the lists by their first element:
>
> val grouped: RDD[(String, Iterable[List[String]])] =
>   identityDataList.groupBy { element =>
>     element(0)
>   }
>
> Then I do the equivalent of mapValues on Scala collections to get rid of
> the first element:
>
> val groupedWithValues: RDD[(String, List[String])] =
>   grouped.flatMap[(String, List[String])] { case (key, list) =>
>     List((key, list.map { element => element(1) }.toList))
>   }
>
> For this to actually materialize, I do a collect:
>
> val groupedAndCollected = groupedWithValues.collect()
>
> I get an Array[(String, List[String])].
>
> I am trying to figure out if there is a way for me to get a
> Map[String, List[String]] (a multimap), or to create an
> RDD[Map[String, List[String]]].
>
> I am sure there is something simpler; I would appreciate advice.
>
> Many thanks,
> Amit
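P.S. On your last question: once you have a grouped pair RDD, you can build
the multimap on the driver with collect().toMap. Keep in mind that collect()
pulls everything back to the driver, so this only works when the grouped
data fits in driver memory. A rough sketch, reusing the grouped value from
the snippet above:

// mapValues keeps each key and converts its Iterable[String] to a List;
// collect() then brings the pairs back to the driver, where toMap builds
// the immutable Map[String, List[String]].
val multimap: Map[String, List[String]] =
  grouped.mapValues(_.toList).collect().toMap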