Just a thought... Are you trying to use the RDD as a Map?
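
If what you actually want at the end is a local Scala Map on the driver, the groupByKey() route Doris describes below gets you most of the way there. A rough, untested sketch (assuming a spark-shell where sc is already defined, the same comma-separated file as below, and pathToFile as a placeholder for your HDFS path):

    import org.apache.spark.rdd.RDD

    // Parse each "key, value" line into a pair; split on the first comma only.
    // pathToFile is a placeholder for your actual HDFS path.
    val pairs: RDD[(String, String)] =
      sc.textFile(pathToFile).map { line =>
        val Array(key, value) = line.split(",", 2)
        (key.trim, value.trim)
      }

    // The distributed "multimap": RDD[(String, Iterable[String])]
    val grouped = pairs.groupByKey()

    // Only if the grouped data is small enough to fit on the driver:
    val localMap: Map[String, List[String]] =
      grouped.mapValues(_.toList).collectAsMap().toMap

Keep in mind that collectAsMap() pulls everything back to the driver, so it only makes sense for small results. If the data is large, keep it as a pair RDD and use lookup(key) instead of collecting it.
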
On 3 June 2014 23:14, Doris Xin <doris.s....@gmail.com> wrote:

> Hey Amit,
>
> You might want to check out PairRDDFunctions
> <http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions>.
> For your use case in particular, you can load the file as an RDD[(String, String)]
> and then use the groupByKey() function in PairRDDFunctions to get an
> RDD[(String, Iterable[String])].
>
> Doris
>
>
> On Tue, Jun 3, 2014 at 2:56 PM, Amit Kumar <kumarami...@gmail.com> wrote:
>
>> Hi Folks,
>>
>> I am new to Spark, and this is probably a basic question.
>>
>> I have a file on HDFS:
>>
>> 1, one
>> 1, uno
>> 2, two
>> 2, dos
>>
>> I want to create a multimap RDD, RDD[Map[String, List[String]]]:
>>
>> {"1" -> ["one", "uno"], "2" -> ["two", "dos"]}
>>
>> First I read the file:
>>
>> val identityData: RDD[String] = sc.textFile($path_to_the_file, 2).cache()
>>
>> val identityDataList: RDD[List[String]] =
>>   identityData.map { line =>
>>     line.split(",").toList
>>   }
>>
>> Then I group them by the first element:
>>
>> val grouped: RDD[(String, Iterable[List[String]])] =
>>   identityDataList.groupBy { element =>
>>     element(0)
>>   }
>>
>> Then I do the equivalent of mapValues from Scala collections to get rid of
>> the first element:
>>
>> val groupedWithValues: RDD[(String, List[String])] =
>>   grouped.flatMap[(String, List[String])] { case (key, list) =>
>>     List((key, list.map(element => element(1)).toList))
>>   }
>>
>> For this to actually materialize I do collect:
>>
>> val groupedAndCollected = groupedWithValues.collect()
>>
>> I get an Array[(String, List[String])].
>>
>> I am trying to figure out if there is a way for me to get a
>> Map[String, List[String]] (a multimap), or to create an
>> RDD[Map[String, List[String]]].
>>
>> I am sure there is something simpler; I would appreciate advice.
>>
>> Many thanks,
>> Amit
>>

--
Kind regards,
Oleg