I could not iterate through the Set, but I changed the code to get what I was looking for (not elegant, but it gets me going):

package org.medicalsidefx.common.utils

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

/**
 * Created by sansub01 on 11/19/14.
 */
object TwoWayJoin2 {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: TwoWayJoinCount <file1> <file2>")
      System.exit(12)
    }
    val sconf = new SparkConf().setMaster("local").setAppName("MedicalSideFx-TwoWayJoin")
    val sc = new SparkContext(sconf)

    val file1 = args(0)
    val file2 = args(1)

    // Parse each "id,value" line into an (id, value) pair.
    val file1Rdd = sc.textFile(file1).map(x => (x.split(",")(0), x.split(",")(1)))
    // Same parsing, then concatenate all values that share an id, separated by "|".
    val file2Rdd = sc.textFile(file2).map(x => (x.split(",")(0), x.split(",")(1))).reduceByKey((v1, v2) => v1 + "|" + v2)

    file1Rdd.collect().foreach(println)
    file2Rdd.collect().foreach(println)

    // Join on id, then strip every parenthesis from the printed tuple.
    file1Rdd.join(file2Rdd).collect().foreach(e => println(e.toString.replace("(", "").replace(")", "")))
  }
}

From: Jey Kottalam <j...@cs.berkeley.edu>
To: Sanjay Subramanian <sanjaysubraman...@yahoo.com>
Cc: Arun Ahuja <aahuj...@gmail.com>; Andrew Ash <and...@andrewash.com>; user <user@spark.apache.org>
Sent: Friday, November 21, 2014 10:07 PM
Subject: Extracting values from a Collecion

Hi Sanjay,

These are instances of the standard Scala collection type "Set", and its documentation can be found by googling the phrase "scala set".

Hope that helps,
-Jey

On Fri, Nov 21, 2014 at 10:41 AM, Sanjay Subramanian <sanjaysubraman...@yahoo.com.invalid> wrote:
> hey guys
>
> names.txt
> =========
> 1,paul
> 2,john
> 3,george
> 4,ringo
>
>
> songs.txt
> =========
> 1,Yesterday
> 2,Julia
> 3,While My Guitar Gently Weeps
> 4,With a Little Help From My Friends
> 1,Michelle
> 2,Nowhere Man
> 3,Norwegian Wood
> 4,Octopus's Garden
>
> What I want to do is real simple
>
> Desired Output
> ==============
> (4,(With a Little Help From My Friends, Octopus's Garden))
> (2,(Julia, Nowhere Man))
> (3,(While My Guitar Gently Weeps, Norwegian Wood))
> (1,(Yesterday, Michelle))
>
>
> My Code
> =======
> val file1Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2RddGrp = file2Rdd.groupByKey()
> file2Rdd.groupByKey().mapValues(names =>
> names.toSet).collect().foreach(println)
>
> Result
> =======
> (4,Set(With a Little Help From My Friends, Octopus's Garden))
> (2,Set(Julia, Nowhere Man))
> (3,Set(While My Guitar Gently Weeps, Norwegian Wood))
> (1,Set(Yesterday, Michelle))
>
>
> How can I extract values from the Set ?
>
> Thanks
>
> sanjay
>
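
For the question quoted above (how to get the values out of each Set), here is a minimal sketch in plain Scala with no Spark dependency. The object name SetValuesSketch, the helper flatten, and the sample map are hypothetical and exist only for illustration. The elements are sorted before joining because a Set makes no promise about iteration order, so the order can differ from the "Desired Output" in the question.

```scala
object SetValuesSketch {
  // Hypothetical helper: render a key and its Set of values as "(key,(v1, v2))".
  // A Set has no guaranteed iteration order, so sort for stable output.
  def flatten(key: String, values: Set[String], sep: String = ", "): String =
    s"($key,(${values.toSeq.sorted.mkString(sep)}))"

  // Sample data shaped like the grouped result in the question.
  val grouped: Map[String, Set[String]] = Map(
    "1" -> Set("Yesterday", "Michelle"),
    "2" -> Set("Julia", "Nowhere Man"),
    "3" -> Set("While My Guitar Gently Weeps", "Norwegian Wood"),
    "4" -> Set("With a Little Help From My Friends", "Octopus's Garden")
  )

  def main(args: Array[String]): Unit = {
    // A Set can be traversed directly with foreach, map, or a for loop.
    for ((key, songs) <- grouped; song <- songs) println(s"$key -> $song")

    // Or collapsed into one delimited string per key.
    grouped.foreach { case (key, songs) => println(flatten(key, songs)) }
  }
}
```

In the Spark code from the question, the same idea would presumably be applied inside mapValues, for example mapValues(names => names.toSet.mkString(", ")), since the grouped values are ordinary Scala collections.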