I could not iterate through the Set, but I changed the code to get what I was
looking for (not elegant, but it gets me going):
package org.medicalsidefx.common.utils

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

import scala.collection.mutable.ArrayBuffer

/**
 * Created by sansub01 on 11/19/14.
 */
object TwoWayJoin2 {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: TwoWayJoin2 <file1> <file2>")
      System.exit(12)
    }

    val sconf = new SparkConf()
      .setMaster("local")
      .setAppName("MedicalSideFx-TwoWayJoin")

    val sc = new SparkContext(sconf)

    val file1 = args(0)
    val file2 = args(1)

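    // Parse each "key,value" line into a (key, value) pair, splitting once.
    val file1Rdd = sc.textFile(file1).map(_.split(",")).map(f => (f(0), f(1)))
    // Same parsing, then concatenate all values for a key, separated by "|".
    val file2Rdd = sc.textFile(file2).map(_.split(",")).map(f => (f(0), f(1)))
      .reduceByKey((v1, v2) => v1 + "|" + v2)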

    file1Rdd.collect().foreach(println)
    file2Rdd.collect().foreach(println)

    file1Rdd.join(file2Rdd).collect().foreach { e =>
      println(e.toString.replace("(", "").replace(")", ""))
    }

  }
}
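
On the original question of reading values out of the Set: below is a minimal
sketch, assuming plain Scala collections in place of Spark RDDs so that it runs
without a cluster. The object name SetValuesSketch, the helpers groupToSets and
format, and the sample data are hypothetical and only for illustration. A Set
can be traversed like any other collection, so mkString joins its elements
directly; the values are sorted first because a Set does not guarantee an order.

```scala
object SetValuesSketch {
  // Hypothetical sample data mirroring part of songs.txt from the quoted message.
  val songs: Seq[(String, String)] = Seq(
    "1" -> "Yesterday",
    "2" -> "Julia",
    "1" -> "Michelle",
    "2" -> "Nowhere Man"
  )

  // Local stand-in for groupByKey().mapValues(_.toSet): key -> Set of values.
  def groupToSets(pairs: Seq[(String, String)]): Map[String, Set[String]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).toSet }

  // Join the Set's elements into one string; sort for a deterministic order.
  def format(key: String, values: Set[String]): String = {
    val joined = values.toSeq.sorted.mkString(", ")
    s"($key,($joined))"
  }

  def main(args: Array[String]): Unit =
    groupToSets(songs).toSeq.sortBy(_._1).foreach {
      case (k, vs) => println(format(k, vs))
    }
}
```

On an RDD, the same idea should reduce to
file2Rdd.groupByKey().mapValues(_.mkString(", ")), which skips the Set
conversion altogether.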

From: Jey Kottalam <j...@cs.berkeley.edu>
To: Sanjay Subramanian <sanjaysubraman...@yahoo.com>
Cc: Arun Ahuja <aahuj...@gmail.com>; Andrew Ash <and...@andrewash.com>; user <user@spark.apache.org>
Sent: Friday, November 21, 2014 10:07 PM
Subject: Extracting values from a Collecion

Hi Sanjay,

These are instances of the standard Scala collection type "Set", and its 
documentation can be found by googling the phrase "scala set".

Hope that helps,
-Jey



On Fri, Nov 21, 2014 at 10:41 AM, Sanjay Subramanian 
<sanjaysubraman...@yahoo.com.invalid> wrote:
> hey guys
>
> names.txt
> =========
> 1,paul
> 2,john
> 3,george
> 4,ringo
>
>
> songs.txt
> =========
> 1,Yesterday
> 2,Julia
> 3,While My Guitar Gently Weeps
> 4,With a Little Help From My Friends
> 1,Michelle
> 2,Nowhere Man
> 3,Norwegian Wood
> 4,Octopus's Garden
>
> What I want to do is real simple
>
> Desired Output
> ==============
> (4,(With a Little Help From My Friends, Octopus's Garden))
> (2,(Julia, Nowhere Man))
> (3,(While My Guitar Gently Weeps, Norwegian Wood))
> (1,(Yesterday, Michelle))
>
>
> My Code
> =======
> val file1Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2RddGrp = file2Rdd.groupByKey()
> file2Rdd.groupByKey().mapValues(names =>
> names.toSet).collect().foreach(println)
>
> Result
> =======
> (4,Set(With a Little Help From My Friends, Octopus's Garden))
> (2,Set(Julia, Nowhere Man))
> (3,Set(While My Guitar Gently Weeps, Norwegian Wood))
> (1,Set(Yesterday, Michelle))
>
>
> How can I extract values from the Set ?
>
> Thanks
>
> sanjay
>



  
