Hi,
How do I do a broadcast (map-side) join on RDDs? I have a large RDD that I want to inner join with a small RDD. Instead of having the large RDD repartitioned and shuffled for the join, I would rather send a copy of the small RDD to each task and then perform the join locally.

How would I specify this in Spark code? I didn't find much documentation online. I attempted to create a broadcast variable out of the small RDD and then access it in the join operator:

    largeRdd.join(smallRddBroadCastVar.value)

but that didn't work as expected: I found that all rows with the same key were on the same task, which suggests the large RDD was still being shuffled.

I am using Spark version 1.0.1.

Thanks,
pala