Since my earlier question is still unanswered, I have decided to dig into the Spark code myself. However, I am new to Spark, and to Scala in particular. Can someone help me understand the following code snippet?
    def cogroup[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (Seq[V], Seq[W]))] = {
      val cg = new CoGroupedRDD[K](Seq(self, other), partitioner)
      val prfs = new PairRDDFunctions[K, Seq[Seq[_]]](cg)(classTag[K], ClassTags.seqSeqClassTag)
      prfs.mapValues { case Seq(vs, ws) =>
        (vs.asInstanceOf[Seq[V]], ws.asInstanceOf[Seq[W]])
      }
    }

Thanks,
rose

On Friday, January 24, 2014 4:32 PM, rose <rosek...@yahoo.com> wrote:

> Hi all,
>
> I want to know more about the join operation in Spark. I know it uses a
> hash join, but I am not able to figure out the nature of the
> implementation, such as blocking vs. non-blocking, or shared vs.
> non-shared partitions. If anyone knows, please reply to this post along
> with links to the implementation in the Spark source files.
>
> Thanks,
> rose

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Hash-Join-in-Spark-tp873.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
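For anyone else reading the snippet above: setting the Spark machinery aside, the per-key result that cogroup produces can be modeled with plain Scala collections. This is only an illustrative sketch, not Spark code; `cogroupLocal` is a made-up helper name, and the real `CoGroupedRDD` first shuffles both datasets by the given `Partitioner` so that all values for a key land in one partition before grouping.

```scala
object CogroupSketch {
  // A local model of cogroup's output: for every key present in either
  // input, pair the Seq of values from the left with the Seq from the right.
  // Keys missing on one side get an empty Seq, which is why cogroup can
  // serve as the basis for inner, left, and right joins alike.
  def cogroupLocal[K, V, W](left: Seq[(K, V)],
                            right: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
    val keys = (left.map(_._1) ++ right.map(_._1)).distinct
    keys.map { k =>
      // Backticked `k` in the pattern means "match this exact key".
      k -> (left.collect { case (`k`, v) => v },
            right.collect { case (`k`, w) => w })
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val a = Seq(("x", 1), ("x", 2), ("y", 3))
    val b = Seq(("x", "p"), ("z", "q"))
    println(cogroupLocal(a, b))
    // "x" pairs both value Seqs; "y" and "z" get one empty side each.
  }
}
```

In the real method, the final `mapValues` with `asInstanceOf` casts exist because `CoGroupedRDD` is untyped internally (it holds `Seq[Seq[_]]`, one inner `Seq` per input RDD), and the cast restores the static types `Seq[V]` and `Seq[W]` for the caller.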