Since my earlier question is still unanswered, I have decided to dig into the Spark code myself. However, I am new to Spark, and to Scala in particular. Can someone help me understand the following code snippet?
    def cogroup[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (Seq[V], Seq[W]))] = {
      val cg = new CoGroupedRDD[K](Seq(self, other), partitioner)
      val prfs = new PairRDDFunctions[K, Seq[Seq[_]]](cg)(classTag[K], ClassTags.seqSeqClassTag)
      prfs.mapValues { case Seq(vs, ws) =>
        (vs.asInstanceOf[Seq[V]], ws.asInstanceOf[Seq[W]])
      }
    }

Thanks,
rose

On Friday, January 24, 2014 4:32 PM, rose <rosek...@yahoo.com> wrote:

> Hi all,
>
> I want to know more about the join operation in Spark. I know it uses a
> hash join, but I am not able to figure out the nature of the
> implementation, such as blocking vs. non-blocking, or shared vs.
> non-shared partitions. If anyone knows, please reply to this post along
> with links to the implementation in the Spark source files.
>
> Thanks,
> rose

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Hash-Join-in-Spark-tp873.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
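For anyone else reading the snippet above: setting the Spark machinery aside, the per-key result that cogroup produces can be modeled with plain Scala collections. This is only an illustrative sketch, not Spark code; `cogroupLocal` is a made-up helper name, and the real `CoGroupedRDD` first shuffles both datasets by the given `Partitioner` so that all values for a key land in one partition before grouping.

```scala
object CogroupSketch {
  // A local model of cogroup's output: for every key present in either
  // input, pair the Seq of values from the left with the Seq from the right.
  // Keys missing on one side get an empty Seq, which is why cogroup can
  // serve as the basis for inner, left, and right joins alike.
  def cogroupLocal[K, V, W](left: Seq[(K, V)],
                            right: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
    val keys = (left.map(_._1) ++ right.map(_._1)).distinct
    keys.map { k =>
      // Backticked `k` in the pattern means "match this exact key".
      k -> (left.collect { case (`k`, v) => v },
            right.collect { case (`k`, w) => w })
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val a = Seq(("x", 1), ("x", 2), ("y", 3))
    val b = Seq(("x", "p"), ("z", "q"))
    println(cogroupLocal(a, b))
    // "x" pairs both value Seqs; "y" and "z" get one empty side each.
  }
}
```

In the real method, the final `mapValues` with `asInstanceOf` casts exist because `CoGroupedRDD` is untyped internally (it holds `Seq[Seq[_]]`, one inner `Seq` per input RDD), and the cast restores the static types `Seq[V]` and `Seq[W]` for the caller.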