Hi,

  I'm just reviewing "Advanced Spark Features". My question is about the
PageRank example.

  It says "any shuffle operation on two RDDs will take on the partitioner of
one of them, if one is set".

  So first we partition Links with a HashPartitioner, then we join Links and
Ranks0. According to the document, the join with Ranks0 will take on that
HashPartitioner. The following reduceByKey operation also respects the
HashPartitioner, so when we join Links and Ranks1, there is no shuffle at
all.
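
  To make my understanding concrete, here is a toy model in plain Python. It
is not Spark code, and partition_id, partition_by and local_join are names I
made up for illustration. It only assumes what I understand HashPartitioner
to do: pick the partition from the key's hash code modulo the number of
partitions.

```python
# Toy model of hash partitioning in plain Python. This is NOT Spark code;
# partition_id, partition_by and local_join are hypothetical helpers.

NUM_PARTITIONS = 4


def partition_id(key, num_partitions=NUM_PARTITIONS):
    # Small non-negative integer keys hash to themselves in Python,
    # so this is deterministic for the made-up page ids below.
    return hash(key) % num_partitions


def partition_by(pairs, num_partitions=NUM_PARTITIONS):
    """Split (key, value) pairs into a list of per-partition lists."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[partition_id(key, num_partitions)].append((key, value))
    return parts


def local_join(left_parts, right_parts):
    """Join partition i of the left side only with partition i of the right."""
    joined = []
    for left, right in zip(left_parts, right_parts):
        lookup = dict(right)
        for key, value in left:
            if key in lookup:
                joined.append((key, (value, lookup[key])))
    return joined


# Made-up data: integer page ids, outgoing links, and initial ranks.
links = partition_by([(1, [2, 3]), (2, [3]), (3, [1]), (6, [1, 2])])
ranks = partition_by([(1, 1.0), (2, 1.0), (3, 1.0), (6, 1.0)])

# Both sides use the same partitioning function, so every key is matched
# without looking at any partition with a different id.
result = local_join(links, ranks)
```

  If this model is right, the join only needs partition i of both sides to be
brought together. By itself it says nothing about which node holds
partition i, which is the part I am unsure about.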

  Does that mean partitions of different RDDs with the same id will go to
exactly the same location, even if the different RDDs were originally
located on different nodes?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/same-partition-id-means-same-location-tp5136.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
