Hi, I'm just reviewing "Advanced Spark Features", and my question is about the PageRank example.
It says "any shuffle operation on two RDDs will take on the partitioner of one of them, if one is set". So first we partition Links with a HashPartitioner, then we join Links and Ranks0. According to the document, Ranks0 will take on the HashPartitioner. The following reduceByKey operation also respects the HashPartitioner, so when we join Links and Ranks1, there is no shuffle at all.

Does that mean that partitions of different RDDs with the same id will end up in exactly the same location, even if the RDDs were originally located on different nodes?
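
To make the co-partitioning argument above concrete, here is a minimal sketch in plain Python. It is not Spark code: `partition_id`, `partition_by`, `join_copartitioned`, the partition count of 4, and the sample pages are all made up for illustration. It models only the fact that two datasets bucketed with the same hash function hold the same keys under the same partition id, so they can be joined bucket by bucket. It does not model where a partition physically lives, which is the part I am asking about.

```python
import zlib

NUM_PARTITIONS = 4  # assumed partition count, chosen only for illustration


def partition_id(key, num_partitions=NUM_PARTITIONS):
    """Toy hash partitioner: maps a key deterministically to a partition id."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_by(pairs, num_partitions=NUM_PARTITIONS):
    """Place (key, value) pairs into buckets indexed by partition id."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        buckets[partition_id(key, num_partitions)].append((key, value))
    return buckets


def join_copartitioned(left, right):
    """Join two bucketed datasets pairwise by partition id.

    Bucket i of `left` is only ever matched against bucket i of `right`,
    so no record has to move to a different partition id.
    """
    joined = []
    for left_bucket, right_bucket in zip(left, right):
        right_index = {}
        for key, value in right_bucket:
            right_index.setdefault(key, []).append(value)
        joined.append([
            (key, (left_value, right_value))
            for key, left_value in left_bucket
            for right_value in right_index.get(key, [])
        ])
    return joined


# Hypothetical sample data: page -> outgoing links, and page -> initial rank.
links = [("a", ["b", "c"]), ("b", ["c"]), ("c", ["a"]), ("d", ["c"])]
ranks = [(page, 1.0) for page, _ in links]

links_parts = partition_by(links)
ranks_parts = partition_by(ranks)
joined_parts = join_copartitioned(links_parts, ranks_parts)

for i, bucket in enumerate(joined_parts):
    print(i, bucket)
```

Every key in the joined output stays under the partition id it started with, which is how I understand the "no shuffle" claim for the Links and Ranks1 join.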