Hi, I am working with Spark in Java on top of an HDFS cluster. In my code, two RDDs are partitioned with the same partitioner (a HashPartitioner with the same number of partitions), so they are co-partitioned. Thus the same keys end up in partitions with the same index, but that does not mean the two RDDs are necessarily co-located, that is, that partitions with the same index reside on the same nodes. For example, partition #1 of RDD #1 may not be on the same node as partition #1 of RDD #2. I would like to co-locate the partitioned RDDs to reduce data transfer between nodes when I join them. Is there a way to do that?
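To make the distinction between co-partitioning and co-location concrete, here is a plain-Java sketch with no Spark dependency. `partitionIndex` is a hypothetical helper modeled on the rule Spark's HashPartitioner applies (non-negative `key.hashCode()` modulo the number of partitions, null keys to partition 0). `partition` and `joinCoPartitioned` are likewise illustrative names, not Spark APIs. The sketch shows that a key's placement depends only on the partition index, so a join can pair partition i of one dataset with partition i of the other without reshuffling keys. It says nothing about which node holds each partition, which is exactly the co-location question above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of co-partitioning; no Spark classes are used.
class CoPartitionSketch {

    // Hypothetical helper modeled on HashPartitioner: non-negative
    // key.hashCode() mod numPartitions, with null keys sent to partition 0.
    static int partitionIndex(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        }
        int raw = key.hashCode() % numPartitions;
        return raw < 0 ? raw + numPartitions : raw;
    }

    // Buckets a key-value dataset into numPartitions maps by partitionIndex.
    static List<Map<String, String>> partition(Map<String, String> data, int numPartitions) {
        List<Map<String, String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new HashMap<>());
        }
        for (Map.Entry<String, String> e : data.entrySet()) {
            parts.get(partitionIndex(e.getKey(), numPartitions)).put(e.getKey(), e.getValue());
        }
        return parts;
    }

    // Inner join of two co-partitioned datasets, one partition index at a time:
    // left partition i is only ever matched against right partition i.
    static Map<String, String[]> joinCoPartitioned(List<Map<String, String>> left,
                                                   List<Map<String, String>> right) {
        if (left.size() != right.size()) {
            throw new IllegalArgumentException("inputs must have the same number of partitions");
        }
        Map<String, String[]> joined = new HashMap<>();
        for (int i = 0; i < left.size(); i++) {
            Map<String, String> rightPart = right.get(i);
            for (Map.Entry<String, String> e : left.get(i).entrySet()) {
                String match = rightPart.get(e.getKey());
                if (match != null) {
                    joined.put(e.getKey(), new String[] {e.getValue(), match});
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> rdd1 = new HashMap<>();
        rdd1.put("a", "1"); rdd1.put("b", "2"); rdd1.put("c", "3");
        Map<String, String> rdd2 = new HashMap<>();
        rdd2.put("b", "x"); rdd2.put("c", "y"); rdd2.put("d", "z");

        int numPartitions = 4;
        List<Map<String, String>> p1 = partition(rdd1, numPartitions);
        List<Map<String, String>> p2 = partition(rdd2, numPartitions);

        // Same partitioner => a shared key has the same index in both inputs,
        // regardless of which node would host that partition.
        for (Map.Entry<String, String[]> e : joinCoPartitioned(p1, p2).entrySet()) {
            System.out.println(e.getKey() + " -> (" + e.getValue()[0] + ", " + e.getValue()[1] + ")"
                + " from partition " + partitionIndex(e.getKey(), numPartitions));
        }
    }
}
```

In Spark itself, the setup described in the question corresponds to calling `partitionBy(new HashPartitioner(n))` on both `JavaPairRDD`s before `join`. Spark can then compute the join without a shuffle, but the task for partition i may still have to read partition i of the other RDD from a different node, which is the remaining transfer the question is about.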
Thank you

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-there-a-way-to-co-locate-partitions-from-two-partitioned-RDDs-tp26008.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.