Dear all,

I have a question about partition indices in an RDD: does hash partitioning preserve which partition index lives on which physical machine?

For example, take a cluster of three physical machines A, B and C (all workers) and an RDD with six partitions. Assume that partitions 0 and 3 are on A, partitions 1 and 4 are on B, and partitions 2 and 5 are on C. If I hash partition the RDD using "partitionBy(new HashPartitioner(6))", will the newly created RDD have the same partition indices on each machine? Or is it possible that partitions 0 and 3 are now on B rather than A? If so, is there any way to keep the same index-to-machine numbering for both RDDs?
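To make the question concrete, here is a minimal sketch of how I understand the partition index to be chosen for a key: a non-negative modulo of the key's hashCode. The object PartitionIndexSketch and the helper partitionIndex are my own names for illustration, not Spark API, and the sketch says nothing about which machine a given index ends up on, which is the part I am unsure about.

```scala
// Hypothetical helper (not part of Spark): mirrors my understanding of how
// HashPartitioner maps a key to a partition index.
object PartitionIndexSketch {
  def partitionIndex(key: Any, numPartitions: Int): Int = {
    if (key == null) {
      0 // assumption: null keys go to partition 0
    } else {
      val raw = key.hashCode % numPartitions
      // hashCode may be negative, so shift the remainder into [0, numPartitions)
      if (raw < 0) raw + numPartitions else raw
    }
  }

  def main(args: Array[String]): Unit = {
    // Print the partition index each sample key would get with six partitions.
    val keys = Seq(0, 1, 2, 3, 4, 5, 6, -1)
    keys.foreach { k =>
      println(s"key $k -> partition ${partitionIndex(k, 6)}")
    }
  }
}
```

So the key-to-index mapping looks deterministic to me. What I cannot tell is whether the index-to-machine mapping is also stable between the original RDD and the repartitioned one.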
Thanks in advance.

Long