The RDDs themselves aren't changing; you are assigning new RDDs to the variables rdd_0 and rdd_1 on each iteration. Operations like join and reduceByKey produce distinct, new partitions that don't correspond one-to-one with the old partitions anyway.
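For what it's worth, here is a rough sketch of the usual pattern (the sample data, local[3] master and partition count are just for illustration): share one HashPartitioner between both sides and persist each iteration's result so the join stays a narrow dependency. Even then, Spark does not promise that a given partition stays on the same worker; preferred locations come from the cached blocks, so this makes co-location likely rather than guaranteed.

    import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}
    import org.apache.spark.SparkContext._

    val sc = new SparkContext(
      new SparkConf().setAppName("co-partitioned-join").setMaster("local[3]"))

    // One partitioner instance shared by both RDDs
    val p = new HashPartitioner(3)

    var rdd_0 = sc.parallelize(Seq((1, 1), (2, 3), (3, 5)))
      .partitionBy(p)
      .persist()   // keep the partitioned data materialized between iterations

    for (i <- 1 to 5) {
      // reduceByKey(p, ...) already applies the partitioner; no extra partitionBy needed
      val rdd_1 = rdd_0.reduceByKey(p, _ + _)

      // Both inputs are partitioned by p, so this join is a narrow dependency (no reshuffle);
      // mapValues preserves the partitioner of its parent
      val next = rdd_0.join(rdd_1).mapValues { case (a, b) => a + b }.persist()

      next.count()        // action that materializes the new RDD
      rdd_0.unpersist()   // drop the previous iteration's cached blocks
      rdd_0 = next        // the variable now points at a brand-new RDD
    }

    sc.stop()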
On Fri, Oct 17, 2014 at 5:32 AM, randylu <randyl...@gmail.com> wrote:
> Dear all,
>   In my test program, there are 3 partitions for each RDD, and the iteration
> procedure is as follows:
>
>   var rdd_0 = ...  // init
>   for (...) {
>     rdd_1 = rdd_0.reduceByKey(...).partitionBy(p)  // calculate rdd_1 from rdd_0
>     rdd_0 = rdd_0.partitionBy(p).join(rdd_1)...    // update rdd_0 by rdd_1
>     rdd_0.action()
>   }
>
>   I thought rdd_0 and rdd_1 were partitioned by the same partitioner, so their
> corresponding partitions would be on the same node; for example, rdd_0's
> partition_0 and rdd_1's partition_0 would be on the same node in each iteration.
> But in fact, rdd_0's partition_0 changes its location between workers.
>   Is there any way to keep rdd_0's and rdd_1's partitions from changing their
> locations, so that their corresponding partitions stay on the same node for a
> fast join()?
> Best Regards,
> randy