Hi guys,
the latest Spark version, 1.0.2, exhibits very strange behavior when
deciding on which node a given partition should reside. The following
example was tested in Spark standalone mode.
val partitioner = new HashPartitioner(10)

val dummyJob1 = sc.parallelize(0 until 10)
  .map(x => (x, x))
  .partitionBy(partitioner)
dummyJob1.foreach { case (id, x) => println("Dummy1 -> Id = " + id) }

val dummyJob2 = sc.parallelize(0 until 10)
  .map(x => (x, x))
  .partitionBy(partitioner)
dummyJob2.foreach { case (id, x) => println("Dummy2 -> Id = " + id) }
On one node I get something like:
Dummy1 -> Id = 2
Dummy2 -> Id = 7
This is very strange behavior... One would expect that partitions with the
same ID are placed on the same node -- that is exactly what happens with
Spark 0.9.2 (and below), but not with the latest versions (tested with
1.0.1 and 1.0.2). Can anyone explain this?
Thanks in advance,
Milos