Hi guys, I asked this on Stack Overflow here: https://stackoverflow.com/questions/63535720/why-would-preferredlocations-not-be-enforced-on-an-empty-spark-cluster but am hoping someone here can help further.
I have a 4-node standalone cluster with workers named worker1, worker2 and worker3, and a master on which I am running spark-shell. Given the following example:

-----------------------------------------------------------------------------------------------------------------
import scala.collection.mutable

val someData = mutable.ArrayBuffer[(String, Seq[String])]()
someData += ("1" -> Seq("worker1"))
someData += ("2" -> Seq("worker2"))
someData += ("3" -> Seq("worker3"))

// makeRDD(Seq[(T, Seq[String])]) distributes the first elements as the RDD's
// data and uses the second elements as preferred locations per partition
val someRdd = sc.makeRDD(someData)

someRdd.map(i => i + ":" + java.net.InetAddress.getLocalHost().getHostName()).collect().foreach(println)
-----------------------------------------------------------------------------------------------------------------

The cluster is completely clean, with nothing else executing, so I would expect to see the output:

1:worker1
2:worker2
3:worker3

but in fact the output is undefined and I see things like:

scala> someRdd.map(i=>i + ":" + java.net.InetAddress.getLocalHost().getHostName()).collect().foreach(println)
1:worker3
2:worker1
3:worker2

scala> someRdd.map(i=>i + ":" + java.net.InetAddress.getLocalHost().getHostName()).collect().foreach(println)
1:worker2
2:worker3
3:worker1

Am I doing this wrong, or is this expected behaviour?

Thanks
Tom
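In case it helps with diagnosis: as far as I understand, the public RDD.preferredLocations API only reads back the locality hints stored on the RDD (it doesn't influence scheduling), so something like the following sketch should confirm whether the hints from makeRDD were actually recorded:

-----------------------------------------------------------------------------------------------------------------
// Inspect the locality hints Spark recorded for each partition of the RDD.
// Expected (if the hints were taken): partition 0 -> List(worker1), etc.
someRdd.partitions.foreach { p =>
  println(s"partition ${p.index} -> ${someRdd.preferredLocations(p)}")
}
-----------------------------------------------------------------------------------------------------------------

When I run this the hints do appear to be stored, which is why the nondeterministic placement at execution time surprises me.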