Hi guys, I asked this on Stack Overflow here: https://stackoverflow.com/questions/63535720/why-would-preferredlocations-not-be-enforced-on-an-empty-spark-cluster but am hoping someone here can help further.
I have a 4-node standalone cluster with workers named worker1, worker2 and worker3, and a master on which I am running spark-shell. Given the following example:

-----------------------------------------------------------------------------------------------------------------
import scala.collection.mutable

val someData = mutable.ArrayBuffer[(String, Seq[String])]()
someData += ("1" -> Seq("worker1"))
someData += ("2" -> Seq("worker2"))
someData += ("3" -> Seq("worker3"))

// makeRDD(Seq[(T, Seq[String])]) distributes the first elements as the RDD's
// data and uses the second elements as preferred locations per partition
val someRdd = sc.makeRDD(someData)

someRdd.map(i => i + ":" + java.net.InetAddress.getLocalHost().getHostName()).collect().foreach(println)
-----------------------------------------------------------------------------------------------------------------

The cluster is completely clean, with nothing else executing, so I would expect to see the output:

1:worker1
2:worker2
3:worker3

but in fact the output is undefined and I see things like:

scala> someRdd.map(i=>i + ":" + java.net.InetAddress.getLocalHost().getHostName()).collect().foreach(println)
1:worker3
2:worker1
3:worker2

scala> someRdd.map(i=>i + ":" + java.net.InetAddress.getLocalHost().getHostName()).collect().foreach(println)
1:worker2
2:worker3
3:worker1

Am I doing this wrong, or is this expected behaviour?

Thanks
Tom
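In case it helps with diagnosis: as far as I understand, the public RDD.preferredLocations API only reads back the locality hints stored on the RDD (it doesn't influence scheduling), so something like the following sketch should confirm whether the hints from makeRDD were actually recorded:

-----------------------------------------------------------------------------------------------------------------
// Inspect the locality hints Spark recorded for each partition of the RDD.
// Expected (if the hints were taken): partition 0 -> List(worker1), etc.
someRdd.partitions.foreach { p =>
  println(s"partition ${p.index} -> ${someRdd.preferredLocations(p)}")
}
-----------------------------------------------------------------------------------------------------------------

When I run this the hints do appear to be stored, which is why the nondeterministic placement at execution time surprises me.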