I have a large list of edges stored as a 5000-partition RDD. Now I'm running a
simple but shuffle-heavy operation on it:

val g = Graph.fromEdges(edges, ...).partitionBy(...)
val subs = Graph(g.collectEdges(...), g.edges).collectNeighbors()
subs.saveAsObjectFile("hdfs://...")
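
For concreteness, here is a fleshed-out sketch of the same pipeline; the partition strategy and edge direction below are placeholders standing in for my actual arguments, not necessarily what I run:

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// edges: RDD[Edge[Int]] with 5000 partitions (attribute type is illustrative)
val g = Graph.fromEdges(edges, defaultValue = 0)
  .partitionBy(PartitionStrategy.EdgePartition2D) // placeholder strategy

// Attach each vertex's incident edges as its attribute,
// then collect the neighbor sets — both steps shuffle heavily.
val withLocalEdges = Graph(g.collectEdges(EdgeDirection.Either), g.edges)
val subs = withLocalEdges.collectNeighbors(EdgeDirection.Either)

subs.saveAsObjectFile("hdfs://...")
```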

The job gets divided into 9 stages. My cluster has 3 workers on the same
local network. Even though Spark 1.5.0 runs much faster overall and the first
several stages run at full load, starting from one of the stages a single
machine suddenly grabs 99% of the tasks, while the others take only as many
tasks as they have cores and then wait until that one machine finishes
everything. Interestingly, on Spark 1.3.1 all stages get their tasks
distributed evenly among the cluster machines. I suspect this could be a bug
in 1.5.0.


--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Uneven-distribution-of-tasks-among-workers-in-Spark-GraphX-1-5-0-tp24763.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
