Hello everyone:

I'm running an experiment on a Spark cluster where some of the machines are
heavily loaded with CPU-, memory- and network-intensive processes (let's
call them straggler machines).

Obviously, tasks on these machines take longer to execute than on the other
nodes of the cluster. However, I've noticed that tasks that fetch shuffle
data from these "straggler machines" are also delayed, with long shuffle
read phases.

Is there any way of knowing from which machines a task is reading its
shuffle data? Something like: node1 is reading its shuffle data from
[node2, node3 and node4]?

Thanks in advance

Alvaro
