Hi,

I am using Spark 1.6.1 and have been looking closely at the Event Timeline on
the "Details for Stage" page of the Spark web UI.

I think the "scheduler delay" shown on the event timeline is misrepresented,
and I would like to confirm whether my understanding is correct.

Here are the details:
In Spark's code, "SCHEDULER_DELAY" is described as follows:
"Scheduler delay includes time to ship the task from the scheduler to the
executor, and time to send the task result from the executor to the
scheduler. If scheduler delay is large, consider decreasing the size of
tasks or decreasing the size of task results."
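As far as I can tell, the UI does not measure these two shipping times
directly. It appears to derive scheduler delay as a residual: the task's
wall-clock time minus every phase that is measured. The sketch below
illustrates that idea. The `TaskTiming` record and its field names are
hypothetical (they only loosely mirror Spark's TaskInfo/TaskMetrics), and
the exact formula in Spark 1.6 may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class TaskTiming:
    """Hypothetical per-task timing record; all values in milliseconds."""
    launch_time: int
    finish_time: int
    executor_deserialize_time: int
    executor_run_time: int
    result_serialization_time: int
    getting_result_time: int = 0


def scheduler_delay(t: TaskTiming) -> int:
    """Scheduler delay as the wall-clock time no measured phase accounts for."""
    total = t.finish_time - t.launch_time
    accounted = (t.executor_deserialize_time
                 + t.executor_run_time
                 + t.result_serialization_time
                 + t.getting_result_time)
    # Clamp at zero: clock skew between driver and executor can make the residual negative.
    return max(0, total - accounted)


# A 200 ms task with 160 ms of measured phases leaves 40 ms of "scheduler delay".
print(scheduler_delay(TaskTiming(1000, 1200, 20, 130, 10)))
```

Because the result is a single number, it cannot tell how much of the delay
was spent shipping the task and how much returning the result.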

My reading of this definition is that scheduler delay has two components.
The first occurs at the beginning of a task, when the scheduler ships the
task to the executor; the second occurs at the end of a task, when the
scheduler collects the result from the executor.

However, the event timeline figure shows only one scheduler-delay segment,
at the beginning of each task, and its length is the SUM of these two
components. This means that the segments that follow it ("Task
Deserialization Time", "Shuffle Read Time", "Executor Computing Time",
etc.) should start earlier on the timeline than they are drawn.
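To make the effect concrete, here is a small sketch that lays out task
phases as offsets from launch, first with the whole delay drawn up front
and then with the delay split into a leading "ship" part and a trailing
"result" part. The `ship_ms` parameter is hypothetical: Spark 1.6 does not
report that split, so this only shows how the drawing would change if it
were known. The phase durations are made-up example numbers.

```python
def timeline(delay_ms, phases, ship_ms=None):
    """Lay out (name, start, end) segments as ms offsets from task launch.

    ship_ms=None draws the whole scheduler delay before the phases.
    Otherwise only ship_ms (hypothetical) precedes them and the rest of
    the delay is drawn after them.
    """
    lead = delay_ms if ship_ms is None else ship_ms
    if not 0 <= lead <= delay_ms:
        raise ValueError("ship_ms must lie between 0 and delay_ms")
    segments = [("Scheduler Delay", 0, lead)]
    cursor = lead
    for name, duration in phases:
        segments.append((name, cursor, cursor + duration))
        cursor += duration
    tail = delay_ms - lead
    if tail > 0:
        segments.append(("Scheduler Delay (result)", cursor, cursor + tail))
    return segments


phases = [("Task Deserialization Time", 20),
          ("Executor Computing Time", 130),
          ("Result Serialization Time", 10)]
for seg in timeline(40, phases):
    print(seg)
for seg in timeline(40, phases, ship_ms=15):
    print(seg)
```

In both layouts the task ends at the same offset (200 ms); only the start
of every measured phase moves earlier when part of the delay is attributed
to the end of the task.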


Best,
Xiaoye
