One way to do that is to use RDD.toDebugString, which prints the RDD's
dependency graph (its lineage). It also gives a good idea of the stages:
each shuffle boundary, shown as a new level of indentation, starts a new stage.


On Mon, Aug 4, 2014 at 8:55 PM, rpandya wrote:

> Is there a way to visualize the task dependency graph of an application,
> during or after its execution? The list of stages on port 4040 is useful,
> but still quite limited. For example, I've found that if I don't cache()
> the result of one expensive computation, it will get repeated 4 times, but
> it is not easy to trace through exactly why. Ideally, what I would like for
> each stage is:
> - the individual tasks and their dependencies
> - the various RDD operators that have been applied
> - the full stack trace, both for the stage barrier, the task, and for the
> lambdas used (often the RDDs are manipulated inside layers of code, so the
> immediate file/line# is not enough)
>
> Any suggestions?
>
> Thanks,
>
> Ravi
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Visualizing-stage-task-dependency-graph-tp11404.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
