Such a tool would be very helpful if it existed, but the distributed
nature of Spark jobs may be hard for a profiler to capture.

I was recently running a job where merging the accumulators took an
inordinately long time, and that time was not reflected anywhere in the
standalone cluster's web UI.
What I think would be useful is to publish metrics for the different
lifecycle stages of a job to a system like Ganglia, so you can zero in on
bottlenecks. Perhaps users could also define some of their own metrics in
the config.

I have been thinking of tinkering with the metrics publisher to add custom
metrics and get a bigger picture, including a time breakdown; see the
sketch below for a simpler starting point.
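
Before patching the metrics publisher itself, a SparkListener registered on
the driver can already give a rough per-stage time breakdown. A minimal
sketch (StageTimingListener is just an illustrative name, not anything in
Spark):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

    // Logs wall-clock time per completed stage, as a crude first cut
    // at a job-lifecycle time breakdown.
    class StageTimingListener extends SparkListener {
      override def onStageCompleted(stage: SparkListenerStageCompleted): Unit = {
        val info = stage.stageInfo
        // Both timestamps are Options; only log fully-timed stages.
        for (start <- info.submissionTime; end <- info.completionTime) {
          println(s"Stage ${info.stageId} (${info.name}): ${end - start} ms")
        }
      }
    }

    // Register it on an existing SparkContext:
    //   sc.addSparkListener(new StageTimingListener)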

On Mon, Dec 29, 2014 at 10:24 AM, rapelly kartheek <kartheek.m...@gmail.com>
wrote:

> Hi,
>
> I want to find the time taken to replicate an RDD in a Spark cluster,
> along with the computation time on the replicated RDD.
>
> Can someone please suggest a suitable spark profiler?
>
> Thank you
>
