We set part of the failure reason as the diagnostic message for a
failed task, which a JobClient API can retrieve:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/RunningJob.html#getTaskDiagnostics(org.apache.hadoop.mapred.TaskAttemptID).
Often this is of little use, since the top of the stack trace doesn't
always carry the most relevant information, so HADOOP-9861 may help
here once it is checked in.
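For illustration, a minimal sketch of pulling those diagnostics via the
old mapred API (assumes a running cluster and Hadoop jars on the
classpath; the class name and argument layout are hypothetical):

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskAttemptID;

public class PrintTaskDiagnostics {
    // Usage: PrintTaskDiagnostics <job-id> <task-attempt-id>
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(JobID.forName(args[0]));
        TaskAttemptID attempt = TaskAttemptID.forName(args[1]);
        // getTaskDiagnostics returns the diagnostic strings recorded
        // for the failed attempt (typically part of the stack trace).
        for (String line : job.getTaskDiagnostics(attempt)) {
            System.out.println(line);
        }
    }
}
```

This only surfaces what the framework recorded for the attempt; you'd
still need the task logs themselves for the full picture.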

On Tue, Aug 27, 2013 at 10:34 AM, Gopi Krishna M <mgo...@gmail.com> wrote:
> Hi
>
> We are seeing our map-reduce jobs crash once in a while and have to go
> through the logs on all the nodes to figure out what went wrong.  Sometimes
> it is low resources, and sometimes it is a programming error triggered by
> specific inputs.  The same is true for some of our Hive queries.
>
> Are there any tools (free/paid) that help us do this debugging quickly?
> I am planning to write a debugging tool for sifting through the distributed
> logs of Hadoop, but wanted to check whether there are already any useful
> tools for this.
>
> Thx
> Gopi | www.wignite.com



-- 
Harsh J
