Job froze for hours because of an unresponsive disk on one of the task trackers

2014-03-27 Thread Krishna Rao
Hi, we have a daily Hive script that usually takes a few hours to run. The other day I noticed one of the jobs was taking in excess of a few hours. Digging into it, I saw that there were 3 attempts to launch a job on a single node: Task Id Start Time Finish Time Error

Re: Job froze for hours because of an unresponsive disk on one of the task trackers

2014-03-27 Thread Dieter De Witte
The IDs of the tasks are different, so the node got killed after failing on 3 different(!) reduce tasks. Reduce task 48 will probably have been resubmitted to another node.
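The behavior described above can be sketched as a toy per-job scheduler. It counts failed attempts per node, stops assigning the job's tasks to a node once that count reaches a threshold, and puts each failed task back on the queue so it can run elsewhere. Everything here is a hypothetical illustration, not Hadoop's actual code. The class, the function and the `max_node_failures` parameter are invented names. The idea is loosely modeled on Hadoop 1.x's per-job `mapred.max.tracker.failures` setting. The threshold of 3 simply mirrors the three failures in this thread and is not claimed to be Hadoop's default.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Toy per-job scheduler (hypothetical): tracks task failures per node
    and stops assigning this job's tasks to a node at the threshold."""
    nodes: list
    max_node_failures: int = 3  # hypothetical threshold, mirrors the thread
    failures: dict = field(default_factory=lambda: defaultdict(int))
    blacklisted: set = field(default_factory=set)

    def eligible_nodes(self):
        # Nodes that have not yet hit the failure threshold for this job.
        return [n for n in self.nodes if n not in self.blacklisted]

    def record_failure(self, node):
        self.failures[node] += 1
        if self.failures[node] >= self.max_node_failures:
            self.blacklisted.add(node)


def run(tasks, scheduler, bad_node):
    """Place each pending task on the first eligible node. Attempts on
    `bad_node` (standing in for the node with the unresponsive disk) fail
    and the task is requeued. Returns task -> node that finally ran it."""
    pending = deque(tasks)
    done = {}
    while pending:
        task = pending.popleft()
        nodes = scheduler.eligible_nodes()
        if not nodes:
            raise RuntimeError("no eligible nodes left")
        node = nodes[0]  # naive placement: always prefer the first eligible node
        if node == bad_node:
            scheduler.record_failure(node)
            pending.append(task)  # failed attempt: resubmit the task later
        else:
            done[task] = node
    return done


sched = Scheduler(nodes=["node-a", "node-b"])
placement = run(["r46", "r47", "r48"], sched, bad_node="node-a")
print(placement)          # {'r46': 'node-b', 'r47': 'node-b', 'r48': 'node-b'}
print(sched.blacklisted)  # {'node-a'}
```

In this model, node-a fails three different reduce tasks, is excluded for the rest of the job, and all three tasks then finish on node-b. The model only captures the expected behavior. It does not explain the outcome reported in the reply below, where the tasks were never resubmitted.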

Re: Job froze for hours because of an unresponsive disk on one of the task trackers

2014-03-27 Thread Krishna Rao
I noticed, but none of the jobs ended up being re-submitted! And all 3 of those jobs failed on the same node. All we know is that the disk on that node became unresponsive.