Timeouts at reduce stage

2008-08-29 Thread Иван
From time to time I experience a huge decrease in performance while running some MR jobs. The reason revealed itself quite easily: some tasks have failed according to the JobTracker's web interface. A record reporting such a failure usually looks something like this (usually appears at exact r

Re: Timeouts at reduce stage

2008-08-29 Thread Miles Osborne
The problem here is that when a mapper fails, it may be due either to a bug within that mapper or to hardware problems of one kind or another (disks getting full, etc.). If you configure Hadoop to use job replication, then in either case a failing job will get resubmitted mult

Re: Timeouts at reduce stage

2008-08-29 Thread Karl Anderson
On 29-Aug-08, at 3:53 AM, Иван wrote: Thanks for the fast reply, but in fact it sometimes fails even on default MR jobs such as the rowcounter job from the HBase 0.2.0 distribution. Hardware problems are theoretically possible, but they don't seem to be the case because everything else

Re: Timeouts at reduce stage

2008-09-01 Thread Jason Venner
ults in materials about real childs). Maybe this situation is quite common and there is a definite reason or solution? Thanks! Ivan Blinkov -Original Message- From: Karl Anderson <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Date: Fri, 29 Aug 2008 13:17:18 -0700 Subject: Re: Timeo

Re: Timeouts at reduce stage

2008-09-04 Thread Doug Cutting
Jason Venner wrote: We have modified the /main/ that launches the children of the task tracker to explicitly exit, in its finally block. That helps substantially. Have you submitted this as a patch? Doug
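
The fix Jason describes can be illustrated with a minimal Java sketch. This is not the actual Hadoop child launcher and not Jason's patch; the class name ChildMain, the method runThenExit, and the injectable exit hook are hypothetical names chosen for illustration. The idea shown is only the general one: a JVM stays alive while any non-daemon thread is still running, so a child process whose main returns can linger, whereas calling System.exit from a finally block ends the process regardless.

```java
import java.util.function.IntConsumer;

// Hypothetical illustration only; not Hadoop's real child-task entry point.
final class ChildMain {

    /**
     * Runs the task, then always invokes the exit hook from a finally block.
     * The hook is a parameter so the behavior can be exercised without
     * terminating the JVM; in main it is System::exit.
     */
    static void runThenExit(Runnable task, IntConsumer exit) {
        int status = 1; // assume failure until the task completes normally
        try {
            task.run();
            status = 0;
        } finally {
            exit.accept(status); // reached on both normal and exceptional paths
        }
    }

    public static void main(String[] args) {
        // A non-daemon thread like this one would keep the JVM alive
        // after main returns if nothing called System.exit.
        Thread lingering = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ignored) {
                // nothing to clean up in this sketch
            }
        });
        lingering.start();

        runThenExit(() -> System.out.println("task work done"), System::exit);
    }
}
```

Passing the exit action in as a parameter is a design choice made here for testability; a real launcher could simply call System.exit directly in its finally block.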

Re: Timeouts at reduce stage

2008-09-05 Thread 叶双明
:) 2008/9/5, Doug Cutting <[EMAIL PROTECTED]>: > > Jason Venner wrote: > >> We have modified the /main/ that launches the children of the task tracker >> to explicitly exit, in its finally block. That helps substantially. >> > > Have you submitted this as a patch? > > Doug >

Re: Re: Timeouts at reduce stage

2008-08-29 Thread Иван
Thanks for the fast reply, but in fact it sometimes fails even on default MR jobs such as the rowcounter job from the HBase 0.2.0 distribution. Hardware problems are theoretically possible, but they don't seem to be the case because everything else is operating fine on the same set of servers

Re[2]: Timeouts at reduce stage

2008-08-29 Thread Иван
tion? Thanks! Ivan Blinkov -Original Message- From: Karl Anderson <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Date: Fri, 29 Aug 2008 13:17:18 -0700 Subject: Re: Timeouts at reduce stage > > On 29-Aug-08, at 3:53 AM, Иван wrote: > > > Thanks for a