Re: Question about container recovery

2014-12-10 Thread Vinod Kumar Vavilapalli
Is this a MapReduce application? MR has a concept of blacklisting nodes where a lot of tasks fail. The configs that control it are:
 - yarn.app.mapreduce.am.job.node-blacklisting.enable: true by default
 - mapreduce.job.maxtaskfailures.per.tracker: default is 3, meaning a node is blacklisted if it
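A minimal sketch of the per-node failure counting that these two settings control, written as plain Python rather than Hadoop code. The snippet above is cut off before it states the exact condition, so this assumes a node is blacklisted once its failed-task count reaches the configured maximum. The class and method names are hypothetical and are not part of any Hadoop API.

```python
from collections import Counter


class NodeBlacklist:
    """Toy model of per-job node blacklisting (hypothetical, not the Hadoop API)."""

    def __init__(self, enabled=True, max_task_failures_per_node=3):
        # Stands in for yarn.app.mapreduce.am.job.node-blacklisting.enable
        self.enabled = enabled
        # Stands in for mapreduce.job.maxtaskfailures.per.tracker
        self.max_failures = max_task_failures_per_node
        self.failures = Counter()
        self.blacklisted = set()

    def record_task_failure(self, node):
        """Count one failed task on `node`; return True if it is now blacklisted."""
        self.failures[node] += 1
        # Assumption: the threshold is inclusive (3 failures => blacklisted).
        if self.enabled and self.failures[node] >= self.max_failures:
            self.blacklisted.add(node)
        return node in self.blacklisted

    def usable_nodes(self, nodes):
        """Return the nodes that are still eligible for new tasks."""
        return [n for n in nodes if n not in self.blacklisted]
```

With the defaults, the third failure recorded against a node removes it from `usable_nodes`; with `enabled=False` failures are still counted but no node is ever excluded.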

Re: Question about container recovery

2014-12-10 Thread Vinod Kumar Vavilapalli
Replies inline.
> Here is my question: is there a mechanism such that, when one container exits
> abnormally, YARN will prefer to dispatch the container on another NM?
Acting on container exit is a responsibility left to ApplicationMasters. For example, the MapReduce ApplicationMaster explicitly tells YARN t
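To illustrate what "left to ApplicationMasters" can mean in practice, here is a small Python sketch of an AM-side policy that reacts to a container exit and picks the node for the replacement request. It is a simulation only: the class, its methods, and the rule of preferring the node with the fewest abnormal exits are assumptions made for illustration, and it treats exit status 0 as a normal exit.

```python
class RetryPlanner:
    """Hypothetical AM-side helper: decide where to re-request a container."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # Abnormal exits observed per node, as seen by this application only.
        self.abnormal_exits = {n: 0 for n in self.nodes}

    def on_container_exit(self, node, exit_status):
        """Return the preferred node for a replacement, or None if none is needed."""
        if exit_status == 0:
            # Assumption: status 0 means the container finished normally.
            return None
        self.abnormal_exits[node] += 1
        # Prefer the node with the fewest abnormal exits; ties keep cluster order.
        return min(self.nodes, key=lambda n: self.abnormal_exits[n])
```

For a three-node cluster, a failure on the first node steers the replacement to the second, and a later failure there steers it to the third. A real ApplicationMaster would turn this preference into a resource request to the ResourceManager, which this sketch does not model.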

Re: Question about container recovery

2014-12-10 Thread scwf
It seems there is a blacklist in YARN: when all containers on one NM are lost, will it add this NM to the blacklist? And when will the NM be removed from the blacklist?
On 2014/12/10 13:39, scwf wrote:
Hi, all
Here is my question: is there a mechanism such that, when one container exits abnormally, YARN will prefe

Question about container recovery

2014-12-09 Thread scwf
Hi, all
Here is my question: is there a mechanism such that, when one container exits abnormally, YARN will prefer to dispatch the container on another NM? We have a cluster with 3 NMs (each NM with 135g of memory) and 1 RM, and we are running a job which starts 13 containers (= 1 AM + 12 executor containers). Each NM