[ 
https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739372#comment-13739372
 ] 

Karthik Kambatla commented on YARN-1055:
----------------------------------------

bq. In case of a network issue where the AM is running but cannot talk to the 
RM or say the NM on which the AM was running goes down, what knob would control 
handling these situations?

For these two cases, I would use the AM-failure knob because the AM is 
"perceived" to have failed. Is there more to the question that I have totally 
missed?
                
> Handle app recovery differently for AM failures and RM restart
> --------------------------------------------------------------
>
>                 Key: YARN-1055
>                 URL: https://issues.apache.org/jira/browse/YARN-1055
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Karthik Kambatla
>
> Ideally, we would like to tolerate container, AM, RM failures. App recovery 
> for AM and RM currently relies on the max-attempts config; tolerating AM 
> failures requires it to be > 1 and tolerating RM failure/restart requires it 
> to be = 1.
> We should handle these two differently, with two separate configs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to