[jira] [Commented] (YARN-534) AM max attempts is not checked when RM restart and try to recover attempts

Bikas Saha (JIRA) Mon, 08 Apr 2013 13:59:17 -0700

    [ https://issues.apache.org/jira/browse/YARN-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625802#comment-13625802 ]


Bikas Saha commented on YARN-534:
---------------------------------

Looks good overall.

Can the log message be improved a bit? For example, it could say that the 
application is not recovered because maxAppAttempts attempts have already been 
started. Secondly, there needs to be a comment here mentioning YARN-556, 
because this code will need to change when work-preserving restart happens, right?
{code}
+          LOG.info("Not recovering application " + appState.getAppId() +
+              " due to hit maxAppAttempts limit");
{code}
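
As a rough illustration of the wording suggested above, here is a minimal sketch. 
The class and method names (RecoveryLogMessage, notRecovering) are hypothetical 
and not part of the patch, which builds the string inline in LOG.info(...). The 
exact wording is only one possibility.

```java
/** Hypothetical helper for illustration only; not part of the YARN-534 patch. */
final class RecoveryLogMessage {

    private RecoveryLogMessage() {
    }

    /**
     * Builds a message that states why the application is skipped on recovery.
     * Both parameters are plain values here; in the patch the application id
     * comes from appState.getAppId().
     */
    static String notRecovering(String appId, int maxAppAttempts) {
        // Note for maintainers: this decision is expected to change once
        // work-preserving RM restart (YARN-556) is implemented.
        return "Not recovering application " + appId
            + " because " + maxAppAttempts
            + " attempts (maxAppAttempts) have already been started";
    }
}
```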

I don't think this is needed anymore because the capacity scheduler bug is 
fixed. Earlier, the capacity scheduler had a bug that made applications remain 
in the pending state.
{code}
+    conf.set(YarnConfiguration.RM_SCHEDULER,
+        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
{code}

In the test, can we also start another MockRM that uses the saved app submission 
context to determine that max attempts has been reached? The global limit will 
have to be increased at that time.

Can the test stop all the RMs it started?

Minor suggestion on the code structure in RMAppManager.recover(). IMO, we can 
use a boolean flag shouldRecover, which will be set to false if the app is 
unmanaged or max retries has been reached. That way we don't have to duplicate 
the remove logic.
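
A minimal sketch of that structure, under stated assumptions: StoredApp and 
RecoverSketch are hypothetical stand-ins rather than the real RMAppManager or 
state-store classes, the store is modelled as a plain Map, and "max retries 
reached" is assumed to mean attemptCount >= maxAppAttempts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a stored application record; not a YARN class. */
final class StoredApp {
    final String appId;
    final boolean unmanagedAM;
    final int attemptCount;

    StoredApp(String appId, boolean unmanagedAM, int attemptCount) {
        this.appId = appId;
        this.unmanagedAM = unmanagedAM;
        this.attemptCount = attemptCount;
    }
}

/** Illustrates the single shouldRecover flag; not the actual recover() code. */
final class RecoverSketch {

    private RecoverSketch() {
    }

    /**
     * Returns the ids of the applications that would be recovered and removes
     * every other application from the (simulated) store.
     */
    static List<String> recover(Map<String, StoredApp> store, int maxAppAttempts) {
        List<String> recovered = new ArrayList<>();
        List<String> toRemove = new ArrayList<>();

        for (StoredApp app : store.values()) {
            boolean shouldRecover = true;
            if (app.unmanagedAM) {
                // Unmanaged AMs are not recovered.
                shouldRecover = false;
            } else if (app.attemptCount >= maxAppAttempts) {
                // Assumed condition; expected to change with YARN-556.
                shouldRecover = false;
            }

            if (shouldRecover) {
                recovered.add(app.appId);
            } else {
                // Both skip reasons end up here, so removal is written once.
                toRemove.add(app.appId);
            }
        }

        // Remove after iterating to avoid modifying the map while looping.
        for (String appId : toRemove) {
            store.remove(appId);
        }
        return recovered;
    }
}
```

Both skip conditions only flip the flag, so the removal from the store lives in 
one place instead of being repeated per condition.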
                
> AM max attempts is not checked when RM restart and try to recover attempts
> --------------------------------------------------------------------------
>
>                 Key: YARN-534
>                 URL: https://issues.apache.org/jira/browse/YARN-534
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-534.1.patch
>
>
> Currently, AM max attempts is only checked when the current attempt fails, to 
> decide whether to create a new attempt. If the RM restarts before the 
> max-attempt fails, it will not clean the state store; when the RM comes back, 
> it will retry the attempt again.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
