[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027012#comment-14027012
 ] 

Jian He commented on YARN-1365:
-------------------------------

Anubhav, thanks for working on the patch. some comments:

The following two pieces of code can be merged in the same if block like this ?
{code}
          // Need to register an app attempt before AM can register
          appAttempt.masterService
              .registerAppAttempt(appAttempt.applicationAttemptId);

          appAttempt.eventHandler.handle(new AppAttemptAddedSchedulerEvent(
            appAttempt.getAppAttemptId(), false, false));
{code}

we can print the current state of RMAppAttempt also which will be useful for 
debugging.
{code}
LOG.info("Skipping notifying ATTEMPT_ADDED");
{code}

We should remove the following transition also and do the same to pass a flag 
in AppAddedSchedulerEvent to notify not re-send APP_ACCEPTED event for 
recovered apps.
{code}
    // ACCECPTED state can once again receive APP_ACCEPTED event, because on
    // recovery the app returns ACCEPTED state and the app once again go
    // through the scheduler and triggers one more APP_ACCEPTED event at
    // ACCEPTED state.
    .addTransition(RMAppState.ACCEPTED, RMAppState.ACCEPTED,
        RMAppEventType.APP_ACCEPTED)
{code}

This message is not exactly true. If RM didn’t reboot and AM can just do 
allocate without registration.
{code}
        String message =
            "Application Master is not registered for known application: "
                + appAttemptId.getApplicationId()
                + ". Looks like RM rebooted. Let AM resync.";
{code}

The following code is removed in finishApplicationMaster to not throw exception 
if the app did not register before.
If an app did not register and do unregister directly, it’ll go through 
successfully.
should we return resync command here also and client is expected to do 
re-register and then unregister?

{code}
if (!hasApplicationMasterRegistered(applicationAttemptId)) {
        String message =
            "Application Master is trying to unregister before registering for: 
"
                + applicationAttemptId.getApplicationId();
        LOG.error(message);
        RMAuditLogger.logFailure(
            this.rmContext.getRMApps()
                .get(applicationAttemptId.getApplicationId()).getUser(),
            AuditConstants.UNREGISTER_AM, "", "ApplicationMasterService",
            message, applicationAttemptId.getApplicationId(),
            applicationAttemptId);
        throw new InvalidApplicationMasterRequestException(message);
      }
{code}


> ApplicationMasterService to allow Register and Unregister of an app that was 
> running before restart
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1365
>                 URL: https://issues.apache.org/jira/browse/YARN-1365
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
> YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.initial.patch
>
>
> For an application that was running before restart, the 
> ApplicationMasterService currently throws an exception when the app tries to 
> make the initial register or final unregister call. These should succeed and 
> the RMApp state machine should transition to completed like normal. 
> Unregistration should succeed for an app that the RM considers complete since 
> the RM may have died after saving completion in the store but before 
> notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to