[ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763670#comment-13763670
 ] 

Zhijie Shen commented on YARN-1149:
-----------------------------------

I did some investigation into the problem:

1. The following transition seems to be unnecessary. The earliest point at 
which APPLICATION_LOG_HANDLING_FINISHED can be emitted is after 
APPLICATION_STARTED has been handled, and by then the Application is already 
at INITING, so the event should never arrive while it is still at NEW.
{code}
+          .addTransition(ApplicationState.NEW, ApplicationState.FINISHED,
+              ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED,
+              new AppShutDownTransition())
{code}
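
To illustrate the ordering argument, here is a minimal sketch of a table-driven 
state machine. It is a toy model, not the real ApplicationImpl: the class name 
ToyAppStateMachine, the reduced set of states and events, and the transitions 
registered for INITING and RUNNING are all assumptions made for illustration. 
It only shows that if the event cannot be emitted before the app has left NEW, 
a NEW entry in the table is never exercised.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy model only: states, events and transitions are simplified assumptions,
// not the real ApplicationImpl state machine.
class ToyAppStateMachine {
  enum State { NEW, INITING, RUNNING, FINISHED }
  enum Event { INIT_APPLICATION, APPLICATION_INITED, APPLICATION_LOG_HANDLING_FINISHED }

  private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
  private State current = State.NEW;

  ToyAppStateMachine() {
    addTransition(State.NEW, State.INITING, Event.INIT_APPLICATION);
    addTransition(State.INITING, State.RUNNING, Event.APPLICATION_INITED);
    // Early log-handling completion is accepted from INITING onwards.
    // There is deliberately no entry for NEW.
    addTransition(State.INITING, State.FINISHED, Event.APPLICATION_LOG_HANDLING_FINISHED);
    addTransition(State.RUNNING, State.FINISHED, Event.APPLICATION_LOG_HANDLING_FINISHED);
  }

  private void addTransition(State from, State to, Event on) {
    table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
  }

  // Applies the event, or throws if no transition is registered for it.
  State handle(Event event) {
    Map<Event, State> row = table.get(current);
    State next = (row == null) ? null : row.get(event);
    if (next == null) {
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    current = next;
    return current;
  }

  public static void main(String[] args) {
    ToyAppStateMachine app = new ToyAppStateMachine();
    // The app leaves NEW before log handling can report anything.
    System.out.println(app.handle(Event.INIT_APPLICATION));
    System.out.println(app.handle(Event.APPLICATION_LOG_HANDLING_FINISHED));
  }
}
```

In this sketch, sending APPLICATION_LOG_HANDLING_FINISHED to a fresh instance 
throws, which is harmless as long as the event really cannot be produced that 
early.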

2. The following message does not seem to cover all cases:
{code}
+      LOG.info("Application " + app.getAppId() +
+          " is shutted down since NodeManager has been killed.");
{code}
In the normal case, APPLICATION_LOG_HANDLING_FINISHED is emitted after 
APPLICATION_FINISHED has been handled, when the Application is already at 
FINISHED. There are two exceptions: (a) The NM is stopping, and the running log 
aggregation job is signaled to stop early. In this case, the log message makes 
sense. (b) The running log aggregation job is interrupted. See the following 
code:
{code}
    while (!this.appFinishing.get()) {
      synchronized(this) {
        try {
          wait(THREAD_SLEEP_TIME);
        } catch (InterruptedException e) {
          LOG.warn("PendingContainers queue is interrupted");
          this.appFinishing.set(true);
        }
      }
    }
{code}
In this case, the message does not seem to be correct, because an interruption 
does not necessarily mean that the NodeManager has been killed.
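
One possible way to make the message accurate is to record why the wait loop 
ended and choose the log text from that. The following is only a sketch under 
assumptions: ToyLogAggregator, StopReason, abortForShutdown() and describe() 
are hypothetical names that do not exist in the NodeManager code, and the 
message wording is made up.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: none of these names are part of the NodeManager code.
class ToyLogAggregator {
  enum StopReason { APP_FINISHED, NM_STOPPING, INTERRUPTED }

  private static final long THREAD_SLEEP_TIME = 1000L;
  private final AtomicBoolean appFinishing = new AtomicBoolean(false);
  private volatile StopReason stopReason;

  // Normal path: the application has finished.
  synchronized void finishApp() { stop(StopReason.APP_FINISHED); }

  // Early-stop path: the NM is shutting down.
  synchronized void abortForShutdown() { stop(StopReason.NM_STOPPING); }

  // Must be called with the monitor held, because it calls notifyAll().
  private void stop(StopReason reason) {
    stopReason = reason;
    appFinishing.set(true);
    notifyAll();
  }

  // Blocks until the aggregator is told to stop, and reports why it stopped.
  StopReason waitUntilStopped() {
    synchronized (this) {
      while (!appFinishing.get()) {
        try {
          wait(THREAD_SLEEP_TIME);
        } catch (InterruptedException e) {
          stopReason = StopReason.INTERRUPTED;
          appFinishing.set(true);
          Thread.currentThread().interrupt(); // preserve the interrupt status
        }
      }
    }
    return stopReason;
  }

  // Picks a message that matches the actual reason (wording is illustrative).
  static String describe(String appId, StopReason reason) {
    switch (reason) {
      case NM_STOPPING:
        return "Application " + appId + " is shut down since NodeManager is stopping.";
      case INTERRUPTED:
        return "Application " + appId + " is shut down since log aggregation was interrupted.";
      default:
        return "Application " + appId + " finished log handling.";
    }
  }

  public static void main(String[] args) {
    ToyLogAggregator aggregator = new ToyLogAggregator();
    aggregator.abortForShutdown();
    System.out.println(describe("app_1", aggregator.waitUntilStopped()));
  }
}
```

With something along these lines, the "NodeManager has been killed" wording 
would only be used when the NM is actually stopping.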

3. Should we do the following in AppShutDownTransition as well? Once 
APPLICATION_LOG_HANDLING_FINISHED has been consumed by this transition, the 
FINISHED->FINISHED transition on APPLICATION_LOG_HANDLING_FINISHED will never 
happen, so the app will stay in the context forever.
{code}
      app.context.getApplications().remove(appId);
      app.aclsManager.removeApplication(appId);
{code}
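
As a toy illustration of the leak (ToyNmContext and its methods are 
hypothetical stand-ins, not the real NM context or ACL manager), compare a 
shutdown path that only changes the state with one that also removes the 
bookkeeping entries:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the NM context and the ACL manager.
class ToyNmContext {
  final Map<String, String> applications = new ConcurrentHashMap<>();
  final Set<String> aclEntries = ConcurrentHashMap.newKeySet();

  void register(String appId) {
    applications.put(appId, "RUNNING");
    aclEntries.add(appId);
  }

  // Shutdown path that only changes the state: the entries are left behind.
  void shutDownWithoutCleanup(String appId) {
    applications.replace(appId, "FINISHED");
  }

  // Shutdown path that also removes the app, mirroring the two calls above.
  void shutDownWithCleanup(String appId) {
    applications.remove(appId);
    aclEntries.remove(appId);
  }

  public static void main(String[] args) {
    ToyNmContext context = new ToyNmContext();
    context.register("app_1");
    context.shutDownWithoutCleanup("app_1");
    System.out.println("still tracked: " + context.applications.containsKey("app_1"));
    context.shutDownWithCleanup("app_1");
    System.out.println("still tracked: " + context.applications.containsKey("app_1"));
  }
}
```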
                
> NM throws InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1149
>                 URL: https://issues.apache.org/jira/browse/YARN-1149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ramya Sunil
>            Assignee: Xuan Gong
>             Fix For: 2.1.1-beta
>
>         Attachments: YARN-1149.1.patch
>
>
> When the NodeManager receives a kill signal after an application has finished 
> execution but before log aggregation has kicked in, 
> InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown:
> {noformat}
> 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
> finished : application_1377459190746_0118
> 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
> log-file for app application_1377459190746_0118 at 
> /app-logs/foo/logs/application_1377459190746_0118/<host>_45454.tmp
> 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
> to complete for application_1377459190746_0118
> 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
> container container_1377459190746_0118_01_000004. Current good log dirs are 
> /tmp/yarn/local
> 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
> log-file for app application_1377459190746_0118
> 2013-08-25 20:45:00,925 WARN  application.Application 
> (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>  
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)   
>         at java.lang.Thread.run(Thread.java:662)
> 2013-08-25 20:45:00,926 INFO  application.Application 
> (ApplicationImpl.java:handle(430)) - Application 
> application_1377459190746_0118 transitioned from RUNNING to null
> 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(463)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
>  is interrupted. Exiting.
> 2013-08-25 20:45:00,938 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
> server on 8040
> {noformat}

