[ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765048#comment-13765048
 ] 

Xuan Gong commented on YARN-1149:
---------------------------------

The new patch addresses several other issues:
1. Adds more transitions to 
FINISHING_CONTAINERS_WAIT, APPLICATION_RESOURCES_CLEANINGUP and FINISHED. 
2. When the applications start to shut down, it is quite possible that 
other applications are added to the context. 
{code}
setBlockNewContainerRequests(true);
{code}
Set this at the beginning of ContainerManager::serviceStop() to block any new 
container requests.
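
To illustrate the idea of blocking requests before cleanup begins, here is a minimal sketch. Only the setter name setBlockNewContainerRequests comes from the patch; the class ContainerManagerSketch, the startContainer entry point, and the counter are hypothetical stand-ins, not the actual NodeManager code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the container manager; not the real NodeManager class.
class ContainerManagerSketch {
    private final AtomicBoolean blockNewContainerRequests = new AtomicBoolean(false);
    private int startedContainers = 0;

    void setBlockNewContainerRequests(boolean block) {
        blockNewContainerRequests.set(block);
    }

    // Hypothetical request entry point: rejects work once the flag is set.
    synchronized void startContainer(String containerId) {
        if (blockNewContainerRequests.get()) {
            throw new IllegalStateException(
                "Rejecting " + containerId + ": node manager is shutting down");
        }
        startedContainers++;
    }

    synchronized int getStartedContainers() {
        return startedContainers;
    }

    void serviceStop() {
        // Block first, so that no new application can enter the context
        // while the existing ones are being cleaned up.
        setBlockNewContainerRequests(true);
        // ... application cleanup would follow here ...
    }
}
```

With this ordering, a request that arrives after serviceStop() has started is rejected instead of adding a new application to the context.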

{code}
          try {
            Thread.sleep(1000);
            this.handle(
                new CMgrCompletedAppsEvent(new ArrayList<ApplicationId>(
                    applications.keySet()),
                    CMgrCompletedAppsEvent.Reason.ON_SHUTDOWN));
          } catch (InterruptedException ex) {
            LOG.warn(
                "Interrupted while sleeping on applications finish on shutdown",
                ex);
          }
{code}

Also do this in the ShutDown block and the Resync block. Old applications 
(those already in the context that have already received the FINISH_APPLICATION 
event) will ignore the event, since they are already in 
FINISHING_CONTAINERS_WAIT or APPLICATION_RESOURCES_CLEANINGUP. Newly added 
applications will start to shut down when they receive the 
FINISH_APPLICATION event. 
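
The reason resending the event is safe can be sketched as a tiny state function. This is a simplified illustration, not the ApplicationImpl state machine: the class AppStateSketch and the method onFinishApplication are hypothetical, and the sketch assumes a RUNNING application always moves to FINISHING_CONTAINERS_WAIT.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model of the application states named above.
class AppStateSketch {
    enum State {
        RUNNING,
        FINISHING_CONTAINERS_WAIT,
        APPLICATION_RESOURCES_CLEANINGUP,
        FINISHED
    }

    // States in which a repeated FINISH_APPLICATION event is a no-op.
    private static final Set<State> ALREADY_FINISHING = EnumSet.of(
        State.FINISHING_CONTAINERS_WAIT,
        State.APPLICATION_RESOURCES_CLEANINGUP,
        State.FINISHED);

    // Returns the state after handling a FINISH_APPLICATION event.
    static State onFinishApplication(State current) {
        if (ALREADY_FINISHING.contains(current)) {
            return current; // old application: the event is ignored
        }
        // Assumption: containers are still running, so wait for them to finish.
        return State.FINISHING_CONTAINERS_WAIT;
    }
}
```

Because the handler leaves already-finishing applications untouched, the event can be sent to every key in the context on each pass without tracking which applications have seen it.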
                
> NM throws InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1149
>                 URL: https://issues.apache.org/jira/browse/YARN-1149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ramya Sunil
>            Assignee: Xuan Gong
>             Fix For: 2.1.1-beta
>
>         Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
> YARN-1149.4.patch
>
>
> When nodemanager receives a kill signal when an application has finished 
> execution but log aggregation has not kicked in, 
> InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
> {noformat}
> 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
> finished : application_1377459190746_0118
> 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
> log-file for app application_1377459190746_0118 at 
> /app-logs/foo/logs/application_1377459190746_0118/<host>_45454.tmp
> 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
> to complete for application_1377459190746_0118
> 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
> container container_1377459190746_0118_01_000004. Current good log dirs are 
> /tmp/yarn/local
> 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
> log-file for app application_1377459190746_0118
> 2013-08-25 20:45:00,925 WARN  application.Application 
> (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>  
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)   
>         at java.lang.Thread.run(Thread.java:662)
> 2013-08-25 20:45:00,926 INFO  application.Application 
> (ApplicationImpl.java:handle(430)) - Application 
> application_1377459190746_0118 transitioned from RUNNING to null
> 2013-08-25 20:45:00,927 WARN  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(463)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
>  is interrupted. Exiting.
> 2013-08-25 20:45:00,938 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
> server on 8040
> {noformat}
