[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764651#comment-13764651 ]
Xuan Gong commented on YARN-1149:
---------------------------------

Remove the container_cleanUp logic from the NM; instead, add Application_cleanUp to ContainerManager. The shutdown process then becomes: the NM calls serviceStop() --> ContainerManager calls serviceStop() --> it sends an applicationFinishEvent to all running applications, and each application sends a containerKillEvent to all of its containers --> after all of its containers are killed, the application goes to the Finished state and waits for the AppLogsAggregated process to finish; after that, the application is removed from the context.

The resync process is similar: the NM calls resyncWithRM() --> ContainerManager calls cleanUpApplications() to send applicationFinished events to all running applications, and each application kills its running containers.

> NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1149
>                 URL: https://issues.apache.org/jira/browse/YARN-1149
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ramya Sunil
>            Assignee: Xuan Gong
>             Fix For: 2.1.1-beta
>
>         Attachments: YARN-1149.1.patch, YARN-1149.2.patch
>
>
> When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
> {noformat}
> 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118
> 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/<host>_45454.tmp
> 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118
> 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_000004. Current good log dirs are /tmp/yarn/local
> 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118
> 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
>     at java.lang.Thread.run(Thread.java:662)
> 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null
> 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
> 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040
> {noformat}
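
For readers who want to see the ordering in the comment spelled out, here is a rough sketch of the proposed shutdown sequence as a toy state machine. It is not the NodeManager code or the attached patch: the class, state, and event names (ShutdownSketch, WAITING_FOR_CONTAINERS, CONTAINER_KILLED, and so on) are made up for illustration, and the real ApplicationImpl has more states and events. The point is only that the log-handling-finished event is accepted once the application has reached its finished state, and rejected if it arrives while the application is still running, which is the failure in the quoted log.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: all names here are illustrative, not the NodeManager's classes.
final class ShutdownSketch {

    enum State { RUNNING, WAITING_FOR_CONTAINERS, FINISHED, REMOVED }

    enum Event { FINISH_APPLICATION, CONTAINER_KILLED, LOG_HANDLING_FINISHED }

    static final class App {
        State state = State.RUNNING;
        final Set<String> liveContainers = new LinkedHashSet<>();
        final List<String> killRequests = new ArrayList<>();

        App(String... containerIds) {
            for (String id : containerIds) {
                liveContainers.add(id);
            }
        }

        // Applies one event and rejects events that are invalid in the current state.
        void handle(Event event, String containerId) {
            switch (event) {
                case FINISH_APPLICATION:
                    require(state == State.RUNNING, event);
                    // The application, not the NM, asks each of its containers to stop.
                    killRequests.addAll(liveContainers);
                    state = liveContainers.isEmpty()
                            ? State.FINISHED
                            : State.WAITING_FOR_CONTAINERS;
                    break;
                case CONTAINER_KILLED:
                    require(state == State.WAITING_FOR_CONTAINERS, event);
                    liveContainers.remove(containerId);
                    if (liveContainers.isEmpty()) {
                        state = State.FINISHED; // now waiting for log aggregation
                    }
                    break;
                case LOG_HANDLING_FINISHED:
                    require(state == State.FINISHED, event);
                    state = State.REMOVED; // stands in for removal from the context
                    break;
            }
        }

        private void require(boolean ok, Event event) {
            if (!ok) {
                throw new IllegalStateException("Invalid event: " + event + " at " + state);
            }
        }
    }

    // Drives one application through the order described in the comment.
    static void shutdown(App app) {
        app.handle(Event.FINISH_APPLICATION, null);
        for (String id : new ArrayList<>(app.killRequests)) {
            app.handle(Event.CONTAINER_KILLED, id);
        }
        app.handle(Event.LOG_HANDLING_FINISHED, null);
    }

    public static void main(String[] args) {
        App app = new App("c1", "c2");
        shutdown(app);
        System.out.println(app.state);

        // Delivering the log event before the finish event reproduces the rejection.
        App early = new App("c1");
        try {
            early.handle(Event.LOG_HANDLING_FINISHED, null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the resync path would call the same shutdown(...) routine for every running application; only the trigger differs.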