[ https://issues.apache.org/jira/browse/YARN-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632320#comment-14632320 ]
zhihai xu commented on YARN-433: -------------------------------- Thanks for the patch [~xgong]! The patch looks good to me except two nits: # It may be better to use {{containerId}} instead of {{remoteContainer.getContainerId()}} as the parameter of {{containerAllocationExpirer.unregister}} since they are equivalent. # Since we do {{containerAllocationExpirer.unregister}} for the completed containers in {{RMNodeImpl#handleContainerStatus}} earlier, Maybe we can remove the redundant code {{containerAllocationExpirer.unregister}} at {{RMContainerImpl#ContainerFinishedAtAcquiredState}}. After we remove this code at {{RMContainerImpl#ContainerFinishedAtAcquiredState}}, ContainerFinishedAtAcquiredState will be the same as FinishedTransition, So we can replace ContainerFinishedAtAcquiredState with FinishedTransition and remove ContainerFinishedAtAcquiredState. {code} .addTransition(RMContainerState.ACQUIRED, RMContainerState.COMPLETED, RMContainerEventType.FINISHED, new FinishedTransition()) {code} > When RM is catching up with node updates then it should not expire acquired > containers > -------------------------------------------------------------------------------------- > > Key: YARN-433 > URL: https://issues.apache.org/jira/browse/YARN-433 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager > Reporter: Bikas Saha > Assignee: Xuan Gong > Attachments: YARN-433.1.patch, YARN-433.2.patch, YARN-433.3.patch > > > RM expires containers that are not launched within some time of being > allocated. The default is 10mins. When an RM is not keeping up with node > updates then it may not be aware of new launched containers. If the expire > thread fires for such containers then the RM can expire them even though they > may have launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)