[ https://issues.apache.org/jira/browse/YARN-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346297#comment-15346297 ]
Jun Gong commented on YARN-5290:
--------------------------------

Thanks [~jlowe] for reporting the issue! We came across it some time ago.

I tried the idea from YARN-4148: the RM does not release an app's resources until its containers have actually finished and the NM has released the resources.

Another idea (copied from YARN-4148): the NM records its total and available resources. When launching a container, the NM checks its available resources and waits until there is enough for the container. However, this can create a time gap from the AM's perspective: the AM thinks it has launched the container, while the container may still be waiting for its resources.

> ResourceManager can place more containers on a node than the node size allows
> -----------------------------------------------------------------------------
>
>                 Key: YARN-5290
>                 URL: https://issues.apache.org/jira/browse/YARN-5290
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Jason Lowe
>
> When the ResourceManager or an ApplicationMaster kills a container the RM
> scheduler instantly thinks the container is dead and frees those resources
> within the scheduler bookkeeping. However that container can still be
> running on the node until the node heartbeats back into the RM and is told to
> kill the container. If the RM allocates the space associated with the
> released container and gives it to an AM quickly enough, the AM can launch a
> new container while the old container is still running on the NM. That leads
> to a scenario where we're technically running more resources on the node than
> the node advertised to the RM.
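
To make the second idea in the comment (the NM tracks total and available resources and delays a launch until enough is free) a bit more concrete, here is a rough sketch in Java. It is not taken from the YARN code base or from any patch on YARN-4148: the class `NodeResourceGate` and its methods are hypothetical names, it tracks only memory in MB, and it ignores vcores, container priorities and NM restart. Treat it as an illustration under those assumptions.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch, not a real YARN class: node-local bookkeeping that
 * makes a container launch wait until the node really has room for it.
 * Only one resource dimension (memory in MB) is modelled.
 */
final class NodeResourceGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();
    private final long totalMb;
    private long availableMb;

    NodeResourceGate(long totalMb) {
        if (totalMb < 0) {
            throw new IllegalArgumentException("totalMb must be >= 0");
        }
        this.totalMb = totalMb;
        this.availableMb = totalMb;
    }

    /**
     * Reserves requestMb, waiting up to timeoutMillis for running containers
     * to release resources. Returns false on timeout or interruption; in
     * that case nothing is reserved.
     */
    boolean tryAcquire(long requestMb, long timeoutMillis) {
        if (requestMb < 0 || requestMb > totalMb) {
            throw new IllegalArgumentException("request can never fit on this node");
        }
        long nanosLeft = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (availableMb < requestMb) {
                if (nanosLeft <= 0L) {
                    return false; // still not enough room: do not launch
                }
                nanosLeft = released.awaitNanos(nanosLeft);
            }
            availableMb -= requestMb;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Called once a container has actually exited on the node. */
    void release(long mb) {
        lock.lock();
        try {
            availableMb = Math.min(totalMb, availableMb + mb);
            released.signalAll(); // wake launches waiting for room
        } finally {
            lock.unlock();
        }
    }

    long availableMb() {
        lock.lock();
        try {
            return availableMb;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch a launch path would call `tryAcquire` before starting the container process and `release` only after the old process has really exited, so the node never runs more than it advertised. The wait inside `tryAcquire` is exactly the time gap mentioned in the comment: the AM believes the container has been launched while it is still blocked waiting for room.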