[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212844#comment-15212844 ]
Shiwei Guo commented on YARN-3933:
----------------------------------

Sorry for the long delay in replying.

Did you mean adding this check in 'containerCompleted' and 'unreserve'? I suppose that is not enough, because the 'updateRootQueueMetrics' call in completedContainerInternal will still subtract the resource this container released more than once.

> Race condition when calling AbstractYarnScheduler.completedContainer.
> ---------------------------------------------------------------------
>
>                 Key: YARN-3933
>                 URL: https://issues.apache.org/jira/browse/YARN-3933
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>            Reporter: Lavkesh Lahngir
>            Assignee: Shiwei Guo
>         Attachments: YARN-3933.001.patch, YARN-3933.002.patch, YARN-3933.003.patch
>
>
> In our cluster we are seeing available memory and cores going negative.
> Initial inspection:
> Scenario no. 1:
> In the capacity scheduler, the method allocateContainersToNode() checks whether there are excess container reservations for an application and, if they are no longer needed, calls queue.completedContainer(), which causes resources to go negative, even though they were never assigned in the first place.
> I am still looking through the code. Can somebody suggest how to simulate excess container assignments?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
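
To illustrate the concern in the comment above: if only the inner steps are guarded, a metrics update on the outer completion path can still run twice for the same container. A minimal sketch of guarding the whole path at its entry point follows. This is not the actual YARN code or any of the attached patches; the class CompletionGuardSketch, its fields, and its method signatures are hypothetical stand-ins that only borrow the method names from the discussion.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a scheduler; NOT the real AbstractYarnScheduler. */
final class CompletionGuardSketch {
    // Containers that are allocated and not yet completed.
    private final Set<String> liveContainers = ConcurrentHashMap.newKeySet();
    // Stand-in for a root-queue "allocated memory" metric, in MB.
    private final AtomicLong allocatedMb = new AtomicLong();

    /** Records an allocation; a duplicate id is ignored. */
    void allocate(String containerId, long memoryMb) {
        if (liveContainers.add(containerId)) {
            allocatedMb.addAndGet(memoryMb);
        }
    }

    /**
     * Entry point for completion. Set.remove is atomic on this set, so only
     * one caller per container id proceeds to the internal path; any later or
     * concurrent caller returns false without touching the metric.
     */
    boolean completedContainer(String containerId, long memoryMb) {
        if (!liveContainers.remove(containerId)) {
            return false; // already completed, or never allocated
        }
        completedContainerInternal(memoryMb);
        return true;
    }

    // Runs at most once per container because of the guard above.
    private void completedContainerInternal(long memoryMb) {
        allocatedMb.addAndGet(-memoryMb); // stand-in for updateRootQueueMetrics
    }

    long getAllocatedMb() {
        return allocatedMb.get();
    }
}
```

In this sketch a second completedContainer call for the same id is a no-op, so the metric cannot be decremented twice. Whether the real scheduler can be guarded at a single entry point like this depends on its locking and call paths, which the sketch does not model.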