[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228826#comment-15228826 ]
Sunil G commented on YARN-3933:
-------------------------------

YARN-4809 focused only on refactoring the code that CS and FS share in their App classes, and it does not solve this issue. I think the approach taken here is the correct way to fix the problem. My concern is that it would be better to do this check in {{FSAppAttempt#containerCompleted}} rather than in {{FairScheduler#completedContainerInternal}}. If we check the return value of {{FSAppAttempt#containerCompleted}}, we can decide whether to continue or return from {{completedContainerInternal}}. I think that will also cover {{FairScheduler#updateRootQueueMetrics}} in that case.
Maybe after fixing this, we can try to merge/refactor the code with the YARN-4809 approach.

> Race condition when calling AbstractYarnScheduler.completedContainer.
> ---------------------------------------------------------------------
>
>                 Key: YARN-3933
>                 URL: https://issues.apache.org/jira/browse/YARN-3933
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>            Reporter: Lavkesh Lahngir
>            Assignee: Shiwei Guo
>         Attachments: YARN-3933.001.patch, YARN-3933.002.patch,
> YARN-3933.003.patch
>
>
> In our cluster we are seeing available memory and cores going negative.
> Initial inspection:
> Scenario no. 1:
> In the capacity scheduler, the method allocateContainersToNode() checks whether
> there are excess container reservations for an application. If they are
> no longer needed, it calls queue.completedContainer(), which causes
> resources to go negative, even though they were never assigned in the first place.
> I am still looking through the code. Can somebody suggest how to simulate
> excess container assignments?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
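
For reference, the check suggested in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: {{AppAttemptSketch}} and {{SchedulerSketch}} are hypothetical stand-ins for {{FSAppAttempt}} and {{FairScheduler}}, the signatures and the memory-only accounting are invented for the example, and none of it is taken from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; these are NOT the real Hadoop classes.
class CompletedContainerSketch {

    /** Stand-in for FSAppAttempt: tracks the live containers of one attempt. */
    static class AppAttemptSketch {
        private final Map<String, Integer> liveContainers = new HashMap<>();

        synchronized void addContainer(String containerId, int memoryMb) {
            liveContainers.put(containerId, memoryMb);
        }

        /**
         * Returns true only when this call actually removed the container.
         * A second completion event for the same id returns false.
         */
        synchronized boolean containerCompleted(String containerId) {
            return liveContainers.remove(containerId) != null;
        }
    }

    /** Stand-in for FairScheduler: tracks allocated memory and metric updates. */
    static class SchedulerSketch {
        private int allocatedMb = 0;
        private int metricsUpdates = 0;

        synchronized void allocate(AppAttemptSketch app, String containerId, int memoryMb) {
            app.addContainer(containerId, memoryMb);
            allocatedMb += memoryMb;
        }

        synchronized void completedContainerInternal(
                AppAttemptSketch app, String containerId, int memoryMb) {
            // Let the attempt decide whether the container was still live.
            if (!app.containerCompleted(containerId)) {
                return; // already completed: skip the release and the metrics update
            }
            allocatedMb -= memoryMb;
            metricsUpdates++; // stands in for the root queue metrics update
        }

        synchronized int getAllocatedMb() { return allocatedMb; }
        synchronized int getMetricsUpdates() { return metricsUpdates; }
    }

    public static void main(String[] args) {
        AppAttemptSketch app = new AppAttemptSketch();
        SchedulerSketch scheduler = new SchedulerSketch();
        scheduler.allocate(app, "container_1", 1024);
        // Two completion events arrive for the same container.
        scheduler.completedContainerInternal(app, "container_1", 1024);
        scheduler.completedContainerInternal(app, "container_1", 1024);
        // prints "allocatedMb=0 metricsUpdates=1"
        System.out.println("allocatedMb=" + scheduler.getAllocatedMb()
                + " metricsUpdates=" + scheduler.getMetricsUpdates());
    }
}
```

Without the early return, the second completion event would subtract the memory again and leave the accounting negative, which resembles the symptom reported in the issue.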