[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228826#comment-15228826
 ] 

Sunil G commented on YARN-3933:
-------------------------------

YARN-4809 focused only on refactoring some common code in the CS and FS 
App classes. It does not solve this issue.

I think the approach taken here is correct to fix the problem. My concern is 
that it is better to do this check in {{FSAppAttempt#containerCompleted}} 
rather than in {{FairScheduler#completedContainerInternal}}. If we check the 
return value of {{FSAppAttempt#containerCompleted}}, we can decide whether to 
continue or return from {{completedContainerInternal}}. I think that will also 
cover {{FairScheduler#updateRootQueueMetrics}} in that case.
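
A minimal sketch of that suggestion, not the actual patch. The classes 
{{SketchAppAttempt}} and {{SketchScheduler}} are hypothetical stand-ins for 
{{FSAppAttempt}} and {{FairScheduler}}, and the boolean return value, the 
{{allocate}} helper and the {{usedMb}} counter are assumptions made only for 
illustration. The idea is that the attempt reports whether it really removed a 
live container, and the scheduler skips its accounting when it did not.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for FSAppAttempt; NOT the real Hadoop class.
class SketchAppAttempt {
    private final Set<String> liveContainers = new HashSet<>();

    synchronized void addContainer(String containerId) {
        liveContainers.add(containerId);
    }

    // Assumed signature: returns true only for the first completion of a
    // live container, false if it was already completed or never known.
    synchronized boolean containerCompleted(String containerId) {
        return liveContainers.remove(containerId);
    }
}

// Hypothetical stand-in for FairScheduler; NOT the real Hadoop class.
class SketchScheduler {
    // Stands in for the queue/root metrics that must not go negative.
    private int usedMb = 0;

    synchronized void allocate(SketchAppAttempt attempt, String containerId, int memoryMb) {
        attempt.addContainer(containerId);
        usedMb += memoryMb;
    }

    synchronized void completedContainerInternal(
            SketchAppAttempt attempt, String containerId, int memoryMb) {
        // Let the attempt decide: if it did not remove a live container,
        // return before touching any resource accounting.
        if (!attempt.containerCompleted(containerId)) {
            return;
        }
        usedMb -= memoryMb; // release resources exactly once
    }

    synchronized int getUsedMb() {
        return usedMb;
    }
}

class CompletedContainerSketch {
    public static void main(String[] args) {
        SketchScheduler scheduler = new SketchScheduler();
        SketchAppAttempt attempt = new SketchAppAttempt();
        scheduler.allocate(attempt, "container_1", 1024);
        // Two completion events for the same container, as in the race.
        scheduler.completedContainerInternal(attempt, "container_1", 1024);
        scheduler.completedContainerInternal(attempt, "container_1", 1024);
        System.out.println("usedMb = " + scheduler.getUsedMb());
    }
}
```

With the early return in place, the second completion event for the same 
container leaves the counter unchanged instead of driving it negative.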


Maybe after fixing this, we can try to merge/refactor the code along the 
lines of the YARN-4809 approach.


> Race condition when calling AbstractYarnScheduler.completedContainer.
> ---------------------------------------------------------------------
>
>                 Key: YARN-3933
>                 URL: https://issues.apache.org/jira/browse/YARN-3933
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>            Reporter: Lavkesh Lahngir
>            Assignee: Shiwei Guo
>         Attachments: YARN-3933.001.patch, YARN-3933.002.patch, 
> YARN-3933.003.patch
>
>
> In our cluster we are seeing negative available memory and cores. 
> Initial inspection:
> Scenario no. 1: 
> In the capacity scheduler, the method allocateContainersToNode() checks 
> whether an application has excess container reservations. If they are no 
> longer needed, it calls queue.completedContainer(), which causes resources 
> to go negative, even though they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess container assignments?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
