[ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150625#comment-15150625 ]
Dmytro Kabakchei commented on YARN-4698:
----------------------------------------

Has anybody else encountered this issue? Does anybody have any idea what causes it and how to solve it?

> Negative value in RM UI counters due to double container release
> ----------------------------------------------------------------
>
>                 Key: YARN-4698
>                 URL: https://issues.apache.org/jira/browse/YARN-4698
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>    Affects Versions: 2.5.1
>            Reporter: Dmytro Kabakchei
>            Priority: Minor
>         Attachments: Example.log-cut, mitigating2.5.1.diff
>
>
> We noticed that on our cluster there are negative values in the RM UI counters:
> - Containers Running: -19
> - Memory Used: -38GB
> - Vcores Used: -19
> After checking the RM logs, we found that the following events had happened:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times
> Some related log records can be found in the "Example.log-cut" attachment.
> After some investigation we concluded that there is some kind of race
> condition for a container that was scheduled for killing but completed
> successfully before the kill.
> Also, there is a patch that possibly mitigates the effects of the issue, but
> doesn't solve the original problem (see mitigating2.5.1.diff).
> Unfortunately, the cluster and all other logs are lost, because the report
> was made about a year ago but wasn't submitted properly. Also, we don't know
> whether the issue exists in other versions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
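
The counter arithmetic in the description can be illustrated with a small standalone sketch. This is not YARN code and not the contents of mitigating2.5.1.diff; the class, method, and field names are hypothetical. It also assumes 2 GB and 1 vcore per container, which is only inferred from the reported -38GB and -19 vcores across 19 invalid releases. The sketch shows how a second release of an already-released container drives the counters negative, and how ignoring releases of containers that are no longer tracked keeps them at zero.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Hadoop YARN.
public class DoubleReleaseSketch {

    // Assumed per-container size, inferred from -38GB / -19 containers.
    static final int CONTAINER_MEMORY_MB = 2048;
    static final int CONTAINER_VCORES = 1;

    static final class Counters {
        int containersRunning;
        long memoryUsedMb;
        int vcoresUsed;
    }

    // Assigns `assigned` containers, releases each once, then releases the
    // first `doubleReleased` of them a second time (the suspected race:
    // a container completes normally and is also released by the kill path).
    static Counters simulate(int assigned, int doubleReleased, boolean guarded) {
        Counters c = new Counters();
        Set<Integer> live = new HashSet<>();
        for (int id = 0; id < assigned; id++) {
            live.add(id);
            c.containersRunning++;
            c.memoryUsedMb += CONTAINER_MEMORY_MB;
            c.vcoresUsed += CONTAINER_VCORES;
        }
        for (int id = 0; id < assigned; id++) {
            release(c, live, id, guarded);
        }
        for (int id = 0; id < doubleReleased; id++) {
            release(c, live, id, guarded);
        }
        return c;
    }

    static void release(Counters c, Set<Integer> live, int id, boolean guarded) {
        boolean wasLive = live.remove(id);
        if (guarded && !wasLive) {
            return; // already released: skip the counter update
        }
        c.containersRunning--;
        c.memoryUsedMb -= CONTAINER_MEMORY_MB;
        c.vcoresUsed -= CONTAINER_VCORES;
    }

    public static void main(String[] args) {
        Counters naive = simulate(67019, 19, false);
        Counters safe = simulate(67019, 19, true);
        System.out.println("unguarded: containers=" + naive.containersRunning
                + " memoryGB=" + naive.memoryUsedMb / 1024
                + " vcores=" + naive.vcoresUsed);
        System.out.println("guarded:   containers=" + safe.containersRunning
                + " memoryGB=" + safe.memoryUsedMb / 1024
                + " vcores=" + safe.vcoresUsed);
    }
}
```

With the numbers from the report (67019 assignments and 19 repeated releases), the unguarded path ends at -19 containers, -38 GB, and -19 vcores, matching the UI values, while the guarded path ends at zero. A guard of this kind only hides the symptom in the counters; it does not remove the underlying race.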