[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release
[ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Kabakchei updated YARN-4698:
---
Description:
We noticed negative values in the RM UI counters on our cluster:
- Containers Running: -19
- Memory Used: -38GB
- Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times
Some related log records can be found in the "Example.log-cut" attachment.
After some investigation, we concluded that there is some kind of race condition for a container that was scheduled to be killed but completed successfully before the kill.
There is also a patch that possibly mitigates the effects of the issue but doesn't solve the original problem (see mitigating2.5.1.diff).
Unfortunately, the cluster and all other logs are lost, because the report was written about a year ago but never properly submitted. We also don't know whether the issue exists in other versions.

was:
We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times
Some related log records can be found in the "Example.log-cut" attachment.
After some investigation, we concluded that there is some kind of race condition for a container that was scheduled to be killed but completed successfully before the kill.
There is also a patch that possibly mitigates the effects of the issue but doesn't solve the original problem (see mitigating2.5.1.diff).
Unfortunately, the cluster and all other logs are lost, because the report was written about a year ago but never properly submitted. We also don't know whether the issue exists in other versions.
> Negative value in RM UI counters due to double container release
>
> Key: YARN-4698
> URL: https://issues.apache.org/jira/browse/YARN-4698
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Affects Versions: 2.5.1
> Reporter: Dmytro Kabakchei
> Priority: Minor
> Attachments: Example.log-cut, mitigating2.5.1.diff
>
> We noticed negative values in the RM UI counters on our cluster:
> - Containers Running: -19
> - Memory Used: -38GB
> - Vcores Used: -19
> After checking the RM logs, we found that the following events had happened:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times
> Some related log records can be found in the "Example.log-cut" attachment.
> After some investigation, we concluded that there is some kind of race
> condition for a container that was scheduled to be killed but completed
> successfully before the kill.
> There is also a patch that possibly mitigates the effects of the issue but
> doesn't solve the original problem (see mitigating2.5.1.diff).
> Unfortunately, the cluster and all other logs are lost, because the report
> was written about a year ago but never properly submitted. We also don't know
> whether the issue exists in other versions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
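The reported numbers are consistent with one simple reading: if the 67019 assignments and 67019 normal releases cancel out, then 19 extra releases that still decrement the counters would produce exactly -19 running containers, and if all affected containers were the same size, -38GB and -19 vcores would imply 2 GB and 1 vcore per container. The Java sketch below replays those counts against a hypothetical counter class. NaiveCounters and CounterReplay are illustrative names, not YARN classes, and the 2048 MB / 1 vcore container size is an assumption inferred from the reported totals.

```java
// Hypothetical model of RM-wide counters (not the real YARN metrics API):
// every allocation increments and every release decrements, with no check
// that the released container is still live.
final class NaiveCounters {
    long containersRunning;
    long memoryMb;
    long vcores;

    void allocate(int memMb, int cores) {
        containersRunning++;
        memoryMb += memMb;
        vcores += cores;
    }

    void release(int memMb, int cores) {
        containersRunning--;
        memoryMb -= memMb;
        vcores -= cores;
    }
}

final class CounterReplay {
    // Assumption: each affected container was 2048 MB / 1 vcore, as implied
    // by -38GB / -19 and -19 / -19 if all containers had the same size.
    static final int MEM_MB = 2048;
    static final int VCORES = 1;

    static NaiveCounters replay(int assigned, int released, int extraReleases) {
        NaiveCounters c = new NaiveCounters();
        for (int i = 0; i < assigned; i++) c.allocate(MEM_MB, VCORES);
        for (int i = 0; i < released; i++) c.release(MEM_MB, VCORES);
        // The "invalid" releases, assuming they also decremented the counters.
        for (int i = 0; i < extraReleases; i++) c.release(MEM_MB, VCORES);
        return c;
    }

    public static void main(String[] args) {
        NaiveCounters c = replay(67019, 67019, 19);
        System.out.println(c.containersRunning + " containers, "
                + c.memoryMb / 1024 + " GB, " + c.vcores + " vcores");
        // prints: -19 containers, -38 GB, -19 vcores
    }
}
```

Under these assumptions the replay reproduces the three negative values shown in the RM UI; it does not prove that the invalid releases were the cause.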
[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release
[ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Kabakchei updated YARN-4698:
---
Description:
We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times
Some related log records can be found in the "Example.log-cut" attachment.
After some investigation, we concluded that there is some kind of race condition for a container that was scheduled to be killed but completed successfully before the kill.
There is also a patch that possibly mitigates the effects of the issue but doesn't solve the original problem (see mitigating2.5.1.diff).
Unfortunately, the cluster and all other logs are lost, because the report was written about a year ago but never properly submitted. We also don't know whether the issue exists in other versions.

was:
We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times
Some related log records can be found in the "Example.log-cut" attachment.
After some investigation, we concluded that there is some kind of race condition for a container that was scheduled to be killed but completed successfully before the kill.
There is also a patch that is possibly mitigates the effects of the issue but doesn't solve the original problem (see mitigating2.5.1.diff).
Unfortunately, the cluster and all other logs are lost, because the report was written about a year ago but never properly submitted. We also don't know whether the issue exists in other versions.
[jira] [Commented] (YARN-4698) Negative value in RM UI counters due to double container release
[ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150625#comment-15150625 ]

Dmytro Kabakchei commented on YARN-4698:

Has anybody else encountered this issue? Does anybody have any ideas about the cause and how to fix it?
[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release
[ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Kabakchei updated YARN-4698:
---
Attachment: mitigating2.5.1.diff
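As a hedged illustration of one general way to guard against a double release (a hypothetical sketch, not the contents of mitigating2.5.1.diff), the Java code below tracks live container IDs and only decrements the counters for a container that is still live. A second release of the same ID, for example a kill arriving after the container already completed, is tallied as ignored instead. GuardedCounters, GuardedDemo and the container ID string are illustrative names, not YARN APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not a YARN API): counters that only decrement for a
// container still tracked as live, which makes release idempotent.
final class GuardedCounters {
    private static final class Alloc {
        final int memMb;
        final int cores;
        Alloc(int memMb, int cores) { this.memMb = memMb; this.cores = cores; }
    }

    private final Map<String, Alloc> live = new HashMap<>();
    long containersRunning;
    long memoryMb;
    long vcores;
    long ignoredReleases;

    synchronized void allocate(String containerId, int memMb, int cores) {
        // A duplicate allocation of the same ID is ignored so it is never double counted.
        if (live.putIfAbsent(containerId, new Alloc(memMb, cores)) != null) {
            return;
        }
        containersRunning++;
        memoryMb += memMb;
        vcores += cores;
    }

    // Returns true if the container was live and the counters were decremented,
    // false if it was unknown or already released (e.g. a kill after completion).
    synchronized boolean release(String containerId) {
        Alloc a = live.remove(containerId);
        if (a == null) {
            ignoredReleases++;
            return false;
        }
        containersRunning--;
        memoryMb -= a.memMb;
        vcores -= a.cores;
        return true;
    }
}

final class GuardedDemo {
    public static void main(String[] args) {
        GuardedCounters c = new GuardedCounters();
        c.allocate("container_01", 2048, 1);
        c.release("container_01"); // container completes normally
        c.release("container_01"); // late kill for the same container
        System.out.println(c.containersRunning + " " + c.ignoredReleases); // prints: 0 1
    }
}
```

Running GuardedDemo prints `0 1`: the counters return to zero after the first release, and the late second release is only recorded in ignoredReleases. The synchronized methods are an assumption made to keep the check-and-decrement atomic in this sketch; the scheduler's actual locking may differ.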
[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release
[ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Kabakchei updated YARN-4698:
---
Description:
We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times
Some related log records can be found in the "Example.log-cut" attachment.
After some investigation, we concluded that there is some kind of race condition for a container that was scheduled to be killed but completed successfully before the kill.
There is also a patch that is possibly mitigates the effects of the issue but doesn't solve the original problem (see mitigating2.5.1.diff).
Unfortunately, the cluster and all other logs are lost, because the report was written about a year ago but never properly submitted. We also don't know whether the issue exists in other versions.

was:
We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times
Some related log records can be found in the "Example.log-cut" attachment.
After some investigation, we concluded that there is some kind of race condition for a container that was scheduled to be killed but completed successfully before the kill.
There is also a patch that is possibly mitigates the effects of the issue but doesn't solve the original problem (see mitigating01.patch).
Unfortunately, the cluster and all other logs are lost, because the report was written about a year ago but never properly submitted. We also don't know whether the issue exists in other versions.
[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release
[ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Kabakchei updated YARN-4698:
---
Attachment: Example.log-cut
[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release
[ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Kabakchei updated YARN-4698:
---
Description:
We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times
Some related log records can be found in the "Example.log-cut" attachment.
After some investigation, we concluded that there is some kind of race condition for a container that was scheduled to be killed but completed successfully before the kill.
There is also a patch that is possibly mitigates the effects of the issue but doesn't solve the original problem (see mitigating01.patch).
Unfortunately, the cluster and all other logs are lost, because the report was written about a year ago but never properly submitted. We also don't know whether the issue exists in other versions.

was:
We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times
[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release
[ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Kabakchei updated YARN-4698:
---
Description:
We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times

was:
We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
I checked their resource manager logs.
These events happened.
Assigned container: 67019 times
Released container: 67019 times
Invalid container released: 19 times
[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release
[ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Kabakchei updated YARN-4698:
---
Description:
We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
I checked their resource manager logs.
These events happened.
Assigned container: 67019 times
Released container: 67019 times
Invalid container released: 19 times

was:
We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened:
[jira] [Created] (YARN-4698) Negative value in RM UI counters due to double container release
Dmytro Kabakchei created YARN-4698:
--
Summary: Negative value in RM UI counters due to double container release
Key: YARN-4698
URL: https://issues.apache.org/jira/browse/YARN-4698
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler, resourcemanager
Affects Versions: 2.5.1
Reporter: Dmytro Kabakchei
Priority: Minor

We noticed negative values in the RM UI counters on our cluster:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19
After checking the RM logs, we found that the following events had happened: