[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release

2016-02-17 Thread Dmytro Kabakchei (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Kabakchei updated YARN-4698:
---
Description: 
We noticed that on our cluster there are negative values in RM UI counters:
- Containers Running: -19
- Memory Used: -38GB
- Vcores Used: -19

After checking the RM logs, we found that the following events had occurred:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times

Some related log records can be found in the "Example.log-cut" attachment.
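
A tally like the one above could be reproduced with a short script. This is only a sketch: the three phrases are taken from the report, but the sample log lines and the function name tally_events are hypothetical, and the real RM log format may differ.

```python
from collections import Counter

# Event phrases as quoted in the report. The exact wording of real
# RM log lines is an assumption.
EVENTS = (
    "Assigned container",
    "Released container",
    "Invalid container released",
)


def tally_events(lines):
    """Count how many lines mention each event phrase (case-sensitive)."""
    counts = Counter()
    for line in lines:
        for event in EVENTS:
            if event in line:
                counts[event] += 1
    return counts


# Hypothetical sample lines, not copied from Example.log-cut.
sample = [
    "INFO Assigned container container_01 on host-a",
    "INFO Released container container_01 on host-a",
    "INFO Invalid container released container_01",
]
counts = tally_events(sample)
for event in EVENTS:
    print(f"{event}: {counts[event]} times")
```

For a real log, the lines of the file would be passed in place of the sample list.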

After some investigation, we concluded that there is some kind of race 
condition affecting a container that was scheduled to be killed but completed 
successfully before the kill took effect.
There is also a patch that may mitigate the effects of the issue but does not 
solve the underlying problem (see mitigating2.5.1.diff).
Unfortunately, the cluster and all other logs are lost, because the report was 
made about a year ago but wasn't submitted properly. We also don't know whether 
the issue exists in other versions.

  was:
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times

Some log records related can be found within "Example.log-cut" attachment.

After some investigation we made a conclusion that there is some kind of race 
condition for container that was scheduled for killing, but was completed 
successfully before kill.
Also, there is a patch that possibly mitigates effects of the issue, but 
doesn't solve original problem (see mitigating2.5.1diff).
Unfortunately, the cluster and all other logs are lost, because the report was 
made about a year ago, but wasn't submitted properly. Also, we don't know if 
the issue exist in other versions.


> Negative value in RM UI counters due to double container release
> 
>
> Key: YARN-4698
> URL: https://issues.apache.org/jira/browse/YARN-4698
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Affects Versions: 2.5.1
> Reporter: Dmytro Kabakchei
> Priority: Minor
> Attachments: Example.log-cut, mitigating2.5.1.diff
>
>
> We noticed that on our cluster there are negative values in RM UI counters:
> - Containers Running: -19
> - Memory Used: -38GB
> - Vcores Used: -19
> After checking the RM logs, we found that the following events had occurred:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times
> Some related log records can be found in the "Example.log-cut" attachment.
> After some investigation, we concluded that there is some kind of race 
> condition affecting a container that was scheduled to be killed but completed 
> successfully before the kill took effect.
> There is also a patch that may mitigate the effects of the issue but does not 
> solve the underlying problem (see mitigating2.5.1.diff).
> Unfortunately, the cluster and all other logs are lost, because the report 
> was made about a year ago but wasn't submitted properly. We also don't know 
> whether the issue exists in other versions.
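
The suspected double release can be illustrated with a toy model. The following is a minimal sketch, not the actual ResourceManager or FairScheduler code: the class ClusterCounters, its methods, and the simulate helper are hypothetical names, and the total of 100 containers is arbitrary. It only shows how decrementing a counter on every release event, including a repeated one for the same container, drives the counter below zero, and how checking that the container is still tracked avoids that. Whether mitigating2.5.1.diff works this way is not stated in the report.

```python
class ClusterCounters:
    """Toy stand-in for scheduler metrics; not the actual YARN classes."""

    def __init__(self):
        self.containers_running = 0
        self.live = set()

    def assign(self, cid):
        self.live.add(cid)
        self.containers_running += 1

    def release_unguarded(self, cid):
        # Decrements on every release event, even a repeated one.
        self.live.discard(cid)
        self.containers_running -= 1

    def release_guarded(self, cid):
        # Decrements only if the container is still tracked as live.
        if cid not in self.live:
            return False
        self.live.remove(cid)
        self.containers_running -= 1
        return True


def simulate(release_name, doubles=19, total=100):
    """Assign `total` containers, release each once, and release the
    first `doubles` of them a second time (completion plus a late kill)."""
    counters = ClusterCounters()
    release = getattr(counters, release_name)
    for cid in range(total):
        counters.assign(cid)
    for cid in range(total):
        release(cid)          # container completes normally
        if cid < doubles:
            release(cid)      # late kill releases it a second time
    return counters.containers_running


unguarded = simulate("release_unguarded")
guarded = simulate("release_guarded")
print("unguarded:", unguarded)  # -19
print("guarded:", guarded)      # 0
```

With 19 repeated releases the unguarded counter ends at -19, matching the "Containers Running" value in the report. The reported -38GB and -19 vcores would be consistent with 19 extra releases of containers sized 2 GB and 1 vcore each, although the report does not state the container size.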



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release

2016-02-17 Thread Dmytro Kabakchei (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Kabakchei updated YARN-4698:
---
Description: 
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times

Some log records related can be found within "Example.log-cut" attachment.

After some investigation we made a conclusion that there is some kind of race 
condition for container that was scheduled for killing, but was completed 
successfully before kill.
Also, there is a patch that possibly mitigates effects of the issue, but 
doesn't solve original problem (see mitigating2.5.1diff).
Unfortunately, the cluster and all other logs are lost, because the report was 
made about a year ago, but wasn't submitted properly. Also, we don't know if 
the issue exist in other versions.

  was:
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times

Some log records related can be found within "Example.log-cut" attachment.

After some investigation we made a conclusion that there is some kind of race 
condition for container that was scheduled for killing, but was completed 
successfully before kill.
Also, there is a patch that is possibly mitigates effects of the issue, but 
doesn't solve original problem (see mitigating2.5.1diff).
Unfortunately, the cluster and all other logs are lost, because the report was 
made about a year ago, but wasn't submitted properly. Also, we don't know if 
the issue exist in other versions.


> Negative value in RM UI counters due to double container release
> 
>
> Key: YARN-4698
> URL: https://issues.apache.org/jira/browse/YARN-4698
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Affects Versions: 2.5.1
> Reporter: Dmytro Kabakchei
> Priority: Minor
> Attachments: Example.log-cut, mitigating2.5.1.diff
>
>
> We noticed that on our cluster there are negative values in RM UI counters:
> -Containers Running: -19
> -Memory Used: -38GB
> -Vcores Used: -19
> After we checked RM logs, we found, that the following events had happened:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times
> Some log records related can be found within "Example.log-cut" attachment.
> After some investigation we made a conclusion that there is some kind of race 
> condition for container that was scheduled for killing, but was completed 
> successfully before kill.
> Also, there is a patch that possibly mitigates effects of the issue, but 
> doesn't solve original problem (see mitigating2.5.1diff).
> Unfortunately, the cluster and all other logs are lost, because the report 
> was made about a year ago, but wasn't submitted properly. Also, we don't know 
> if the issue exist in other versions.





[jira] [Commented] (YARN-4698) Negative value in RM UI counters due to double container release

2016-02-17 Thread Dmytro Kabakchei (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150625#comment-15150625
 ] 

Dmytro Kabakchei commented on YARN-4698:


Has anybody else encountered this issue? Does anybody have any idea what the 
cause is and how to solve it?

> Negative value in RM UI counters due to double container release
> 
>
> Key: YARN-4698
> URL: https://issues.apache.org/jira/browse/YARN-4698
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Affects Versions: 2.5.1
> Reporter: Dmytro Kabakchei
> Priority: Minor
> Attachments: Example.log-cut, mitigating2.5.1.diff
>
>
> We noticed that on our cluster there are negative values in RM UI counters:
> -Containers Running: -19
> -Memory Used: -38GB
> -Vcores Used: -19
> After we checked RM logs, we found, that the following events had happened:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times
> Some log records related can be found within "Example.log-cut" attachment.
> After some investigation we made a conclusion that there is some kind of race 
> condition for container that was scheduled for killing, but was completed 
> successfully before kill.
> Also, there is a patch that is possibly mitigates effects of the issue, but 
> doesn't solve original problem (see mitigating2.5.1diff).
> Unfortunately, the cluster and all other logs are lost, because the report 
> was made about a year ago, but wasn't submitted properly. Also, we don't know 
> if the issue exist in other versions.





[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release

2016-02-17 Thread Dmytro Kabakchei (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Kabakchei updated YARN-4698:
---
Attachment: mitigating2.5.1.diff

> Negative value in RM UI counters due to double container release
> 
>
> Key: YARN-4698
> URL: https://issues.apache.org/jira/browse/YARN-4698
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Affects Versions: 2.5.1
> Reporter: Dmytro Kabakchei
> Priority: Minor
> Attachments: Example.log-cut, mitigating2.5.1.diff
>
>
> We noticed that on our cluster there are negative values in RM UI counters:
> -Containers Running: -19
> -Memory Used: -38GB
> -Vcores Used: -19
> After we checked RM logs, we found, that the following events had happened:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times
> Some log records related can be found within "Example.log-cut" attachment.
> After some investigation we made a conclusion that there is some kind of race 
> condition for container that was scheduled for killing, but was completed 
> successfully before kill.
> Also, there is a patch that is possibly mitigates effects of the issue, but 
> doesn't solve original problem (see mitigating2.5.1diff).
> Unfortunately, the cluster and all other logs are lost, because the report 
> was made about a year ago, but wasn't submitted properly. Also, we don't know 
> if the issue exist in other versions.





[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release

2016-02-17 Thread Dmytro Kabakchei (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Kabakchei updated YARN-4698:
---
Description: 
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times

Some log records related can be found within "Example.log-cut" attachment.

After some investigation we made a conclusion that there is some kind of race 
condition for container that was scheduled for killing, but was completed 
successfully before kill.
Also, there is a patch that is possibly mitigates effects of the issue, but 
doesn't solve original problem (see mitigating2.5.1diff).
Unfortunately, the cluster and all other logs are lost, because the report was 
made about a year ago, but wasn't submitted properly. Also, we don't know if 
the issue exist in other versions.

  was:
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times

Some log records related can be found within "Example.log-cut" attachment.

After some investigation we made a conclusion that there is some kind of race 
condition for container that was scheduled for killing, but was completed 
successfully before kill.
Also, there is a patch that is possibly mitigates effects of the issue, but 
doesn't solve original problem (see mitigating01.patch).
Unfortunately, the cluster and all other logs are lost, because the report was 
made about a year ago, but wasn't submitted properly. Also, we don't know if 
the issue exist in other versions.


> Negative value in RM UI counters due to double container release
> 
>
> Key: YARN-4698
> URL: https://issues.apache.org/jira/browse/YARN-4698
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Affects Versions: 2.5.1
> Reporter: Dmytro Kabakchei
> Priority: Minor
> Attachments: Example.log-cut, mitigating2.5.1.diff
>
>
> We noticed that on our cluster there are negative values in RM UI counters:
> -Containers Running: -19
> -Memory Used: -38GB
> -Vcores Used: -19
> After we checked RM logs, we found, that the following events had happened:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times
> Some log records related can be found within "Example.log-cut" attachment.
> After some investigation we made a conclusion that there is some kind of race 
> condition for container that was scheduled for killing, but was completed 
> successfully before kill.
> Also, there is a patch that is possibly mitigates effects of the issue, but 
> doesn't solve original problem (see mitigating2.5.1diff).
> Unfortunately, the cluster and all other logs are lost, because the report 
> was made about a year ago, but wasn't submitted properly. Also, we don't know 
> if the issue exist in other versions.





[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release

2016-02-17 Thread Dmytro Kabakchei (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Kabakchei updated YARN-4698:
---
Attachment: Example.log-cut

> Negative value in RM UI counters due to double container release
> 
>
> Key: YARN-4698
> URL: https://issues.apache.org/jira/browse/YARN-4698
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Affects Versions: 2.5.1
> Reporter: Dmytro Kabakchei
> Priority: Minor
> Attachments: Example.log-cut
>
>
> We noticed that on our cluster there are negative values in RM UI counters:
> -Containers Running: -19
> -Memory Used: -38GB
> -Vcores Used: -19
> After we checked RM logs, we found, that the following events had happened:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times
> Some log records related can be found within "Example.log-cut" attachment.
> After some investigation we made a conclusion that there is some kind of race 
> condition for container that was scheduled for killing, but was completed 
> successfully before kill.
> Also, there is a patch that is possibly mitigates effects of the issue, but 
> doesn't solve original problem (see mitigating01.patch).
> Unfortunately, the cluster and all other logs are lost, because the report 
> was made about a year ago, but wasn't submitted properly. Also, we don't know 
> if the issue exist in other versions.





[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release

2016-02-17 Thread Dmytro Kabakchei (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Kabakchei updated YARN-4698:
---
Description: 
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times

Some log records related can be found within "Example.log-cut" attachment.

After some investigation we made a conclusion that there is some kind of race 
condition for container that was scheduled for killing, but was completed 
successfully before kill.
Also, there is a patch that is possibly mitigates effects of the issue, but 
doesn't solve original problem (see mitigating01.patch).
Unfortunately, the cluster and all other logs are lost, because the report was 
made about a year ago, but wasn't submitted properly. Also, we don't know if 
the issue exist in other versions.

  was:
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times


> Negative value in RM UI counters due to double container release
> 
>
> Key: YARN-4698
> URL: https://issues.apache.org/jira/browse/YARN-4698
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Affects Versions: 2.5.1
> Reporter: Dmytro Kabakchei
> Priority: Minor
> Attachments: Example.log-cut
>
>
> We noticed that on our cluster there are negative values in RM UI counters:
> -Containers Running: -19
> -Memory Used: -38GB
> -Vcores Used: -19
> After we checked RM logs, we found, that the following events had happened:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times
> Some log records related can be found within "Example.log-cut" attachment.
> After some investigation we made a conclusion that there is some kind of race 
> condition for container that was scheduled for killing, but was completed 
> successfully before kill.
> Also, there is a patch that is possibly mitigates effects of the issue, but 
> doesn't solve original problem (see mitigating01.patch).
> Unfortunately, the cluster and all other logs are lost, because the report 
> was made about a year ago, but wasn't submitted properly. Also, we don't know 
> if the issue exist in other versions.





[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release

2016-02-17 Thread Dmytro Kabakchei (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Kabakchei updated YARN-4698:
---
Description: 
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times

  was:
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:
I checked their resource manager logs.
These events happened.
Assigned container: 67019 times
Released container: 67019 times
Invalid container released: 19 times


> Negative value in RM UI counters due to double container release
> 
>
> Key: YARN-4698
> URL: https://issues.apache.org/jira/browse/YARN-4698
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Affects Versions: 2.5.1
> Reporter: Dmytro Kabakchei
> Priority: Minor
>
> We noticed that on our cluster there are negative values in RM UI counters:
> -Containers Running: -19
> -Memory Used: -38GB
> -Vcores Used: -19
> After we checked RM logs, we found, that the following events had happened:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times





[jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release

2016-02-17 Thread Dmytro Kabakchei (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Kabakchei updated YARN-4698:
---
Description: 
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:
I checked their resource manager logs.
These events happened.
Assigned container: 67019 times
Released container: 67019 times
Invalid container released: 19 times

  was:
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:


> Negative value in RM UI counters due to double container release
> 
>
> Key: YARN-4698
> URL: https://issues.apache.org/jira/browse/YARN-4698
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Affects Versions: 2.5.1
> Reporter: Dmytro Kabakchei
> Priority: Minor
>
> We noticed that on our cluster there are negative values in RM UI counters:
> -Containers Running: -19
> -Memory Used: -38GB
> -Vcores Used: -19
> After we checked RM logs, we found, that the following events had happened:
> I checked their resource manager logs.
> These events happened.
> Assigned container: 67019 times
> Released container: 67019 times
> Invalid container released: 19 times





[jira] [Created] (YARN-4698) Negative value in RM UI counters due to double container release

2016-02-17 Thread Dmytro Kabakchei (JIRA)
Dmytro Kabakchei created YARN-4698:
--

 Summary: Negative value in RM UI counters due to double container 
release
 Key: YARN-4698
 URL: https://issues.apache.org/jira/browse/YARN-4698
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.5.1
Reporter: Dmytro Kabakchei
Priority: Minor


We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:


