[jira] [Commented] (YARN-9088) Non-exclusive labels break QueueMetrics

2023-02-18 Thread INHYANG PARK (Jira)


[ https://issues.apache.org/jira/browse/YARN-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690806#comment-17690806 ]

INHYANG PARK commented on YARN-9088:


[~cjac] Do you have any plans to fix this issue?

> Non-exclusive labels break QueueMetrics
> ---
>
> Key: YARN-9088
> URL: https://issues.apache.org/jira/browse/YARN-9088
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.8.5
>Reporter: Brandon Scheller
>Priority: Major
>  Labels: metrics, nodelabel
>
> QueueMetrics are broken (random/negative values) when non-exclusive labels 
> are used and unlabeled containers run on labeled nodes.
> This is caused by the change in the patch here:
> https://issues.apache.org/jira/browse/YARN-6467
> It assumes that a container's label will be the same as the label of the 
> node it is running on.
> Within the patch, metrics are sometimes updated using 
> request.getNodeLabelExpression() and sometimes using 
> node.getPartition().
> This means that when the node is labeled but the container request 
> isn't, these metrics only get updated when referring to the default 
> queue. This stops metrics from balancing out and results in incorrect and 
> negative values in QueueMetrics. 
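The imbalance described above can be illustrated with a toy model. This is a hypothetical sketch, not the real QueueMetrics code: ToyQueueMetrics, DEFAULT_PARTITION and the helper methods are invented names, and the "only update the default partition" check is a simplification of the behaviour the report attributes to YARN-6467. It shows that when the increment is keyed by the request's label and the decrement by the node's partition (or vice versa), an unlabeled container on a labeled node leaves the counter either leaked or negative.

```java
/**
 * Toy model of the imbalance described in YARN-9088. All names here are
 * hypothetical; this is not the real QueueMetrics class.
 */
public class LabelMismatchSketch {
    // Stand-in for the empty "no label" (default) partition
    static final String DEFAULT_PARTITION = "";

    static final class ToyQueueMetrics {
        long allocatedMB = 0;

        // Simplification: only the default partition is tracked
        void allocate(String partition, long mb) {
            if (DEFAULT_PARTITION.equals(partition)) {
                allocatedMB += mb;
            }
        }

        void release(String partition, long mb) {
            if (DEFAULT_PARTITION.equals(partition)) {
                allocatedMB -= mb;
            }
        }
    }

    // Increment keyed by the request's label, decrement keyed by the node's partition
    static long allocByRequestReleaseByNode(String requestLabel, String nodePartition, long mb) {
        ToyQueueMetrics m = new ToyQueueMetrics();
        m.allocate(requestLabel, mb);
        m.release(nodePartition, mb);
        return m.allocatedMB;
    }

    // The opposite pairing: increment keyed by the node, decrement keyed by the request
    static long allocByNodeReleaseByRequest(String requestLabel, String nodePartition, long mb) {
        ToyQueueMetrics m = new ToyQueueMetrics();
        m.allocate(nodePartition, mb);
        m.release(requestLabel, mb);
        return m.allocatedMB;
    }

    // Both updates use the same key, so they always cancel out
    static long sameKeyBothWays(String key, long mb) {
        ToyQueueMetrics m = new ToyQueueMetrics();
        m.allocate(key, mb);
        m.release(key, mb);
        return m.allocatedMB;
    }

    public static void main(String[] args) {
        // An unlabeled request ("") whose container runs on a node in partition "x"
        System.out.println(allocByRequestReleaseByNode("", "x", 1024)); // 1024: never released
        System.out.println(allocByNodeReleaseByRequest("", "x", 1024)); // -1024: negative value
        System.out.println(sameKeyBothWays("x", 1024));                 // 0
    }
}
```

The sketch does not decide which key is correct (that is the policy question discussed in this thread); it only shows that the increment and decrement must be keyed consistently for the values to balance out.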



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9088) Non-exclusive labels break QueueMetrics

2023-01-20 Thread C.J. Collier (Jira)


[ https://issues.apache.org/jira/browse/YARN-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679248#comment-17679248 ]

C.J. Collier commented on YARN-9088:


I'll review the changes and see if I can pick up where karthikpal left off. 
Here is a list of the files changed in that other patch, ordered by the 
number of changes to each file.

hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/\
scheduler/QueueMetrics.java
scheduler/AppSchedulingInfo.java
scheduler/TestQueueMetrics.java
scheduler/capacity/CSQueueMetrics.java
scheduler/common/fica/FiCaSchedulerApp.java
scheduler/fair/FSAppAttempt.java
scheduler/capacity/LeafQueue.java
scheduler/SchedulerApplicationAttempt.java
scheduler/capacity/CSQueueUtils.java
scheduler/capacity/TestNodeLabelContainerAllocation.java
scheduler/TestSchedulerApplicationAttempt.java
scheduler/capacity/TestCapacityScheduler.java
monitor/invariants/TestMetricsInvariantChecker.java
scheduler/fair/FairScheduler.java




[jira] [Commented] (YARN-9088) Non-exclusive labels break QueueMetrics

2020-03-19 Thread Anuj (Jira)


[ https://issues.apache.org/jira/browse/YARN-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062390#comment-17062390 ]

Anuj commented on YARN-9088:


We are facing a similar issue in our setup: the global view of pending and 
available resources gets messed up.




[jira] [Commented] (YARN-9088) Non-exclusive labels break QueueMetrics

2019-04-16 Thread Karthik Palaniappan (JIRA)


[ https://issues.apache.org/jira/browse/YARN-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819455#comment-16819455 ]

Karthik Palaniappan commented on YARN-9088:
---

You'd also need to change how usedCapacity from YARN-6195 is calculated. It has 
similar logic that only considers the default partition.




[jira] [Commented] (YARN-9088) Non-exclusive labels break QueueMetrics

2019-04-16 Thread Karthik Palaniappan (JIRA)


[ https://issues.apache.org/jira/browse/YARN-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819447#comment-16819447 ]

Karthik Palaniappan commented on YARN-9088:
---

+1. I think we should consider rolling back YARN-6467 instead of fixing it.

I believe the original behavior was correct: metrics for the root queue should 
include metrics for all child queues and partitions, so that AllocatedMB / 
AvailableMB, for example, give you a global view of cluster utilization. If 
YARN-6492 ever gets submitted, then we'll get per-partition metrics too. But I 
think YARN-6467 is the worst of both worlds: you don't get per-partition 
metrics, and you don't get a global view of the cluster.

A lot of cloud providers use cluster-level YARN metrics for autoscaling, and 
YARN-6467 breaks autoscaling.
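To make the autoscaling concern concrete, here is a hypothetical sketch (not Hadoop code) comparing utilization computed from the default partition alone with utilization summed over all partitions. PartitionSnapshot and the helper methods are invented names, and treating AllocatedMB + AvailableMB as the capacity of a partition is a simplifying assumption, not a statement about how YARN defines these metrics.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: global vs. default-partition-only utilization.
 * Not Hadoop code; PartitionSnapshot is an invented type.
 */
public class GlobalUtilizationSketch {
    static final class PartitionSnapshot {
        final long allocatedMB;
        final long availableMB;

        PartitionSnapshot(long allocatedMB, long availableMB) {
            this.allocatedMB = allocatedMB;
            this.availableMB = availableMB;
        }
    }

    // Assumption: allocated + available approximates the capacity being measured
    static double utilization(long allocatedMB, long availableMB) {
        long total = allocatedMB + availableMB;
        return total == 0 ? 0.0 : (double) allocatedMB / total;
    }

    // Sum over every partition: the "global view" of the cluster
    static double globalUtilization(Map<String, PartitionSnapshot> byPartition) {
        long allocated = 0;
        long available = 0;
        for (PartitionSnapshot p : byPartition.values()) {
            allocated += p.allocatedMB;
            available += p.availableMB;
        }
        return utilization(allocated, available);
    }

    // Only the default ("") partition, ignoring labeled partitions
    static double defaultOnlyUtilization(Map<String, PartitionSnapshot> byPartition) {
        PartitionSnapshot p = byPartition.getOrDefault("", new PartitionSnapshot(0, 0));
        return utilization(p.allocatedMB, p.availableMB);
    }

    public static void main(String[] args) {
        Map<String, PartitionSnapshot> metrics = new LinkedHashMap<>();
        metrics.put("", new PartitionSnapshot(2048, 6144));  // default partition, 25% used
        metrics.put("gpu", new PartitionSnapshot(8192, 0));  // labeled partition, fully used
        System.out.println(defaultOnlyUtilization(metrics)); // 0.25
        System.out.println(globalUtilization(metrics));      // 0.625
    }
}
```

With a fully used labeled partition and a mostly idle default partition, the default-only view reports 25% utilization while the cluster as a whole is at 62.5%, so an autoscaler reading only the former could under-react.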

Side note: YARN-6467 was a breaking change with no documentation or release 
note, so rolling it back (another breaking change) should be fine. I'll attach 
a patch, as long as the rollback is straightforward.
