[jira] [Updated] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)

2021-03-30 Thread Anup Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anup Agarwal updated YARN-10724:

Attachment: YARN-10724-trunk.002.patch

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
> 
>
> Key: YARN-10724
> URL: https://issues.apache.org/jira/browse/YARN-10724
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Anup Agarwal
> Assignee: Anup Agarwal
> Priority: Minor
> Attachments: YARN-10724-trunk.001.patch, YARN-10724-trunk.002.patch
>
>
> Currently CapacityScheduler over-counts preemption metrics inside 
> QueueMetrics.
>  
> One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the 
> container immediately from the launchedContainers list and waits for the 
> NM to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke 
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for 
> containers to preempt based on the launchedContainers list. Both of these 
> calls can create a ContainerPreemptEvent for the same container (as the RM 
> is still waiting for the NM to kill the container). This leads LeafQueue to 
> log metrics for the same preemption multiple times.
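
The double count described above can be illustrated with a small toy model. This is only a sketch and not Hadoop code: the Node class, the scan and countPreemptions methods, and the pendingKill set are all hypothetical names invented for illustration, and the model assumes every launched container is selected for preemption. The de-duplication guard is one possible way to avoid the over-count; the attached patches may take a different approach.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the over-count. All names are hypothetical, not YARN APIs. */
public class PreemptionCountSketch {

    /** Stand-in for a scheduler node: containers stay listed until the NM confirms the kill. */
    static final class Node {
        final Set<String> launched = new LinkedHashSet<>();
        // Assumption: a guard set remembering containers already signalled.
        final Set<String> pendingKill = new HashSet<>();
    }

    /** Scans the launched list and returns one preempt event per selected container. */
    public static List<String> scan(Node node, boolean dedup) {
        List<String> events = new ArrayList<>();
        for (String id : node.launched) {
            // Set.add returns false if the id was already signalled earlier.
            if (dedup && !node.pendingKill.add(id)) {
                continue;
            }
            events.add(id);
        }
        return events;
    }

    /** Runs two scans before the NM kill arrives and counts the resulting events. */
    public static int countPreemptions(boolean dedup) {
        Node node = new Node();
        node.launched.add("container_1");
        int preempted = 0;
        preempted += scan(node, dedup).size(); // stands in for the NODE_RESOURCE_UPDATE path
        preempted += scan(node, dedup).size(); // stands in for the NODE_UPDATE path
        return preempted;
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + countPreemptions(false));
        System.out.println("with guard:    " + countPreemptions(true));
    }
}
```

Without the guard the single container is counted twice, because it is still in the launched list when the second scan runs; with the guard it is counted once.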



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)

2021-03-30 Thread Anup Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anup Agarwal updated YARN-10724:

Attachment: (was: YARN-10724-trunk.002.patch)







[jira] [Updated] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)

2021-03-30 Thread Anup Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anup Agarwal updated YARN-10724:

Attachment: YARN-10724-trunk.002.patch







[jira] [Updated] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)

2021-03-30 Thread Anup Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anup Agarwal updated YARN-10724:

Description: 
Currently CapacityScheduler over-counts preemption metrics inside QueueMetrics.

 

One cause of the over-counting:

When a container is already running, SchedulerNode does not remove the 
container immediately from the launchedContainers list and waits for the NM to 
kill the container.

Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke signalContainersIfOvercommited 
(AbstractYarnScheduler), which looks for containers to preempt based on the 
launchedContainers list. Both of these calls can create a ContainerPreemptEvent 
for the same container (as the RM is still waiting for the NM to kill the 
container). This leads LeafQueue to log metrics for the same preemption 
multiple times.

  was: Currently CapacityScheduler over-counts preemption metrics inside 
QueueMetrics.


> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
> 
>
> Key: YARN-10724
> URL: https://issues.apache.org/jira/browse/YARN-10724
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Anup Agarwal
> Priority: Minor
> Attachments: YARN-10724-trunk.001.patch
>
>
> Currently CapacityScheduler over-counts preemption metrics inside 
> QueueMetrics.
>  
> One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the 
> container immediately from the launchedContainers list and waits for the 
> NM to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke 
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for 
> containers to preempt based on the launchedContainers list. Both of these 
> calls can create a ContainerPreemptEvent for the same container (as the RM 
> is still waiting for the NM to kill the container). This leads LeafQueue to 
> log metrics for the same preemption multiple times.






[jira] [Updated] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)

2021-03-30 Thread Anup Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anup Agarwal updated YARN-10724:

Environment: (was: One cause of the over-counting:

When a container is already running, SchedulerNode does not remove the 
container immediately from the launchedContainers list and waits for the NM to 
kill the container.

Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke signalContainersIfOvercommited 
(AbstractYarnScheduler), which looks for containers to preempt based on the 
launchedContainers list. Both of these calls can create a ContainerPreemptEvent 
for the same container (as the RM is still waiting for the NM to kill the 
container). This leads LeafQueue to log metrics for the same preemption 
multiple times.)

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
> 
>
> Key: YARN-10724
> URL: https://issues.apache.org/jira/browse/YARN-10724
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Anup Agarwal
> Priority: Minor
> Attachments: YARN-10724-trunk.001.patch
>
>
> Currently CapacityScheduler over-counts preemption metrics inside 
> QueueMetrics.






[jira] [Updated] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)

2021-03-30 Thread Anup Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anup Agarwal updated YARN-10724:

Attachment: YARN-10724-trunk.001.patch

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
> 
>
> Key: YARN-10724
> URL: https://issues.apache.org/jira/browse/YARN-10724
> Project: Hadoop YARN
> Issue Type: Bug
> Environment: One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the 
> container immediately from the launchedContainers list and waits for the 
> NM to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke 
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for 
> containers to preempt based on the launchedContainers list. Both of these 
> calls can create a ContainerPreemptEvent for the same container (as the RM 
> is still waiting for the NM to kill the container). This leads LeafQueue to 
> log metrics for the same preemption multiple times.
> Reporter: Anup Agarwal
> Priority: Minor
> Attachments: YARN-10724-trunk.001.patch
>
>
> Currently CapacityScheduler over-counts preemption metrics inside 
> QueueMetrics.


