[jira] [Updated] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
[ https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anup Agarwal updated YARN-10724:
    Attachment: YARN-10724-trunk.002.patch

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
>
>                 Key: YARN-10724
>                 URL: https://issues.apache.org/jira/browse/YARN-10724
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Anup Agarwal
>            Assignee: Anup Agarwal
>            Priority: Minor
>         Attachments: YARN-10724-trunk.001.patch, YARN-10724-trunk.002.patch
>
> Currently CapacityScheduler over-counts preemption metrics inside
> QueueMetrics.
>
> One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the
> container immediately from the launchedContainers list and waits for the NM
> to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for
> containers to preempt based on the launchedContainers list. Both of these
> calls can create a ContainerPreemptEvent for the same container (as the RM is
> still waiting for the NM to kill the container). This leads LeafQueue to log
> metrics for the same preemption multiple times.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
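The double count described in the issue can be reproduced with a small toy model. The sketch below is only an illustration under stated assumptions: `Node`, `Metrics`, `signalIfOvercommitted`, `containerKilledByNm`, and the `alreadySignaled` guard are hypothetical stand-ins written for this example, not the real YARN classes, and the guard is one possible way to avoid the repeat count, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the double count; none of these types are the real YARN classes. */
class PreemptionCountSketch {

  /** Stand-in for a scheduler node that keeps a container listed until the NM confirms the kill. */
  static final class Node {
    final Set<String> launchedContainers = new LinkedHashSet<>();
    int availableSlots; // a negative value means the node is overcommitted
  }

  /** Stand-in for the queue-side preemption counter. */
  static final class Metrics {
    int preemptedContainers;
  }

  /** Hypothetical guard: containers for which a preempt event was already raised. */
  private final Set<String> alreadySignaled = new HashSet<>();
  private final boolean dedupe;
  final Metrics metrics = new Metrics();

  PreemptionCountSketch(boolean dedupe) {
    this.dedupe = dedupe;
  }

  /** Invoked for both simulated events; scans the launched list for containers to preempt. */
  List<String> signalIfOvercommitted(Node node) {
    List<String> events = new ArrayList<>();
    int toFree = -node.availableSlots;
    for (String id : node.launchedContainers) {
      if (toFree <= 0) {
        break;
      }
      toFree--; // this container counts toward the capacity being reclaimed
      if (dedupe && !alreadySignaled.add(id)) {
        continue; // event already raised; still waiting for the NM to kill it
      }
      events.add(id);
      metrics.preemptedContainers++;
    }
    return events;
  }

  /** The NM confirmed the kill: only now does the container leave the launched list. */
  void containerKilledByNm(Node node, String id) {
    node.launchedContainers.remove(id);
    node.availableSlots++;
    alreadySignaled.remove(id);
  }

  /** Runs two scans back to back, before the NM has killed anything. */
  static int simulate(boolean dedupe) {
    Node node = new Node();
    node.launchedContainers.add("container_1");
    node.launchedContainers.add("container_2");
    node.availableSlots = -1; // overcommitted by one container

    PreemptionCountSketch scheduler = new PreemptionCountSketch(dedupe);
    scheduler.signalIfOvercommitted(node); // e.g. triggered by NODE_RESOURCE_UPDATE
    scheduler.signalIfOvercommitted(node); // e.g. triggered by NODE_UPDATE
    return scheduler.metrics.preemptedContainers;
  }

  public static void main(String[] args) {
    System.out.println("without guard: " + simulate(false)); // prints "without guard: 2"
    System.out.println("with guard: " + simulate(true));     // prints "with guard: 1"
  }
}
```

In this model only one container needs to be preempted, yet the unguarded run counts two preemptions because `container_1` is still in the launched list when the second scan happens. The guarded run counts it once and clears the entry when the simulated NM confirms the kill.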
[jira] [Updated] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
[ https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anup Agarwal updated YARN-10724:
    Attachment:     (was: YARN-10724-trunk.002.patch)

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
>
>                 Key: YARN-10724
>                 URL: https://issues.apache.org/jira/browse/YARN-10724
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Anup Agarwal
>            Assignee: Anup Agarwal
>            Priority: Minor
>         Attachments: YARN-10724-trunk.001.patch, YARN-10724-trunk.002.patch
>
> Currently CapacityScheduler over-counts preemption metrics inside
> QueueMetrics.
>
> One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the
> container immediately from the launchedContainers list and waits for the NM
> to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for
> containers to preempt based on the launchedContainers list. Both of these
> calls can create a ContainerPreemptEvent for the same container (as the RM is
> still waiting for the NM to kill the container). This leads LeafQueue to log
> metrics for the same preemption multiple times.
[jira] [Updated] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
[ https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anup Agarwal updated YARN-10724:
    Description:
Currently CapacityScheduler over-counts preemption metrics inside QueueMetrics.

One cause of the over-counting:
When a container is already running, SchedulerNode does not remove the
container immediately from the launchedContainers list and waits for the NM to
kill the container.
Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke
signalContainersIfOvercommited (AbstractYarnScheduler), which looks for
containers to preempt based on the launchedContainers list. Both of these calls
can create a ContainerPreemptEvent for the same container (as the RM is still
waiting for the NM to kill the container). This leads LeafQueue to log metrics
for the same preemption multiple times.

  was: Currently CapacityScheduler over-counts preemption metrics inside QueueMetrics.

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
>
>                 Key: YARN-10724
>                 URL: https://issues.apache.org/jira/browse/YARN-10724
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Anup Agarwal
>            Priority: Minor
>         Attachments: YARN-10724-trunk.001.patch
>
> Currently CapacityScheduler over-counts preemption metrics inside
> QueueMetrics.
>
> One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the
> container immediately from the launchedContainers list and waits for the NM
> to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for
> containers to preempt based on the launchedContainers list. Both of these
> calls can create a ContainerPreemptEvent for the same container (as the RM is
> still waiting for the NM to kill the container). This leads LeafQueue to log
> metrics for the same preemption multiple times.
[jira] [Updated] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
[ https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anup Agarwal updated YARN-10724:
    Environment:     (was: One cause of the over-counting:
When a container is already running, SchedulerNode does not remove the
container immediately from the launchedContainers list and waits for the NM to
kill the container.
Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke
signalContainersIfOvercommited (AbstractYarnScheduler), which looks for
containers to preempt based on the launchedContainers list. Both of these calls
can create a ContainerPreemptEvent for the same container (as the RM is still
waiting for the NM to kill the container). This leads LeafQueue to log metrics
for the same preemption multiple times.)

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
>
>                 Key: YARN-10724
>                 URL: https://issues.apache.org/jira/browse/YARN-10724
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Anup Agarwal
>            Priority: Minor
>         Attachments: YARN-10724-trunk.001.patch
>
> Currently CapacityScheduler over-counts preemption metrics inside
> QueueMetrics.
[jira] [Updated] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
[ https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anup Agarwal updated YARN-10724:
    Attachment: YARN-10724-trunk.001.patch

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
>
>                 Key: YARN-10724
>                 URL: https://issues.apache.org/jira/browse/YARN-10724
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the
> container immediately from the launchedContainers list and waits for the NM
> to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for
> containers to preempt based on the launchedContainers list. Both of these
> calls can create a ContainerPreemptEvent for the same container (as the RM is
> still waiting for the NM to kill the container). This leads LeafQueue to log
> metrics for the same preemption multiple times.
>            Reporter: Anup Agarwal
>            Priority: Minor
>         Attachments: YARN-10724-trunk.001.patch
>
> Currently CapacityScheduler over-counts preemption metrics inside
> QueueMetrics.