Anup Agarwal created YARN-10724:
-----------------------------------

Summary: Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
Key: YARN-10724
URL: https://issues.apache.org/jira/browse/YARN-10724
Project: Hadoop YARN
Issue Type: Bug
Environment: One cause of the over-counting:
When a container is already running, SchedulerNode does not remove it from the launchedContainers list immediately. Instead, it waits for the NM to kill the container. Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke signalContainersIfOvercommited (AbstractYarnScheduler), which looks for containers to preempt based on the launchedContainers list. Because the RM is still waiting for the NM to kill the container, both calls can create a ContainerPreemptEvent for the same container. As a result, LeafQueue records the same preemption in its metrics multiple times.
Reporter: Anup Agarwal

Currently CapacityScheduler over-counts preemption metrics inside QueueMetrics.
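The double counting can be reproduced in a small, self-contained model. This is a hypothetical sketch, not Hadoop code: the classes Node, NaiveMetrics and DedupMetrics and the method signalIfOvercommitted are invented stand-ins for SchedulerNode, the LeafQueue/QueueMetrics counters and signalContainersIfOvercommited. It assumes that each of the two node updates emits one preempt event per container still in launchedContainers, and it contrasts a counter that counts every event with one that counts each container ID at most once (one possible mitigation, not necessarily the fix adopted for this issue).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the YARN-10724 over-counting; none of these are real YARN classes.
public class PreemptionCountingSketch {

    // Stand-in for SchedulerNode: a running container stays in launchedContainers
    // until the NM reports that it has actually been killed.
    static class Node {
        final List<String> launchedContainers = new ArrayList<>();
        boolean overcommitted = true;
    }

    // Stand-in for signalContainersIfOvercommited: while the node is overcommitted,
    // emit one preempt event per container still in launchedContainers.
    static List<String> signalIfOvercommitted(Node node) {
        List<String> events = new ArrayList<>();
        if (node.overcommitted) {
            events.addAll(node.launchedContainers);
        }
        return events;
    }

    // Naive metric: increments once per preempt event received.
    static class NaiveMetrics {
        int preempted = 0;

        void onPreemptEvent(String containerId) {
            preempted++;
        }
    }

    // Deduplicating metric: increments at most once per container ID.
    static class DedupMetrics {
        int preempted = 0;
        private final Set<String> alreadyCounted = new HashSet<>();

        void onPreemptEvent(String containerId) {
            // Set.add returns false if the ID was already present.
            if (alreadyCounted.add(containerId)) {
                preempted++;
            }
        }
    }

    // Returns {naiveCount, dedupCount} after two node updates hit the same container.
    static int[] run() {
        Node node = new Node();
        node.launchedContainers.add("container_01");
        NaiveMetrics naive = new NaiveMetrics();
        DedupMetrics dedup = new DedupMetrics();

        // NODE_RESOURCE_UPDATE followed by NODE_UPDATE, both arriving before the NM
        // has killed container_01, so it is still in launchedContainers both times.
        for (String trigger : new String[] {"NODE_RESOURCE_UPDATE", "NODE_UPDATE"}) {
            for (String id : signalIfOvercommitted(node)) {
                naive.onPreemptEvent(id);
                dedup.onPreemptEvent(id);
            }
        }
        return new int[] {naive.preempted, dedup.preempted};
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("naive=" + counts[0] + " dedup=" + counts[1]);
    }
}
```

Running main prints naive=2 dedup=1: one real preemption is counted twice by the naive counter. A real deduplication set would also need to drop a container ID once the container completes, otherwise it grows without bound.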