[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405933#comment-16405933 ]
kyungwan nam commented on YARN-8020:
------------------------------------

[~eepayne] Sorry for the late response. I've seen this problem on branch-2.8 and HDP-2.6.4.

Cluster
* cluster total resources: <405 GB, 240 VCores>
* default queue: 50% capacity, 100% maximum capacity
* pri queue: 50% capacity, 100% maximum capacity
* label1 queue: 0% capacity, 0% maximum capacity
* there is a non-exclusive node-label 'label1' in my cluster, but all nodes are included in the default node-label.

capacity-scheduler
{code}
yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled=true
yarn.scheduler.capacity.reservations-continue-look-all-nodes=true
yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
yarn.scheduler.capacity.root.accessible-node-labels.label1.capacity=100
yarn.scheduler.capacity.root.acl_administer_queue=
yarn.scheduler.capacity.root.acl_submit_applications=
yarn.scheduler.capacity.root.acl_submit_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.accessible-node-labels=
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.default.maximum-applications=100
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.default.priority=1
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=3
yarn.scheduler.capacity.root.label1.accessible-node-labels=label1
yarn.scheduler.capacity.root.label1.accessible-node-labels.label1.capacity=100
yarn.scheduler.capacity.root.label1.accessible-node-labels.label1.maximum-am-resource-percent=0.7
yarn.scheduler.capacity.root.label1.acl_submit_applications=*
yarn.scheduler.capacity.root.label1.capacity=0
yarn.scheduler.capacity.root.label1.default-node-label-expression=label1
yarn.scheduler.capacity.root.label1.maximum-am-resource-percent=0.7
yarn.scheduler.capacity.root.label1.maximum-applications=100
yarn.scheduler.capacity.root.label1.maximum-capacity=0
yarn.scheduler.capacity.root.label1.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.label1.priority=1
yarn.scheduler.capacity.root.label1.state=RUNNING
yarn.scheduler.capacity.root.label1.user-limit-factor=3
yarn.scheduler.capacity.root.ordering-policy=priority-utilization
yarn.scheduler.capacity.root.pri.accessible-node-labels=
yarn.scheduler.capacity.root.pri.acl_submit_applications=*
yarn.scheduler.capacity.root.pri.capacity=50
yarn.scheduler.capacity.root.pri.maximum-capacity=100
yarn.scheduler.capacity.root.pri.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.pri.priority=1
yarn.scheduler.capacity.root.pri.state=RUNNING
yarn.scheduler.capacity.root.pri.user-limit-factor=3
yarn.scheduler.capacity.root.queues=default,pri,label1
{code}

how to reproduce
* app1, which asks for a <1GB, 1 VCore> AM container and 29 * <1GB, 8 VCores> containers, has been submitted to the default queue.
* after all containers for app1 have been allocated, submit app2, which asks for a <1GB, 1 VCore> AM container and 14 * <1GB, 8 VCores> containers, to the pri queue
* as expected, some containers for app1 are preempted
{code:java}
2018-03-19 21:51:50,270 DEBUG capacity.ProportionalCapacityPreemptionPolicy (ProportionalCapacityPreemptionPolicy.java:containerBasedPreemptOrKill(428)) - Trying to use org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector to select preemption candidates
2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: <memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: <memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: <memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: <memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:50,270 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: pri CUR: <memory:1024, vCores:1> PEN: <memory:14336, vCores:112> RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 0.5 IDEAL_ASSIGNED: <memory:15360, vCores:113> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: pri CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: <memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: <memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(209)) - Queue=default partition= resource-to-obtain=<memory:-83465, vCores:24>
2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: default CUR: <memory:30720, vCores:233> PEN: <memory:0, vCores:0> RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 1.0 IDEAL_ASSIGNED: <memory:399360, vCores:127> IDEAL_PREEMPT: <memory:-83465, vCores:24> ACTUAL_PREEMPT: <memory:-83465, vCores:24> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:-176640, vCores:113>
2018-03-19 21:51:50,271 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: default CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: <memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: <memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:50,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy (ProportionalCapacityPreemptionPolicy.java:logToCSV(549)) - QUEUESTATE: 1521463910271, default, 30720, 233, 0, 0, 207360, 120, 399360, 127, -83465, 24, -83465, 24, label1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, pri, 1024, 1, 14336, 112, 207360, 120, 15360, 113, 0, 0, 0, 0
2018-03-19 21:51:50,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy (ProportionalCapacityPreemptionPolicy.java:preemptOrkillSelectedContainerAfterWait(300)) - Starting to preempt containers for selectedCandidates and size:1
{code}
* but shortly after that, preemption no longer happens
{code:java}
2018-03-19 21:51:52,771 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(451)) - container_e49_1521339603918_0013_01_000006 Container Transitioned from ALLOCATED to ACQUIRED
2018-03-19 21:51:53,267 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(451)) - container_e49_1521339603918_0013_01_000006 Container Transitioned from ACQUIRED to RUNNING
2018-03-19 21:51:53,270 DEBUG capacity.ProportionalCapacityPreemptionPolicy (ProportionalCapacityPreemptionPolicy.java:containerBasedPreemptOrKill(428)) - Trying to use org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector to select preemption candidates
2018-03-19 21:51:53,270 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: <memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: <memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:53,270 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: label1 CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: <memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: <memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:53,270 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: pri CUR: <memory:4096, vCores:25> PEN: <memory:11264, vCores:88> RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 0.5 IDEAL_ASSIGNED: <memory:15360, vCores:113> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:53,271 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: pri CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: <memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: <memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:53,271 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: default CUR: <memory:27648, vCores:209> PEN: <memory:3072, vCores:1> RESERVED: <memory:0, vCores:0> GAR: <memory:207360, vCores:120> NORM: 1.0 IDEAL_ASSIGNED: <memory:30720, vCores:210> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:-179712, vCores:89>
2018-03-19 21:51:53,271 DEBUG capacity.PreemptableResourceCalculator (PreemptableResourceCalculator.java:calculateResToObtainByPartitionForLeafQueues(219)) - NAME: default CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0> RESERVED: <memory:0, vCores:0> GAR: <memory:0, vCores:0> NORM: NaN IDEAL_ASSIGNED: <memory:0, vCores:0> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:0, vCores:0>
2018-03-19 21:51:53,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy (ProportionalCapacityPreemptionPolicy.java:logToCSV(549)) - QUEUESTATE: 1521463913271, default, 27648, 209, 3072, 1, 207360, 120, 30720, 210, 0, 0, 0, 0, label1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, pri, 4096, 25, 11264, 88, 207360, 120, 15360, 113, 0, 0, 0, 0
2018-03-19 21:51:53,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy (ProportionalCapacityPreemptionPolicy.java:preemptOrkillSelectedContainerAfterWait(300)) - Starting to preempt containers for selectedCandidates and size:0
2018-03-19 21:51:53,271 DEBUG capacity.ProportionalCapacityPreemptionPolicy (ProportionalCapacityPreemptionPolicy.java:editSchedule(293)) - Total time used=1 ms.
{code}

> when DRF is used, preemption does not trigger due to incorrect idealAssigned
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8020
>                 URL: https://issues.apache.org/jira/browse/YARN-8020
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: kyungwan nam
>            Priority: Major
>
> I've encountered a case where Inter-Queue Preemption does not work.
> It happens when DRF is used and an application with a large number of vcores is submitted.
> IMHO, idealAssigned can be set incorrectly by the following code.
> {code}
> // This function "accepts" all the resources it can (pending) and return
> // the unused ones
> Resource offer(Resource avail, ResourceCalculator rc,
>     Resource clusterResource, boolean considersReservedResource) {
>   Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
>       Resources.subtract(getMax(), idealAssigned),
>       Resource.newInstance(0, 0));
>   // accepted = min{avail,
>   //               max - assigned,
>   //               current + pending - assigned,
>   //               # Make sure a queue will not get more than max of its
>   //               # used/guaranteed, this is to make sure preemption won't
>   //               # happen if all active queues are beyond their guaranteed
>   //               # This is for leaf queue only.
>   //               max(guaranteed, used) - assigned}
>   // remain = avail - accepted
>   Resource accepted = Resources.min(rc, clusterResource,
>       absMaxCapIdealAssignedDelta,
>       Resources.min(rc, clusterResource, avail, Resources
>           /*
>            * When we're using FifoPreemptionSelector (considerReservedResource
>            * = false).
>            *
>            * We should deduct reserved resource from pending to avoid excessive
>            * preemption:
>            *
>            * For example, if an under-utilized queue has used = reserved = 20.
>            * Preemption policy will try to preempt 20 containers (which is not
>            * satisfied) from different hosts.
>            *
>            * In FifoPreemptionSelector, there's no guarantee that preempted
>            * resource can be used by pending request, so policy will preempt
>            * resources repeatedly.
>            */
>           .subtract(Resources.add(getUsed(),
>               (considersReservedResource ? pending : pendingDeductReserved)),
>               idealAssigned)));
> {code}
> let's say,
> * cluster resource: <Memory:200GB, VCores:20>
> * idealAssigned(assigned): <Memory:100GB, VCores:10>
> * avail: <Memory:181GB, VCores:1>
> * current: <Memory:19GB, VCores:19>
> * pending: <Memory:0, VCores:0>
>
> current + pending - assigned: <Memory:-181GB, VCores:9>
> min(avail, (current + pending - assigned)): <Memory:-181GB, VCores:9>
> accepted: <Memory:-181GB, VCores:9>
> as a result, idealAssigned will be <Memory:-81GB, VCores:19>, which does not
> trigger preemption.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
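To make the mechanism in the quoted description easier to follow, here is a minimal standalone sketch. It is NOT the real Hadoop code: the `Resource` record and the `dominantShare`/`drfMin` helpers are simplified stand-ins for `org.apache.hadoop.yarn.api.records.Resource` and `Resources.min` with `DominantResourceCalculator`. The key point it illustrates: the DRF min picks one whole `Resource` by comparing dominant shares rather than taking a componentwise minimum, so a delta with a negative memory component can win the comparison and be "accepted" into idealAssigned.

```java
// Illustrative sketch only; not the actual Hadoop classes or figures.
public class DrfOfferSketch {

    // Simplified resource: memory in GB, vcores as a count.
    record Resource(long memory, long vcores) {
        Resource plus(Resource o)  { return new Resource(memory + o.memory, vcores + o.vcores); }
        Resource minus(Resource o) { return new Resource(memory - o.memory, vcores - o.vcores); }
    }

    // DRF dominant share: the larger of the per-dimension shares of the cluster.
    static double dominantShare(Resource r, Resource cluster) {
        return Math.max((double) r.memory() / cluster.memory(),
                        (double) r.vcores() / cluster.vcores());
    }

    // Like Resources.min under DominantResourceCalculator: returns ONE of the
    // two arguments wholesale (the smaller dominant share) -- NOT a
    // componentwise minimum, so a negative component can slip through.
    static Resource drfMin(Resource a, Resource b, Resource cluster) {
        return dominantShare(a, cluster) <= dominantShare(b, cluster) ? a : b;
    }

    public static void main(String[] args) {
        Resource cluster       = new Resource(200, 20); // <200GB, 20 VCores>
        Resource idealAssigned = new Resource(100, 10);
        Resource avail         = new Resource(181, 1);
        Resource current       = new Resource(19, 19);
        Resource pending       = new Resource(0, 0);

        // current + pending - idealAssigned: the memory component goes negative.
        Resource delta = current.plus(pending).minus(idealAssigned);
        System.out.println("delta    = " + delta);

        // delta's dominant share (9/20 = 0.45) is below avail's (181/200 =
        // 0.905), so the DRF min returns delta whole, negative memory and all.
        Resource accepted = drfMin(avail, delta, cluster);
        System.out.println("accepted = " + accepted);

        // idealAssigned shrinks in memory while growing in vcores, so the
        // policy computes nothing to preempt for the starved queue.
        System.out.println("idealAssigned' = " + idealAssigned.plus(accepted));
    }
}
```

With these inputs the sketch yields a delta of <-81GB, 9 VCores> that is accepted as-is; the exact figures in the quoted description differ slightly, but the failure mode is the same: a negative memory component flows into idealAssigned and preemption is never triggered.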