[ https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Yang updated YARN-8709:
---------------------------
    Attachment: YARN-8709.002.patch

> intra-queue preemption checker always fail since one under-served queue was deleted
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-8709
>                 URL: https://issues.apache.org/jira/browse/YARN-8709
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler, scheduler preemption
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-8709.001.patch, YARN-8709.002.patch
>
>
> After some queues were deleted, every run of the preemption checker in
> SchedulingMonitor was skipped because of a YarnRuntimeException.
> Error logs:
> {noformat}
> ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: Exception raised while executing preemption checker, skip this run..., exception=
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't happen, cannot find TempQueuePerPartition for queueName=1535075839208
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>         at java.lang.Thread.run(Thread.java:834)
> {noformat}
> I think there is something wrong with the partitionToUnderServedQueues field
> in ProportionalCapacityPreemptionPolicy. Items can be added to
> partitionToUnderServedQueues but are never removed unless the policy is
> rebuilt. For example, once under-served queue "a" is added to this structure,
> it stays there permanently. In IntraQueueCandidatesSelector#selectCandidates,
> the intra-queue preemption checker looks up queue info for every queue listed
> in partitionToUnderServedQueues and throws YarnRuntimeException if a queue is
> not found. As a result, after queue "a" is deleted from the queue structure,
> the preemption checker fails on every run.
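
The failure mode described in the report can be reproduced in isolation. The following is a minimal sketch only, not the YARN code and not necessarily what the attached patches do: UnderServedQueueTracker, liveQueues, and both runChecker methods are hypothetical names invented for illustration. It assumes a map from partition to under-served queue names that is only ever appended to, and shows one possible remedy (pruning names of queues that no longer exist before the lookup).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the policy state described in the report; not a YARN class. */
class UnderServedQueueTracker {
    // partition -> queue names recorded as under-served
    // (plays the role of partitionToUnderServedQueues in the report)
    private final Map<String, Set<String>> partitionToUnderServedQueues = new HashMap<>();
    // queues that currently exist in the queue structure (assumed for this sketch)
    private final Set<String> liveQueues = new LinkedHashSet<>();

    void addQueue(String name) { liveQueues.add(name); }

    void deleteQueue(String name) { liveQueues.remove(name); }

    void markUnderServed(String partition, String queue) {
        partitionToUnderServedQueues
            .computeIfAbsent(partition, k -> new LinkedHashSet<>())
            .add(queue);
    }

    /** Reported behavior: stale names are looked up, and a missing queue aborts the run. */
    int runCheckerWithoutCleanup(String partition) {
        int visited = 0;
        Set<String> queues = partitionToUnderServedQueues
            .getOrDefault(partition, Collections.<String>emptySet());
        for (String q : queues) {
            if (!liveQueues.contains(q)) {
                throw new IllegalStateException("cannot find queue " + q);
            }
            visited++;
        }
        return visited;
    }

    /** One possible remedy: drop names of deleted queues before iterating. */
    int runCheckerWithCleanup(String partition) {
        Set<String> queues = partitionToUnderServedQueues.get(partition);
        if (queues == null) {
            return 0;
        }
        queues.retainAll(liveQueues); // prune stale entries in place
        return queues.size();
    }
}
```

With queues "a" and "b" both marked under-served and "a" then deleted, runCheckerWithoutCleanup throws on every call, which matches the repeated "skip this run" log entries above, while runCheckerWithCleanup keeps working with the remaining queue. Clearing the whole structure at the start of each run would be another option under the same assumptions.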