Miles Crawford created YARN-4931:
------------------------------------

             Summary: Preempted resources go back to 
                 Key: YARN-4931
                 URL: https://issues.apache.org/jira/browse/YARN-4931
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
    Affects Versions: 2.7.2
            Reporter: Miles Crawford


Sometimes a queue that needs resources triggers preemption, but the preempted 
containers are allocated right back to the application they were just taken 
from!

Here is a tiny application (0007) that wants resources, and a container is 
preempted from application 0002 to satisfy it:
{code}
2016-04-07 21:08:13,463 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
(FairSchedulerUpdateThread): Should preempt <memory:448, vCores:0> res for 
queue root.default: resDueToMinShare = <memory:0, vCores:0>, resDueToFairShare 
= <memory:448, vCores:0>
2016-04-07 21:08:13,463 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
(FairSchedulerUpdateThread): Preempting container (prio=1res=<memory:15264, 
vCores:1>) from queue root.milesc
2016-04-07 21:08:13,463 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics 
(FairSchedulerUpdateThread): Non-AM container preempted, current 
appAttemptId=appattempt_1460047303577_0002_000001, 
containerId=container_1460047303577_0002_01_001038, resource=<memory:15264, 
vCores:1>
2016-04-07 21:08:13,463 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl 
(FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container 
Transitioned from RUNNING to KILLED
{code}

But then a moment later, application 0002 gets the container right back:
{code}
2016-04-07 21:08:13,844 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode 
(ResourceManager Event Processor): Assigned container 
container_1460047303577_0002_01_001039 of capacity <memory:15264, vCores:1> on 
host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 containers, 
<memory:241248, vCores:18> used and <memory:416, vCores:46> available after 
allocation
2016-04-07 21:08:14,555 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (IPC 
Server handler 59 on 8030): container_1460047303577_0002_01_001039 Container 
Transitioned from ALLOCATED to ACQUIRED
2016-04-07 21:08:14,845 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl 
(ResourceManager Event Processor): container_1460047303577_0002_01_001039 
Container Transitioned from ACQUIRED to RUNNING
{/code}

As a result, new applications cannot even get an AM container, and so they 
never start.
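
To check how often this happens in a ResourceManager log, the pattern above can be searched for mechanically. The following is only a rough sketch, not part of the scheduler: it assumes each log record sits on a single line (as in the RM log file itself, not the wrapped excerpts in this message), and the function names and the 2-second window are hypothetical choices for illustration. It pairs each "Non-AM container preempted" record with any later "Assigned container" record for the same application inside the window.

```python
import re
from datetime import datetime, timedelta

# Timestamp format used by the ResourceManager log lines quoted above.
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
CONTAINER = r"(container_\d+_\d+_\d+_\d+)"

PREEMPTED = re.compile(TS + r".*Non-AM container preempted.*containerId=" + CONTAINER)
ASSIGNED = re.compile(TS + r".*Assigned container " + CONTAINER)


def app_of(container_id):
    # container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
    parts = container_id.split("_")
    return parts[1] + "_" + parts[2]


def parse_ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def find_bounce_backs(lines, window=timedelta(seconds=2)):
    """Return (preempted, reassigned) container id pairs for the same app.

    Heuristic only: it does not check the node or the container size.
    """
    last_preempted = {}  # app -> (timestamp, preempted container id)
    hits = []
    for line in lines:
        m = PREEMPTED.search(line)
        if m:
            last_preempted[app_of(m.group(2))] = (parse_ts(m.group(1)), m.group(2))
            continue
        m = ASSIGNED.search(line)
        if m:
            app = app_of(m.group(2))
            if app in last_preempted:
                when, victim = last_preempted[app]
                if parse_ts(m.group(1)) - when <= window:
                    hits.append((victim, m.group(2)))
    return hits
```

Feeding it the two excerpts above, unwrapped to one record per line, should pair container ..._001038 with ..._001039. A short window keeps ordinary allocations to the same application from being counted as bounce-backs.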



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)