Sunil G created YARN-1408:
-----------------------------

             Summary: Preemption caused Invalid State Event: ACQUIRED at KILLED 
and caused a task timeout for 30mins
                 Key: YARN-1408
                 URL: https://issues.apache.org/jira/browse/YARN-1408
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
            Reporter: Sunil G


Capacity preemption is enabled as follows:
 *  yarn.resourcemanager.scheduler.monitor.enable=true
 *  yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy

Queues = a, b
Capacity of queue a = 80%
Capacity of queue b = 20%

Step 1: Submit a big jobA to queue a, which uses the full cluster capacity.
Step 2: Submit a jobB to queue b, which would use less than 20% of the cluster 
capacity.

The jobA tasks that were using queue b's capacity are preempted and killed.
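
The amount reclaimed in this scenario can be sketched with simple arithmetic. 
This is only a rough illustration: the class PreemptionSketch, the method 
toPreempt, and the 15% demand for jobB are assumptions made for the example 
(the report only says "less than 20%"), and the real 
ProportionalCapacityPreemptionPolicy is considerably more involved.

```java
// Hypothetical, simplified arithmetic; NOT the real ProportionalCapacityPreemptionPolicy.
class PreemptionSketch {

    /**
     * Returns the share of the cluster (in percent) to reclaim from a queue
     * running over its guarantee, on behalf of a queue running under its guarantee.
     */
    static int toPreempt(int overUsed, int overGuaranteed,
                         int underDemand, int underGuaranteed, int underUsed) {
        // What the over-capacity queue holds beyond its guaranteed share.
        int surplus = Math.max(0, overUsed - overGuaranteed);
        // The under-served queue is entitled to its demand, capped at its guarantee.
        int entitled = Math.min(underDemand, underGuaranteed);
        // What the under-served queue is still missing.
        int shortfall = Math.max(0, entitled - underUsed);
        // Never reclaim more than the surplus or more than what is needed.
        return Math.min(surplus, shortfall);
    }

    public static void main(String[] args) {
        // Queue a: guaranteed 80%, jobA using 100%.
        // Queue b: guaranteed 20%, jobB asking for an assumed 15%, using 0%.
        System.out.println(toPreempt(100, 80, 15, 20, 0));
    }
}
```

Under these assumed numbers, containers worth 15% of the cluster would be 
taken back from jobA, which is why jobA tasks are killed once jobB arrives.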

The preemption caused the following problem:
1. A new container was allocated for jobA in queue a on a node update from 
an NM.
2. This container was then preempted immediately by the preemption policy.

An Invalid State exception (ACQUIRED at KILLED) was then raised when the next 
AM heartbeat reached the RM:
ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED
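
The race behind this error can be reproduced with a small table-driven state 
machine. This is a minimal sketch, not the real RMContainerImpl: the class 
ContainerStateSketch and its reduced set of states and events are assumptions 
chosen only to show a KILL from preemption arriving before the ACQUIRED event 
that the AM heartbeat triggers.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, heavily simplified model; NOT the real RMContainerImpl.
class ContainerStateSketch {
    enum State { ALLOCATED, ACQUIRED, KILLED }
    enum Event { ACQUIRED, KILL }

    // Transition table: for each state, the events it accepts and the resulting state.
    private static final Map<State, Map<Event, State>> TRANSITIONS =
            new EnumMap<>(State.class);
    static {
        Map<Event, State> fromAllocated = new EnumMap<>(Event.class);
        fromAllocated.put(Event.ACQUIRED, State.ACQUIRED);
        fromAllocated.put(Event.KILL, State.KILLED);
        TRANSITIONS.put(State.ALLOCATED, fromAllocated);

        Map<Event, State> fromAcquired = new EnumMap<>(Event.class);
        fromAcquired.put(Event.KILL, State.KILLED);
        TRANSITIONS.put(State.ACQUIRED, fromAcquired);

        // KILLED is terminal in this sketch: it accepts no events at all.
        TRANSITIONS.put(State.KILLED, new EnumMap<>(Event.class));
    }

    private State state = State.ALLOCATED;

    State getState() {
        return state;
    }

    // Applies an event, or fails if the current state has no transition for it.
    void handle(Event event) {
        State next = TRANSITIONS.get(state).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
        state = next;
    }

    public static void main(String[] args) {
        ContainerStateSketch container = new ContainerStateSketch();
        container.handle(Event.KILL);         // preemption kills the new container
        try {
            container.handle(Event.ACQUIRED); // next AM heartbeat picks it up
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the second call fails because the KILLED state has no 
transition for ACQUIRED, which mirrors the ordering described in this report.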

The task assigned to this container also timed out after 30 minutes, because 
the container had already been killed by preemption:
attempt_1380289782418_0003_m_000000_0 Timed out after 1800 secs





--
This message was sent by Atlassian JIRA
(v6.1#6144)
