Prabhu Joseph created YARN-4730:
-----------------------------------

             Summary: YARN preemption based on instantaneous fair share
                 Key: YARN-4730
                 URL: https://issues.apache.org/jira/browse/YARN-4730
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Prabhu Joseph


On a large cluster with total resources of 10TB memory and 3000 cores, the Fair
Scheduler has 230 queues and runs a total of 60000 jobs a day. (All 230 queues
are very critical, so minResources is the same for all of them.) In this case,
a Spark job runs on queue A, occupies the entire cluster and does not release
any resources. When another job is then submitted to queue B, preemption only
reclaims up to queue B's fair share, which is <10TB, 3000> / 230 = about
<45 GB, 13 cores>. That is a very small share for a queue shared by many
applications.

Preemption should instead reclaim up to the instantaneous fair share, that is
<10TB, 3000> / 2 (active queues) = <5TB, 1500 cores>. That way the first job
won't hog the entire cluster and the subsequent jobs also run fine.
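As a rough illustration of the arithmetic above, the sketch below compares the
two shares. It assumes equal queue weights, counts 10TB as 10 * 1024 GB, and
divides the whole cluster evenly. The Resource type and both helper functions
are hypothetical and are not part of the Fair Scheduler API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical <memory, vcores> pair, not a YARN class."""
    memory_gb: float
    vcores: float

    def divide(self, n: int) -> "Resource":
        # Split evenly across n equally weighted queues.
        return Resource(self.memory_gb / n, self.vcores / n)


def steady_fair_share(cluster: Resource, total_queues: int) -> Resource:
    # Share computed over every configured queue, active or not.
    return cluster.divide(total_queues)


def instantaneous_fair_share(cluster: Resource, active_queues: int) -> Resource:
    # Share computed only over queues that currently have running apps.
    return cluster.divide(active_queues)


cluster = Resource(memory_gb=10 * 1024, vcores=3000)
steady = steady_fair_share(cluster, total_queues=230)
instant = instantaneous_fair_share(cluster, active_queues=2)

print(f"steady:        {steady.memory_gb:.1f} GB, {steady.vcores:.1f} vcores")
print(f"instantaneous: {instant.memory_gb:.0f} GB, {instant.vcores:.0f} vcores")
```

With 230 equally weighted queues the steady share works out to roughly 44.5 GB
and 13 vcores. With only 2 active queues the instantaneous share is 5120 GB
(5TB) and 1500 vcores.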

This issue only arises when the number of queues is very high. With fewer
queues, preempting up to the fair share would suffice, because each queue's fair
share is large. With a very large number of queues, however, preemption should
try to reclaim up to the instantaneous fair share.

Note: Configuring optimal maxResources for 230 queues is difficult, and
constraining the queues with maxResources would leave cluster resources idle
most of the time. There are thousands of Spark jobs, so asking each user to
restrict the number of executors is also difficult.

Preempting up to the instantaneous fair share would help overcome the above
issues.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)