[ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803151#comment-14803151
 ] 

Karthik Kambatla commented on YARN-4066:
----------------------------------------

Thanks again for working on this, Johan. Took a closer look at the patch and 
have the following comments:
# A few lines are longer than 80 characters.
# For the method parameters, {{recomputeSteadyShares}} might be more 
descriptive than {{recalculate}}.
# While at it, I would suggest the following improvements in synchronization as 
well:
## In getQueue, some of the code could be outside the synchronized block
{code}
    name = ensureRootPrefix(name);
    FSQueue queue;
    synchronized (queues) {
      queue = queues.get(name);
      if (queue == null && create) {
        // if the queue doesn't exist, create it and return
        queue = createQueue(name, queueType);
      } else {
        recalculate = false;
      }
    }

    if (recalculate) {
      rootQueue.recomputeSteadyShares();
    }
    return queue;
{code}
## In updateAllocationConfiguration, club the two synchronized blocks into one, 
and call recomputeSteadyShares outside the synchronized block.
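
To illustrate both suggestions, here is a minimal, self-contained sketch of the 
locking pattern. It is not the actual QueueManager code: the class 
{{QueueLockingSketch}}, the String-valued queue map, the {{updateAll}} method, 
and the counter are hypothetical stand-ins chosen for illustration. The idea is 
only that the map is touched while holding the lock, and the expensive 
recomputation runs after the lock is released.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the scheduler's queue manager (illustration only).
class QueueLockingSketch {
  // Queue name -> queue object (a String here, purely as a placeholder).
  private final Map<String, String> queues = new HashMap<>();
  // Counts how many times the (expensive) recomputation has run.
  final AtomicInteger recomputeCount = new AtomicInteger();

  String getQueue(String name, boolean create, boolean recomputeSteadyShares) {
    String queue;
    boolean created = false;
    // Only the map lookup and insertion need the lock.
    synchronized (queues) {
      queue = queues.get(name);
      if (queue == null && create) {
        queue = "queue:" + name;
        queues.put(name, queue);
        created = true;
      }
    }
    // Recompute outside the lock, and only if a queue was actually created.
    if (created && recomputeSteadyShares) {
      recomputeSteadyShares();
    }
    return queue;
  }

  // Analogue of a bulk configuration update: one synchronized block for all
  // map changes, followed by a single recomputation outside the lock.
  void updateAll(Iterable<String> names) {
    synchronized (queues) {
      for (String name : names) {
        queues.putIfAbsent(name, "queue:" + name);
      }
    }
    recomputeSteadyShares();
  }

  // Placeholder for the real steady-share computation.
  void recomputeSteadyShares() {
    recomputeCount.incrementAndGet();
  }
}
```

With this shape, a bulk update of N queues triggers one recomputation instead 
of N, which is the kind of saving the patch is after.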

Since we are changing some of the locking, which would be hard to unit-test, 
I would appreciate it if you could run the updated patch through the tests you 
previously reported.

> Large number of queues choke fair scheduler
> -------------------------------------------
>
>                 Key: YARN-4066
>                 URL: https://issues.apache.org/jira/browse/YARN-4066
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.1
>            Reporter: Johan Gustavsson
>         Attachments: yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> setting a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization of 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
