[ 
https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609248#comment-14609248
 ] 

Rohit Agarwal commented on YARN-3633:
-------------------------------------

Yes, the {{clusterMaxAMShare}} is acting as an upper limit.

To maintain the current behavior, we should keep the default 
{{clusterMaxAMShare}} at 0.5.
Right now, the default for {{queueMaxAMShare}} is 0.5, which results in an 
implicit {{clusterMaxAMShare}} of 0.5: no queue allows more than 50% of its 
resources to be allocated to AMs, and hence no more than 50% of the cluster 
resources can be allocated to AMs.
With this change, {{queueMaxAMShare}} only restricts AMs when there is already 
at least one AM running in the queue. So, {{clusterMaxAMShare}} is needed to 
prevent the cluster from being overrun with AMs (YARN-1913).

We should set {{clusterMaxAMShare}} to a negative value only in those cases 
where we would set {{queueMaxAMShare}} to a negative value, i.e. when we don't 
want to restrict AM usage.
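
To make the intended interaction of the two limits concrete, here is a rough 
sketch of the check as I described it above. It is not the code in the patch: 
the class name, method name and parameters are all made up for illustration, 
and "at least one AM running in the queue" is approximated by the queue's AM 
usage being greater than zero.

```java
// Hypothetical sketch only; names and signature are NOT from FairScheduler.
final class AmShareSketch {
    /**
     * Returns true if an AM of size amMb may start.
     * A negative share disables the corresponding check.
     */
    static boolean canRunAm(long amMb,
                            long queueAmUsedMb, long queueFairShareMb,
                            double queueMaxAMShare,
                            long clusterAmUsedMb, long clusterTotalMb,
                            double clusterMaxAMShare) {
        // Cluster-wide upper limit: applies to every AM unless disabled.
        if (clusterMaxAMShare >= 0
                && clusterAmUsedMb + amMb > clusterMaxAMShare * clusterTotalMb) {
            return false;
        }
        // Per-queue limit: only applies once the queue already runs an AM
        // (assumption: approximated here by non-zero AM usage in the queue).
        boolean queueHasAm = queueAmUsedMb > 0;
        if (queueMaxAMShare >= 0 && queueHasAm
                && queueAmUsedMb + amMb > queueMaxAMShare * queueFairShareMb) {
            return false;
        }
        return true;
    }
}
```

With the numbers from the issue description (20GB cluster, 5GB fair share per 
queue, 3GB AMs, both shares at 0.5), the first AM in each queue passes the 
queue check, and the cluster limit of 10GB lets three such AMs start before 
rejecting the fourth.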

----------------------------------------------

Regarding synchronization, I am wondering why the existing line in 
{{addAMResourceUsage}} is not synchronized. Is this code called concurrently? 
Also, if I should synchronize, should I synchronize the whole method or just 
the line I added?

> With Fair Scheduler, cluster can logjam when there are too many queues
> ----------------------------------------------------------------------
>
>                 Key: YARN-3633
>                 URL: https://issues.apache.org/jira/browse/YARN-3633
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Rohit Agarwal
>            Assignee: Rohit Agarwal
>            Priority: Critical
>         Attachments: YARN-3633-1.patch, YARN-3633.patch
>
>
> It's possible to logjam a cluster by submitting many applications at once in 
> different queues.
> For example, let's say there is a cluster with 20GB of total memory. Let's 
> say 4 users submit applications at the same time. The fair share of each 
> queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most 
> 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the 
> cluster logjams. Nothing gets scheduled even when 20GB of resources are 
> available.
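
The arithmetic in the description above can be checked with a small sketch. 
This is only an illustration under simplifying assumptions (equal fair shares, 
one AM request per queue, a plain per-queue limit with no cluster-wide limit); 
the class and method names are hypothetical and not part of the scheduler.

```java
// Hypothetical illustration of the logjam arithmetic; not scheduler code.
final class LogjamExample {
    /**
     * Returns how many queues can start their AM when each queue may use
     * at most maxAMShare of its (equal) fair share for AMs.
     */
    static int schedulableAms(long clusterMb, int queues,
                              double maxAMShare, long amMb) {
        long fairShareMb = clusterMb / queues;       // 20480 / 4 = 5120
        double amLimitMb = maxAMShare * fairShareMb; // 0.5 * 5120 = 2560
        // Every queue sees the same limit, so either all AMs fit or none do.
        return amMb <= amLimitMb ? queues : 0;
    }
}
```

For the 20GB example, `schedulableAms(20480, 4, 0.5, 3072)` yields 0, since 
each 3GB AM exceeds the 2.5GB per-queue limit even though the cluster is idle.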



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
