[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233244#comment-14233244
 ] 

Craig Welch commented on YARN-2637:
-----------------------------------

First, the easy parts :-) 

bq. typo for manualy

fixed

bq. usedAMResources is not used by sub-class, so suggest to replace it with 
private

done

bq. Exception messages here should be more meaningful than "c1", or "c2".

yup - fixed

bq. The log level here should be info or warn level rather than debug level. In 
most cases, LOG.debug() should be under block of LOG.isDebugEnabled().

So, I had made this debug rather than something higher because I'm not sure we 
always care, and it doesn't represent a failure case - this is a normal/expected 
case, and other similar cases for not starting the app don't log at all.  But I 
can see that it will be helpful to know this, and I don't think it will result 
in excessive logging - so I went ahead and made it an "info" level, sound good?  
BTW, the "isDebugEnabled" idiom is there to save the cost of constructing the 
log message's arguments, which can be very expensive; for a cheap case like this 
(a string literal) it's not necessary, since the only cost is the same 
evaluation the logging call itself would perform.
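
For what it's worth, the idiom looks roughly like this (just an illustration - 
the class name and messages are placeholders, not the code in the patch):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class LogGuardExample {
  private static final Log LOG = LogFactory.getLog(LogGuardExample.class);

  void logExamples(String appId, String expensiveSummary) {
    // Worth guarding when building the message is expensive
    // (string concatenation, toString() over large structures, etc.):
    if (LOG.isDebugEnabled()) {
      LOG.debug("Not activating " + appId + ", current AM usage: "
          + expensiveSummary);
    }

    // For a cheap message (a string literal) the guard buys nothing;
    // the only cost is the same evaluation the call performs anyway:
    LOG.info("Not activating application: queue AM resource limit reached");
  }
}
{code}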

Now for the more complicated one:

bq. Looks like maxAMResourcePerQueuePercent is a allowed percent for AM 
resource in each queue. So we may should calculate amLimit per queue rather 
than aggregate all applications together. 

So, yes and no - the current behavior actually takes the maxAMResourcePercent, 
which is set globally, and apportions it out based on the queue's baseline share 
of the cluster - so if the maxampercent was, say, 10%, and a given queue had 50% 
of the cluster, it would have an effective maxampercent value of 5%.  (It's 
translated into "how many apps can I have running" based on the minimum 
allocation of the cluster rather than actual AM usage - which is the problem 
that prompted the fix - but the important thing to get here is the way the 
overall maxampercent is apportioned out to the queues.)  There is also the 
option to override on a per-queue basis, so that, in the above scenario, if you 
didn't like the queue getting the 5% based on the overall process, but you were 
happy with how other queues were working using the config, you could just 
override for the given queue.
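
As a rough sketch of that apportionment math, using the numbers above (the 
cluster size and minimum allocation are assumptions I picked for the example, 
and this paraphrases the computation rather than quoting the actual code):

{code}
public class AMLimitApportionmentSketch {
  public static void main(String[] args) {
    // Assumed numbers: 100 GB cluster, global maxAMResourcePercent = 10%,
    // a queue with 50% of the cluster, minimum allocation = 1024 MB.
    long clusterMB = 100 * 1024;
    double maxAMResourcePercent = 0.10;
    double queueAbsoluteCapacity = 0.50;
    long minimumAllocationMB = 1024;

    // The queue's effective share of the AM percent: 10% * 50% = 5%.
    double effectiveAMPercent = maxAMResourcePercent * queueAbsoluteCapacity;

    // Today this is turned into "how many apps can run" using the minimum
    // allocation, not the AMs' real sizes - the defect being fixed here.
    int maxActiveApps = Math.max(1,
        (int) Math.ceil(clusterMB * effectiveAMPercent / minimumAllocationMB));

    System.out.println("effective AM percent = " + effectiveAMPercent); // 0.05
    System.out.println("max active apps      = " + maxActiveApps);      // 5
  }
}
{code}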

When I tried to translate this into something which actually pays attention to 
the real usage of the AMs, two approaches seemed reasonable:

1. Just have a global used-AM-resource value and use the global AM percent 
everywhere (not apportioned) - this way the total cluster-level effect is what 
we want; the subdivision of the AM resource percent is replaced with a total 
summing of the used AM resource across the queues.  You can still override for 
a given queue if you want "this queue to be able to go higher", which 
effectively allows one queue to go higher than the others.  That could starve 
other queues (bad), but that was already possible with the existing approach, 
albeit in a different way (when the cluster came to be filled with AMs from one 
particular queue).

2. Subdivide the global maxampercent based on the queue's share of the baseline 
(as before) and then evaluate a per-queue AM resource percent (and per-queue AM 
usage) - this would not be a difficult change from the current approach, but I 
think it is problematic for the reason below.  Both checks are roughly sketched 
after this list.
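
Roughly, the two checks would look something like this (a sketch under assumed 
names - usedAMResourceMB, amRequestMB, etc. are illustrative, not necessarily 
the identifiers in the patch, and real resources are Resource objects rather 
than plain MB values):

{code}
class AMLimitCheckSketch {
  // Approach 1: one global pool - activate while the sum of AM resource in
  // use across all queues stays under cluster * maxAMResourcePercent.
  // (A per-queue override just substitutes a different percent for that queue.)
  boolean canActivateGlobal(long usedAMResourceMB, long amRequestMB,
      long clusterMB, double maxAMResourcePercent) {
    return usedAMResourceMB + amRequestMB <= clusterMB * maxAMResourcePercent;
  }

  // Approach 2: subdivide the percent by the queue's share of the cluster
  // and track AM usage per queue.
  boolean canActivatePerQueue(long queueUsedAMResourceMB, long amRequestMB,
      long clusterMB, double maxAMResourcePercent, double queueAbsCapacity) {
    return queueUsedAMResourceMB + amRequestMB
        <= clusterMB * maxAMResourcePercent * queueAbsCapacity;
  }
}
{code}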

The main reason I took approach one over two is that I was concerned that, with 
a complex queue structure with a reasonable level of subdivision in a smallish 
cluster, you could end up with a queue which can effectively never start 
anything because its final value is too small to ever fit one of the larger 
AMs we have these days.  By sharing the limit globally this is less likely to 
happen, because the "unused AM resource" allocated out to other queues with a 
larger share of the cluster is not left sitting idle while "leaf queue a.b.c", 
with a derived maxampercent of say 2% (which translates into 512mb), can never 
start an application master which needs 1G - even though, globally, there's 
more than enough ampercent to do so.  It's the "this queue can never start an 
AM over x size" case that concerns me.  There are other possible ways to handle 
this with option 2, but I'm concerned that they would add complexity and change 
the behavior more than is needed to correct the defect.
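
To put numbers on that concern (the 25G cluster size and 10% global percent are 
just assumptions chosen to match the 2% / 512mb figures above):

{code}
public class SmallClusterStarvationSketch {
  public static void main(String[] args) {
    long clusterMB = 25 * 1024;          // assumed 25G cluster -> 25600 MB
    double globalAMPercent = 0.10;       // assumed global maxampercent of 10%
    double derivedQueueAMPercent = 0.02; // leaf queue a.b.c's derived share

    long perQueueAMLimitMB = (long) (clusterMB * derivedQueueAMPercent); // 512
    long globalAMLimitMB = (long) (clusterMB * globalAMPercent);         // 2560
    long amSizeMB = 1024;                // a 1G application master

    // Under approach 2 the 1G AM can never start in a.b.c (512 < 1024),
    // even though the global pool could easily accommodate it (2560 >= 1024).
    System.out.println("fits per-queue limit: " + (amSizeMB <= perQueueAMLimitMB));
    System.out.println("fits global limit:    " + (amSizeMB <= globalAMLimitMB));
  }
}
{code}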

[~djp]  Make sense?  Thoughts?  I may take a go at option 2 so we can evaluate 
it, but I'm concerned about the small cluster/too much subdivision scenario 
being problematic.


> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-2637
>                 URL: https://issues.apache.org/jira/browse/YARN-2637
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Wangda Tan
>            Assignee: Craig Welch
>            Priority: Critical
>         Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
> YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.2.patch, YARN-2637.6.patch, 
> YARN-2637.7.patch, YARN-2637.9.patch
>
>
> Currently, the number of AMs in a leaf queue is calculated in the following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when a new application is submitted to the RM, it checks whether an app 
> can be activated in the following way:
> {code}
>     for (Iterator<FiCaSchedulerApp> i=pendingApplications.iterator(); 
>          i.hasNext(); ) {
>       FiCaSchedulerApp application = i.next();
>       
>       // Check queue limit
>       if (getNumActiveApplications() >= getMaximumActiveApplications()) {
>         break;
>       }
>       
>       // Check user limit
>       User user = getUser(application.getUser());
>       if (user.getActiveApplications() < 
> getMaximumActiveApplicationsPerUser()) {
>         user.activateApplication();
>         activeApplications.add(application);
>         i.remove();
>         LOG.info("Application " + application.getApplicationId() +
>             " from user: " + application.getUser() + 
>             " activated in queue: " + getQueueName());
>       }
>     }
> {code}
> An example is,
> If a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
> resource that AMs can use is 200M. Assuming minimum_allocation=1M, the number 
> of AMs that can be launched is 200. If each AM actually uses 5M 
> (> minimum_allocation), all of the apps can still be activated, and they will 
> occupy all the resource of the queue instead of only a max_am_resource_percent 
> share of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
