[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250908#comment-14250908
 ] 

Craig Welch commented on YARN-2637:
-----------------------------------

[~leftnoteasy] updated patch with some changes based on your comments - details 
below:

(first the lesser comments):

bq. 1. The two checks may not be necessary, they will never be null

So, yes and no.  When running "for real", no, never will be.  We have a 
multitude of mocking cases for tests, however, and at times they were.  I put 
these in to make the exceptions easier to understand in those cases.  As I had 
(previously) tracked them all down, I'll go ahead and remove these as you 
suggest, though I have mixed feelings about that, due to the fact that they may 
cause confusion for a developer down the road...

bq. 2. FiCaSchedulerApp constructor

So, I have left this in - there are a plethora of different places/ways in 
which these are being mocked in tests, and without this it's necessary to make 
a great many rather intricate changes to the test mocking.  If no value is 
provided when running in the "real" case during submission this is the default 
anyway, it did not seem dangerous to propagate that here for the test cases 
which do not travel that path and therefore are subject to npe's

bq. MockRM: Why this is needed? Is there any issue of original default value?

Tests were depending on the previous, incorrect behavior to run - the actual 
size of the AM's vs the cluster size was such that many tests will fail as 
their applications will not start (they are, effectively, "very small 
clusters").  We have targeted tests specific to the max am logic where this is 
not in play - for other cases I want to make sure it is "out of the way" so 
that they can concentrate on what they are testing, hence the change in value.

bq. TestApplicationLimits - Can you add a test for accumulated AM resource 
checking? Like app1.am + app2.am < limit but app1.am + app2.am + app3.am > 
limit. 

Yes, I think that's a good test to add - done

-re maximumActiveApplications - this is a good question.  Before this change it 
was possible to effectively set this value by just doing a bit of math because 
the "pretend" AM size was a fixed value.  Now that the real AM size is being 
used instead, and it can vary, it's no longer possible to effectively set a 
"maxActiveApplicaitons" using the amresourcelimit.  When interacting with some 
folks who were doing system testing, and while going through the unit tests, I 
found that people were, in some cases, expecting to be able to do that/ 
depending on it.  We also had some unit tests which had the same expectation.  
Based on these existing cases I was concerned that, without this, we would be 
taking away a feature that I know is being used.  I think of it is a "backward 
compatibility" requirement and I do think we need it / it has practical value.  
I've not seen maxActiveApplications per user being used in this way, and it 
would be more difficult to do that anyway, so I did not add that ability (I'm 
of the same opinion that it's better not to add something where there is not a 
clear need for it.)

> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-2637
>                 URL: https://issues.apache.org/jira/browse/YARN-2637
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Wangda Tan
>            Assignee: Craig Welch
>            Priority: Critical
>         Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
> YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
> YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
> YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch
>
>
> Currently, number of AM in leaf queue will be calculated in following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when submit new application to RM, it will check if an app can be 
> activated in following way:
> {code}
>     for (Iterator<FiCaSchedulerApp> i=pendingApplications.iterator(); 
>          i.hasNext(); ) {
>       FiCaSchedulerApp application = i.next();
>       
>       // Check queue limit
>       if (getNumActiveApplications() >= getMaximumActiveApplications()) {
>         break;
>       }
>       
>       // Check user limit
>       User user = getUser(application.getUser());
>       if (user.getActiveApplications() < 
> getMaximumActiveApplicationsPerUser()) {
>         user.activateApplication();
>         activeApplications.add(application);
>         i.remove();
>         LOG.info("Application " + application.getApplicationId() +
>             " from user: " + application.getUser() + 
>             " activated in queue: " + getQueueName());
>       }
>     }
> {code}
> An example is,
> If a queue has capacity = 1G, max_am_resource_percent  = 0.2, the maximum 
> resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be 
> launched is 200, and if user uses 5M for each AM (> minimum_allocation). All 
> apps can still be activated, and it will occupy all resource of a queue 
> instead of only a max_am_resource_percent of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to