[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265480#comment-14265480
 ] 

Craig Welch commented on YARN-2637:
-----------------------------------


bq. Regarding the null checks in FiCaSchedulerApp: the scheduler assumes the 
application is in the running state when FiCaSchedulerApp is added, so it is a 
big issue if the RMApp cannot be found at that time. Rather than just ignoring 
such an error, I think you need to throw an exception (if that exception will 
not cause RM shutdown) and log the error.

I'm not sure how to phrase this differently to get the point across. It is 
already the case, throughout the many mocking points which interact with this 
code, that the rmapp may be null at this point (if it were not, the check would 
not be necessary).  As I mentioned previously, the ResourceManager itself 
checks for this case.  I am not introducing the mocking which results in this 
state, nor the existing checks for it in non-test code; I'm receiving this 
state and carrying it forward in the same way it has been handled elsewhere 
(and, again, not only in tests).  Changing this does not belong in the scope of 
this jira: it amounts to a rationalization/overhaul of mocking throughout this 
area (resource manager, schedulers), which is non-trivial and not specific to 
this change.  Feel free to create a separate jira to improve the mocking 
throughout the code.  The separate null check for the amresourcerequest is 
necessitated by the apparently intentional behavior of unmanaged AMs.

bq. And when this is possible?

+      if (rmContext.getScheduler() != null) 

Again, this occurs in existing test paths, and existing code is tolerant of it 
as well.  I'm merely carrying it forward.  It would belong in the new jira as 
well, were one opened.
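
As an illustration of the pattern under discussion, here is a minimal sketch of 
a null-tolerant lookup. The Context and Scheduler interfaces below are 
hypothetical stand-ins invented for this example, not the real RMContext or 
scheduler types, and the method is not taken from any patch on this jira. It 
only shows how a caller can carry a possibly-null scheduler forward without 
failing when a partially mocked context returns null.

```java
import java.util.Optional;

// Hypothetical stand-ins for illustration only; these are NOT YARN classes.
final class NullTolerantLookup {
    interface Scheduler {
        long minimumAllocationMb();
    }

    interface Context {
        Scheduler getScheduler(); // may return null, e.g. under a partial mock
    }

    // Returns the scheduler's minimum allocation, or an empty Optional when
    // the context has no scheduler, instead of throwing.
    static Optional<Long> minimumAllocationMb(Context context) {
        Scheduler scheduler = context.getScheduler();
        if (scheduler != null) {
            return Optional.of(scheduler.minimumAllocationMb());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Context mocked = () -> null;        // context with no scheduler
        Context real = () -> () -> 1024L;   // context with a scheduler
        System.out.println(minimumAllocationMb(mocked).isPresent());
        System.out.println(minimumAllocationMb(real).isPresent());
    }
}
```

The alternative raised in the review, throwing and logging when the value is 
missing, would replace the empty result with an exception.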

bq. \t in leafqueue

I've checked and the spacing is consistent with the existing spacing in the 
file.


> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-2637
>                 URL: https://issues.apache.org/jira/browse/YARN-2637
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Wangda Tan
>            Assignee: Craig Welch
>            Priority: Critical
>         Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
> YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
> YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
> YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
> YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
> YARN-2637.25.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch
>
>
> Currently, the number of AMs in a leaf queue is calculated in the following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> When a new application is submitted to the RM, it checks whether an app can 
> be activated in the following way:
> {code}
>     for (Iterator<FiCaSchedulerApp> i=pendingApplications.iterator(); 
>          i.hasNext(); ) {
>       FiCaSchedulerApp application = i.next();
>       
>       // Check queue limit
>       if (getNumActiveApplications() >= getMaximumActiveApplications()) {
>         break;
>       }
>       
>       // Check user limit
>       User user = getUser(application.getUser());
>       if (user.getActiveApplications() < 
>           getMaximumActiveApplicationsPerUser()) {
>         user.activateApplication();
>         activeApplications.add(application);
>         i.remove();
>         LOG.info("Application " + application.getApplicationId() +
>             " from user: " + application.getUser() + 
>             " activated in queue: " + getQueueName());
>       }
>     }
> {code}
> For example:
> If a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
> resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the 
> number of AMs that can be launched is 200. If a user requests 5M for each AM 
> (> minimum_allocation), all apps can still be activated, and they will occupy 
> all the resources of the queue instead of only max_am_resource_percent of the 
> queue.
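
The arithmetic in the example above can be reproduced with a short sketch. The 
class and method names are hypothetical and exist only for this illustration; 
they are not YARN APIs. Two assumptions are made: 1G is treated as 1000M, which 
matches the 200M figure in the description, and the percent is passed as whole 
percentage points (20 for 0.2) to keep the calculation in integers. The 
per-user limit is left out.

```java
// Hypothetical illustration of the count-based AM limit; not YARN code.
final class AmLimitExample {
    // max_am_resource = queue_max_capacity * maximum_am_resource_percent
    // (amPercent is in whole percentage points, an assumption of this sketch)
    static long maxAmResourceMb(long queueCapacityMb, int amPercent) {
        return queueCapacityMb * amPercent / 100;
    }

    // #max_am_number = max_am_resource / minimum_allocation
    static long maxAmCount(long maxAmResourceMb, long minimumAllocationMb) {
        return maxAmResourceMb / minimumAllocationMb;
    }

    // Memory consumed if every admitted AM actually requests amSizeMb.
    static long amMemoryIfAllActivatedMb(long amCount, long amSizeMb) {
        return amCount * amSizeMb;
    }

    public static void main(String[] args) {
        long capacityMb = 1000;   // queue capacity, 1G taken as 1000M
        long amBudgetMb = maxAmResourceMb(capacityMb, 20);
        long amCount = maxAmCount(amBudgetMb, 1);   // minimum_allocation = 1M
        long usedMb = amMemoryIfAllActivatedMb(amCount, 5);   // 5M per AM
        System.out.println("AM budget (MB): " + amBudgetMb);
        System.out.println("AMs admitted by count: " + amCount);
        System.out.println("AM memory if all activated (MB): " + usedMb);
    }
}
```

Because the limit is a count derived from minimum_allocation, the check never 
looks at how much each AM really requests, which is how the AM total can reach 
the full queue capacity.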



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
