[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207405#comment-14207405 ]

Craig Welch commented on YARN-2637:
-----------------------------------

I think the fix is fairly straightforward. SchedulerApplicationAttempt / 
FiCaSchedulerApp already has an "amResource" property, but it does not appear to 
be populated in the CapacityScheduler case. It should be, since the information 
is available from the application's submission / resource requests.

Populate this value, then add a Resource property to LeafQueue that represents 
the resources used by active application masters. When an application starts 
(is activated), add its amResource to the queue's active AM resource total; when 
the application ends, subtract it.

Before starting an application, compare the sum of the active application 
masters' resources plus the new application's AM resource against the portion of 
cluster resource that AMs in this queue are allowed to use (this can differ per 
queue). If the sum exceeds that limit, do not start the application.

The existing trickle-down logic based on the minimum allocation should be 
removed. The logic that limits how many applications can run based on explicit 
configuration should be retained.

{code}
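// Start the app only if (a) the queue's AM resources, including this app's AM,
// stay within the AM share of cluster resource for this queue, and
// (b) the explicit max-applications limit would not be exceeded.
if ((queue.activeApplicationMasterResourceTotal +
     readyToStartApplication.applicationMasterResource) <=
    queue.portionOfClusterResourceAllowedForApplicationMaster * clusterResource &&
    runningApplications + 1 <= maxAllowedApplications) {
  queue.startTheApp(readyToStartApplication);
}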
{code}
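To make the proposed accounting concrete, here is a minimal, self-contained Java sketch. It is not CapacityScheduler code: the class names (AmLimitSketch, LeafQueueSketch, App) are hypothetical, resources are reduced to plain memory values in MB instead of the real Resource type, and the per-queue AM limit is passed in as a precomputed number. It tracks the sum of active AM resources per queue, activates pending applications only while both the AM resource limit and the explicit application-count limit hold, and releases an application's AM resource when it finishes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class AmLimitSketch {

  /** Hypothetical app: an id plus the memory (MB) its AM container needs. */
  static final class App {
    final String id;
    final long amResourceMb;

    App(String id, long amResourceMb) {
      this.id = id;
      this.amResourceMb = amResourceMb;
    }
  }

  /** Hypothetical, simplified LeafQueue that limits AM resource, not AM count. */
  static final class LeafQueueSketch {
    private final long amLimitMb;        // resource that AMs in this queue may use
    private final int maxActiveApps;     // explicit app-count limit (retained)
    private long activeAmResourceMb = 0; // sum of amResource over active apps
    private final Deque<App> pending = new ArrayDeque<>();
    private final List<App> active = new ArrayList<>();

    LeafQueueSketch(long amLimitMb, int maxActiveApps) {
      this.amLimitMb = amLimitMb;
      this.maxActiveApps = maxActiveApps;
    }

    void submit(App app) {
      pending.addLast(app);
      activatePending();
    }

    /** Activate pending apps in FIFO order while both limits still hold. */
    void activatePending() {
      while (!pending.isEmpty()) {
        App next = pending.peekFirst();
        boolean fitsAmLimit = activeAmResourceMb + next.amResourceMb <= amLimitMb;
        boolean fitsAppCount = active.size() + 1 <= maxActiveApps;
        if (!fitsAmLimit || !fitsAppCount) {
          break; // stop at the first app that does not fit, preserving order
        }
        pending.removeFirst();
        active.add(next);
        activeAmResourceMb += next.amResourceMb;
      }
    }

    /** When an active app finishes, release its AM resource and retry. */
    void finish(App app) {
      if (active.remove(app)) {
        activeAmResourceMb -= app.amResourceMb;
        activatePending();
      }
    }

    int activeCount() { return active.size(); }
    int pendingCount() { return pending.size(); }
    long activeAmResourceMb() { return activeAmResourceMb; }
  }

  public static void main(String[] args) {
    // Numbers from the issue: 20% of a 1G queue (treated as 1000M) gives 200M
    // for AMs; assume an explicit limit of 200 apps and 5M per AM.
    LeafQueueSketch queue = new LeafQueueSketch(200, 200);
    List<App> apps = new ArrayList<>();
    for (int i = 0; i < 200; i++) {
      App app = new App("app_" + i, 5);
      apps.add(app);
      queue.submit(app);
    }
    System.out.println("active=" + queue.activeCount()
        + " pending=" + queue.pendingCount()
        + " amUsedMb=" + queue.activeAmResourceMb());

    queue.finish(apps.get(0)); // frees 5M, so one pending app is activated
    System.out.println("active=" + queue.activeCount()
        + " pending=" + queue.pendingCount());
  }
}
```

With the issue's numbers (a 200M AM budget and 5M per AM), this sketch prints `active=40 pending=160 amUsedMb=200`, then `active=40 pending=159` after one application finishes. Stopping at the first pending application that does not fit keeps activation in FIFO order, like the `break` on the queue-level check in the existing activation loop, at the cost of possibly leaving a smaller later AM waiting.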


> maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-2637
>                 URL: https://issues.apache.org/jira/browse/YARN-2637
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Wangda Tan
>            Priority: Critical
>
> Currently, the maximum number of AMs in a leaf queue is calculated as follows:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> When a new application is submitted to the RM, the queue checks whether it can 
> be activated as follows:
> {code}
>     for (Iterator<FiCaSchedulerApp> i=pendingApplications.iterator(); 
>          i.hasNext(); ) {
>       FiCaSchedulerApp application = i.next();
>       
>       // Check queue limit
>       if (getNumActiveApplications() >= getMaximumActiveApplications()) {
>         break;
>       }
>       
>       // Check user limit
>       User user = getUser(application.getUser());
>       if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
>         user.activateApplication();
>         activeApplications.add(application);
>         i.remove();
>         LOG.info("Application " + application.getApplicationId() +
>             " from user: " + application.getUser() + 
>             " activated in queue: " + getQueueName());
>       }
>     }
> {code}
> For example, if a queue has capacity = 1G and max_am_resource_percent = 0.2, 
> the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, 
> up to 200 AMs can be launched. If each AM uses 5M (> minimum_allocation), all 
> 200 apps can still be activated, and their AMs will occupy the entire queue 
> (200 * 5M = 1000M) instead of only max_am_resource_percent of it.
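The arithmetic in the example above can be reproduced with a small sketch of the count-based formula from the description. The class and method names are hypothetical, 1G is treated as 1000M to match the example's round numbers, and userlimit = 1.0 and userlimit_factor = 1.0 are assumed values chosen only for illustration; this is not the actual LeafQueue implementation.

```java
public class CountBasedAmLimit {

  /** #max_am_number = (queue_max_capacity * max_am_resource_percent) / minimum_allocation */
  static int maxAmNumber(long queueMaxCapacityMb, double maxAmResourcePercent,
                         long minimumAllocationMb) {
    double maxAmResourceMb = queueMaxCapacityMb * maxAmResourcePercent;
    return (int) Math.floor(maxAmResourceMb / minimumAllocationMb);
  }

  /** #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor */
  static int maxAmNumberPerUser(int maxAmNumber, double userLimit, double userLimitFactor) {
    return (int) Math.floor(maxAmNumber * userLimit * userLimitFactor);
  }

  public static void main(String[] args) {
    long queueMb = 1000;    // "1G" in the example
    double amPercent = 0.2; // maximum-am-resource-percent
    long minAllocMb = 1;    // minimum_allocation
    long amSizeMb = 5;      // actual AM container size (> minimum_allocation)

    int maxAms = maxAmNumber(queueMb, amPercent, minAllocMb);
    // Assumed userlimit = 1.0 (100%) and userlimit_factor = 1.0.
    int maxAmsPerUser = maxAmNumberPerUser(maxAms, 1.0, 1.0);
    long amResourceIfAllActive = maxAms * amSizeMb;

    System.out.println("max AMs = " + maxAms + ", per user = " + maxAmsPerUser);
    System.out.println("AM resource if all active = " + amResourceIfAllActive
        + "M of " + queueMb + "M (limit was " + (long) (queueMb * amPercent) + "M)");
  }
}
```

Because the count is derived from minimum_allocation rather than from the actual AM size, the sketch allows 200 AMs of 5M each, using 1000M against an intended AM limit of 200M.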



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
