[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243832#comment-14243832
 ] 

Wangda Tan commented on YARN-2637:
----------------------------------

Hi Craig,
First is a major comment, I'm not quite sure if it was discussed:
I feel {{MAXIMUM_ACTIVE_APPLICATIONS_SUFFIX}} and 
{{DEFAULT_MAXIMUM_ACTIVE_QUEUE_APPLICATIONS}} makes configuration becoming 
complex. Since we already have MAXIMUM_AM_RESOURCE_SUFFIX, it should be enough 
to define how much resource could be used for AMs in a queue. And other fields 
of LeafQueue like {{maxActiveApplicationsPerUser}} doesn't have such a manual 
configuration as well. It is possible that {{maxActiveApplicationsPerUser}} and 
{{maxActiveApplications}} (which set manually) not match.

Is there any actual requirements to add the two manual parameter? I prefer to 
drop the two options to keep the patch simple if there's no actual requirements.

Some minor comments:
1. The two checks may not be necessary, they will never be null:
{code}
+      if (application.getAMResource() == null) {
+        throw new RuntimeException("Application getAMResource returned 
'null'");
+      }
+      if (usedAMResources == null) {
+        throw new RuntimeException("Queue's usedAMResources is 'null'"); 
+      }
{code}

2. FiCaSchedulerApp constructor
Similar to above, when amRequest will be null? And also when 
amRequest.getCapability() will be null? 
This seems dangerous to me:
{code}
+    if (amResource == null) {
+      amResource = Resource.newInstance(0, 0);
+    }
{code}
You should throw exception when amResource == null is illegal, a valid case I 
can think about is unmanaged AM, could you check that?

3. MockRM:
Why this is needed? Is there any issue of original default value?
{code}
+  protected void setTestConfigs(Configuration conf) {
+    conf.set(
+    
CapacitySchedulerConfiguration.MAXIMUM_APPLICATION_MASTERS_RESOURCE_PERCENT,
+    "1.0");
+  }
{code}

And also similar change in TestResourceManager/TestCapacityScheduler

4. TestApplicationLimits
Can you add a test for accumulated AM resource checking? Like app1.am + app2.am 
< limit but app1.am + app2.am + app3.am > limit. Since you have such logic in 
LeafQueue.

Thanks,
Wangda 


> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-2637
>                 URL: https://issues.apache.org/jira/browse/YARN-2637
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Wangda Tan
>            Assignee: Craig Welch
>            Priority: Critical
>         Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
> YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
> YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.2.patch, YARN-2637.6.patch, 
> YARN-2637.7.patch, YARN-2637.9.patch
>
>
> Currently, number of AM in leaf queue will be calculated in following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when submit new application to RM, it will check if an app can be 
> activated in following way:
> {code}
>     for (Iterator<FiCaSchedulerApp> i=pendingApplications.iterator(); 
>          i.hasNext(); ) {
>       FiCaSchedulerApp application = i.next();
>       
>       // Check queue limit
>       if (getNumActiveApplications() >= getMaximumActiveApplications()) {
>         break;
>       }
>       
>       // Check user limit
>       User user = getUser(application.getUser());
>       if (user.getActiveApplications() < 
> getMaximumActiveApplicationsPerUser()) {
>         user.activateApplication();
>         activeApplications.add(application);
>         i.remove();
>         LOG.info("Application " + application.getApplicationId() +
>             " from user: " + application.getUser() + 
>             " activated in queue: " + getQueueName());
>       }
>     }
> {code}
> An example is,
> If a queue has capacity = 1G, max_am_resource_percent  = 0.2, the maximum 
> resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be 
> launched is 200, and if user uses 5M for each AM (> minimum_allocation). All 
> apps can still be activated, and it will occupy all resource of a queue 
> instead of only a max_am_resource_percent of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to