[jira] [Comment Edited] (YARN-10504) Implement weight mode in Capacity Scheduler

zhuqi (Jira) Sat, 09 Jan 2021 06:08:15 -0800


    [ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261872#comment-17261872 ]


zhuqi edited comment on YARN-10504 at 1/9/21, 2:07 PM:
-------------------------------------------------------

[~wangda]  [~bteke] [~gandras]

1. {{updateAbsoluteCapacitiesAndRelatedFields}} should update 
maxApplications, but in some cases it does not. For example, in 
{{TestCapacitySchedulerAutoQueueCreation#testAutoCreatedQueueActivationDeactivation}}:

{code:java}
//submit user_3 app. This cant be allocated since there is no capacity
// in NO_LABEL, SSD but can be in GPU label
submitApp(mockRM, parentQueue, USER3, USER3, 4, 1);
final CSQueue user3LeafQueue = cs.getQueue(USER3);
validateCapacities((AutoCreatedLeafQueue) user3LeafQueue, 0.0f, 0.0f,
    1.0f, 1.0f);
validateCapacitiesByLabel((ManagedParentQueue) parentQueue,
    (AutoCreatedLeafQueue)
    user3LeafQueue, NODEL_LABEL_GPU);
{code}
In this case the user_3 AutoCreatedLeafQueue has no capacity, so in 
{{updateAbsoluteCapacitiesAndRelatedFields}}:

{code:java}
private void updateAbsoluteCapacitiesAndRelatedFields() {
  updateAbsoluteCapacities();
  CapacitySchedulerConfiguration schedulerConf = csContext.getConfiguration();

  // If maxApplications not set, use the system total max app, apply newly
  // calculated abs capacity of the queue.
  if (maxApplications <= 0) {
    int maxSystemApps = schedulerConf.
        getMaximumSystemApplications();
    maxApplications =
        (int) (maxSystemApps * queueCapacities.getAbsoluteCapacity());
  }
  maxApplicationsPerUser = Math.min(maxApplications,
      (int) (maxApplications * (usersManager.getUserLimit() / 100.0f)
          * usersManager.getUserLimitFactor()));
}
// ...the absolute capacity is 0, because the capacities are set to 0 here:
if (availableCapacity >= leafQueueTemplateCapacities
    .getAbsoluteCapacity(nodeLabel)) {
  updateCapacityFromTemplate(capacities, nodeLabel);
  activate(leafQueue, nodeLabel);
} else{
  updateToZeroCapacity(capacities, nodeLabel);
}

// ...and because the update runs after reinitializeFromTemplate:
final AutoCreatedLeafQueueConfig initialLeafQueueTemplate =
    queueManagementPolicy.getInitialLeafQueueConfiguration(leafQueue);
leafQueue.reinitializeFromTemplate(initialLeafQueueTemplate);

// Do one update cluster resource call to make sure all absolute resources
// effective resources are updated.
updateClusterResource(this.csContext.getClusterResource(),
    new ResourceLimits(this.csContext.getClusterResource()));{code}
As a result, both maxApplications and maxApplicationsPerUser will be 0.
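
To make the arithmetic concrete, here is a minimal standalone sketch of the calculation quoted above. The class name MaxAppsSketch, the method signatures, and the sample values (10000 system applications, user limit 100, user limit factor 1.0) are assumptions for illustration only and are not the actual LeafQueue code; only the two formulas mirror the quoted snippet.

```java
public class MaxAppsSketch {
  // Mirrors the quoted logic: when maxApplications is not configured (<= 0),
  // derive it from the system-wide limit and the queue's absolute capacity.
  static int maxApplications(int configured, int maxSystemApps,
      float absoluteCapacity) {
    if (configured <= 0) {
      return (int) (maxSystemApps * absoluteCapacity);
    }
    return configured;
  }

  // Mirrors the quoted per-user formula.
  static int maxApplicationsPerUser(int maxApplications,
      float userLimitPercent, float userLimitFactor) {
    return Math.min(maxApplications,
        (int) (maxApplications * (userLimitPercent / 100.0f)
            * userLimitFactor));
  }

  public static void main(String[] args) {
    // Assumed sample values; absolute capacity 0 models the zero-capacity
    // auto-created leaf queue from the test.
    int maxApps = maxApplications(-1, 10000, 0.0f);
    int perUser = maxApplicationsPerUser(maxApps, 100.0f, 1.0f);
    System.out.println(maxApps + " " + perUser); // prints "0 0"
  }
}
```

With an absolute capacity of 0 both limits collapse to 0, so the queue could not accept any application even if a label such as GPU still has capacity.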

 

So we should handle this in the new logic at:

//TODO recalculate max applications because they can depend on capacity 

The TODO should be removed. Either skip the recalculation for the 
AutoCreatedLeafQueue case for now, or add logic that sets maxApplications to a 
fixed default number in this case.
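
The second option could look roughly like the sketch below. Everything here is hypothetical: the class and method names, the boolean flag standing in for an instanceof check, and in particular the default value of 100, since no concrete number is proposed in this comment.

```java
public class MaxAppsFallbackSketch {
  // Hypothetical default; no concrete value has been proposed.
  static final int ASSUMED_DEFAULT_MAX_APPS = 100;

  // Recalculates maxApplications from the absolute capacity, but falls back
  // to a fixed default when an auto-created leaf queue would end up with 0.
  static int recalculateMaxApplications(int maxSystemApps,
      float absoluteCapacity, boolean autoCreatedLeaf) {
    int computed = (int) (maxSystemApps * absoluteCapacity);
    if (computed <= 0 && autoCreatedLeaf) {
      return ASSUMED_DEFAULT_MAX_APPS;
    }
    return computed;
  }

  public static void main(String[] args) {
    // Zero-capacity auto-created leaf queue gets the assumed default.
    System.out.println(recalculateMaxApplications(10000, 0.0f, true));
    // Other queues keep the capacity-based value.
    System.out.println(recalculateMaxApplications(10000, 0.0f, false));
  }
}
```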

 

2. As mentioned by [~bteke]:

"Sharing my latest findings on TestAbsoluteResourceWithAutoQueue failure: 
{{AutoCreatedLeafQueue#reinitializeFromTemplate}} was refactored, and now 
getting and merging the QueueCapacities happens *before* calling 
{{ParentQueue#updateClusterResource}} (and 
{{LeafQueue#updateClusterResource}}). In {{LeafQueue#updateClusterResource}}, 
{{AbstractCSQueue#updateEffectiveResources}} is called, where the 
effectiveMinResource of the created queue is overridden with the template's 
effectiveMinResources, which is exactly what the test is getting in the 
asserts."

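We should change {{LeafQueue#updateClusterResource}} to:
{code:java}
public void updateClusterResource(Resource clusterResource,
    ResourceLimits currentResourceLimits) {
  writeLock.lock();
  try {
    ...

    // Skip the effective-resource update for auto-created leaf queues so the
    // values set during reinitializeFromTemplate are not overridden.
    if (!(this instanceof AutoCreatedLeafQueue)) {
      super.updateEffectiveResources(clusterResource);
    }

    ...
  } finally {
    writeLock.unlock();
  }
}{code}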
This will fix the absolute-resource case in TestAbsoluteResourceWithAutoQueue.
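
The effect of the guard can be shown with a toy model. The classes below are simplified stand-ins with hypothetical names and a single long field, not the real Capacity Scheduler classes. They only demonstrate that the instanceof check leaves a value set earlier on the auto-created queue untouched, while plain leaf queues are still updated.

```java
public class GuardSketch {
  // Stand-in for the abstract base queue.
  static class Queue {
    long effectiveMinMb;

    Queue(long initialMb) {
      this.effectiveMinMb = initialMb;
    }

    // Stand-in for updateEffectiveResources: overwrites the current value.
    void updateEffectiveResources(long recomputedMb) {
      this.effectiveMinMb = recomputedMb;
    }
  }

  // Stand-in for a regular leaf queue, with the proposed guard.
  static class Leaf extends Queue {
    Leaf(long initialMb) {
      super(initialMb);
    }

    void updateClusterResource(long recomputedMb) {
      if (!(this instanceof AutoCreatedLeaf)) {
        super.updateEffectiveResources(recomputedMb);
      }
    }
  }

  // Stand-in for an auto-created leaf queue.
  static class AutoCreatedLeaf extends Leaf {
    AutoCreatedLeaf(long initialMb) {
      super(initialMb);
    }
  }

  public static void main(String[] args) {
    Leaf plain = new Leaf(2048L);
    Leaf auto = new AutoCreatedLeaf(2048L);
    plain.updateClusterResource(4096L);
    auto.updateClusterResource(4096L);
    // The plain leaf is updated, the auto-created leaf keeps its value.
    System.out.println(plain.effectiveMinMb + " " + auto.effectiveMinMb);
  }
}
```

An instanceof check in the parent class is the smallest change; overriding the method in the subclass would be an alternative design that keeps LeafQueue unaware of its subclasses.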

Do you have any other advice?

Thanks.


> Implement weight mode in Capacity Scheduler
> -------------------------------------------
>
>                 Key: YARN-10504
>                 URL: https://issues.apache.org/jira/browse/YARN-10504
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>         Attachments: YARN-10504.001.patch, YARN-10504.002.patch, 
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, 
> YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
>
> To allow queues to be created flexibly in Capacity Scheduler, a 
> weight mode should be introduced. The existing {{capacity}} property should 
> be used with a different syntax, e.g.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>  
> The new functionality should: 
>  * accept and validate the new weight values
>  * enforce a singular mode on the whole queue tree
>  * (re)calculate the relative (percentage-based) capacities based on the 
> weights during launch and every time the queue structure changes
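
As a rough illustration of the last point, the sketch below converts sibling weights into relative capacities by dividing each weight by the sum of the weights under the same parent. This normalization rule, the class name WeightSketch, and the queue names are assumptions for illustration; the issue description does not fix the exact formula.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WeightSketch {
  // Assumed rule: relative capacity = weight / sum of sibling weights.
  static Map<String, Double> toRelativeCapacities(
      Map<String, Double> siblingWeights) {
    double total = 0.0;
    for (double weight : siblingWeights.values()) {
      total += weight;
    }
    Map<String, Double> capacities = new LinkedHashMap<>();
    for (Map.Entry<String, Double> entry : siblingWeights.entrySet()) {
      // Guard against division by zero when all weights are 0.
      double capacity = total > 0.0 ? entry.getValue() / total : 0.0;
      capacities.put(entry.getKey(), capacity);
    }
    return capacities;
  }

  public static void main(String[] args) {
    Map<String, Double> weights = new LinkedHashMap<>();
    weights.put("root.users", 1.0);   // hypothetical queue names
    weights.put("root.batch", 3.0);
    System.out.println(toRelativeCapacities(weights));
  }
}
```

With this rule, the capacities would have to be recomputed whenever a sibling is added or removed, which matches the requirement to recalculate every time the queue structure changes.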



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org