[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407493#comment-16407493
 ] 

Yuqi Wang edited comment on YARN-7872 at 3/21/18 5:56 AM:
----------------------------------------------------------

Thanks [~leftnoteasy].
{quote}The existing patch is not backward compatible: it breaks the behavior of an
app that requests locality + node partition. Before your patch, if the
requested locality is under the requested partition, it can be allocated; otherwise
it stays pending. After your patch, the requested partition will be
silently ignored, which is not ideal. It also breaks how we calculate the pending
resource of each partition.
{quote}
It seems that YARN simply *does not allow* specifying node locality and node
partition together, so it is meaningless to talk about "ignoring one and
taking the other". Please check the code below in both 2.7 and trunk:

{code:java}
// we don't allow specify label expression other than resourceName=ANY now
if (!ResourceRequest.ANY.equals(resReq.getResourceName())
    && labelExp != null && !labelExp.trim().isEmpty()) {
  throw new InvalidLabelResourceRequestException(
      "Invalid resource request, queue=" + queueInfo.getQueueName()
          + " specified node label expression in a "
          + "resource request has resource name = "
          + resReq.getResourceName());
}
{code}

So I think the behavior is the same before and after the patch:

If an app requests locality + node partition, an
InvalidLabelResourceRequestException is thrown (the request fails).
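
To make the quoted check concrete, here is a minimal standalone sketch of the same condition, returning a boolean instead of throwing. The class and method names are hypothetical (not YARN APIs), and `ANY` is assumed to mirror `ResourceRequest.ANY` ("*").

```java
/**
 * Standalone sketch of the validation quoted above. Class and method names are
 * hypothetical; the real RM check throws InvalidLabelResourceRequestException
 * instead of returning false.
 */
final class LabelRequestValidationSketch {

    static final String ANY = "*"; // assumed to mirror ResourceRequest.ANY

    // A request passes only if it targets ANY or carries no (non-blank) label.
    static boolean isValid(String resourceName, String labelExp) {
        boolean hasLabel = labelExp != null && !labelExp.trim().isEmpty();
        return ANY.equals(resourceName) || !hasLabel;
    }

    public static void main(String[] args) {
        System.out.println(isValid("*", "persistent"));   // true: label with ANY
        System.out.println(isValid("SRG", null));         // true: locality only
        System.out.println(isValid("SRG", "persistent")); // false: locality + label
    }
}
```

In other words, a host-specific request can only reach the scheduler without a label, which is why "locality + partition" is not a combination the patch could change.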

What the patch does is allow a user to specify only locality, without
specifying a node label, to request a specific labeled node.

The pending resource calculation for each partition is indeed a valid concern.

Is there any plan to fully support node locality working together with node labels?

Thanks again :)


> labeled node cannot be used to satisfy locality specified request
> -----------------------------------------------------------------
>
>                 Key: YARN-7872
>                 URL: https://issues.apache.org/jira/browse/YARN-7872
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Yuqi Wang
>            Assignee: Yuqi Wang
>            Priority: Blocker
>             Fix For: 2.7.2
>
>         Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> A labeled node (i.e. a node with a non-empty node label) cannot be used to
> satisfy a locality-specified request (i.e. a container request with a non-ANY
> resource name and RelaxLocality set to false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> With the current RM capacity scheduler (at least in versions 2.7 and 2.8),
> the node cannot allocate a container for this request, because the node
> label does not match when the leaf queue assigns the container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions for
> selecting candidate nodes for a container request. Node label matching should
> only be performed for container requests with the ANY resource name, since
> only this kind of request is allowed to have a non-empty node label.
> So, for a container request with a non-ANY resource name (which we know cannot
> carry a node label), we should match the node by the requested resource name
> instead of by the requested node label. This resource name matching should be
> safe, since a node whose node label is not accessible to the queue will never
> be sent to the leaf queue.
>  
> *Discussion:*
> The attached patch implements a fix based on this principle; please help review it.
> Without it, we cannot use locality to request containers on these labeled
> nodes.
> If the fix is acceptable, we should also check whether the same issue
> occurs in trunk and other Hadoop versions.
> If it is not acceptable (i.e. the current behavior is by design), how can we
> use locality to request containers on these labeled nodes?
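
The two matching rules in the issue description can be compared on the SRG example with a deliberately simplified model. `NodeInfo`, `ContainerRequest`, `currentMatch` and `proposedMatch` are hypothetical names, not YARN classes. The model assumes an unlabeled request maps to the default (empty) partition, and it ignores relaxLocality, the host/rack/ANY request expansion, and queue accessibility checks (the description relies on inaccessible nodes never reaching the leaf queue).

```java
/**
 * Simplified, hypothetical model of node matching for a container request.
 * Not YARN code: it ignores relaxLocality, host/rack/ANY request expansion,
 * and queue accessibility checks.
 */
final class LocalityLabelMatchSketch {

    static final String ANY = "*";      // assumed to mirror ResourceRequest.ANY
    static final String NO_LABEL = "";  // assumption: empty label = default partition

    static final class NodeInfo {
        final String hostName;
        final String rackName;
        final String label;

        NodeInfo(String hostName, String rackName, String label) {
            this.hostName = hostName;
            this.rackName = rackName;
            this.label = label == null ? NO_LABEL : label;
        }
    }

    static final class ContainerRequest {
        final String resourceName; // "*", a host name, or a rack name
        final String labelExp;     // null or blank means "no label requested"

        ContainerRequest(String resourceName, String labelExp) {
            this.resourceName = resourceName;
            this.labelExp = labelExp;
        }
    }

    // Normalizes a missing or blank label expression to the default partition.
    static String effectiveLabel(String labelExp) {
        return labelExp == null ? NO_LABEL : labelExp.trim();
    }

    // Current behavior as described in the issue: the node label must always match.
    static boolean currentMatch(NodeInfo node, ContainerRequest req) {
        return effectiveLabel(req.labelExp).equals(node.label);
    }

    // Proposed behavior: label matching only for ANY; otherwise match by resource name.
    static boolean proposedMatch(NodeInfo node, ContainerRequest req) {
        if (ANY.equals(req.resourceName)) {
            return effectiveLabel(req.labelExp).equals(node.label);
        }
        // Non-ANY requests cannot carry a label (the RM rejects them), so only
        // locality is compared here.
        return req.resourceName.equals(node.hostName)
                || req.resourceName.equals(node.rackName);
    }

    public static void main(String[] args) {
        NodeInfo srg = new NodeInfo("SRG", "/default-rack", "persistent");
        ContainerRequest hostReq = new ContainerRequest("SRG", null);
        System.out.println(currentMatch(srg, hostReq));  // false
        System.out.println(proposedMatch(srg, hostReq)); // true
    }
}
```

Under these assumptions, the host-only request for SRG is rejected by the label-equality rule but accepted by the resource-name rule, while ANY requests keep the same label-based behavior under both rules.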



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
