[ 
https://issues.apache.org/jira/browse/YARN-9209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869335#comment-16869335
 ] 

Tarun Parimi commented on YARN-9209:
------------------------------------

Thanks [~cheersyang]. Correct, the logic is the same for a normal resource 
request. 

On rereading [~leftnoteasy]'s comments, I see there are limitations in supporting 
the ANY partition, even for a normal resource request when we don't consider PC.

But as you mentioned previously, PC currently supports affinity to only a single 
partition. I think we may need to document this as well. 

> When nodePartition is not set in Placement Constraints, containers are 
> allocated only in default partition
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9209
>                 URL: https://issues.apache.org/jira/browse/YARN-9209
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, scheduler
>    Affects Versions: 3.1.0
>            Reporter: Tarun Parimi
>            Assignee: Tarun Parimi
>            Priority: Major
>         Attachments: YARN-9209.001.patch, YARN-9209.002.patch, 
> YARN-9209.003.patch
>
>
> When an application sets a placement constraint without specifying a 
> nodePartition, the default partition is always chosen as the constraint when 
> allocating containers. This can be a problem when an application is 
> submitted to a queue which doesn't have enough capacity available on the 
> default partition.
> This is a common scenario when node labels are configured for a particular 
> queue. The below sample sleeper service cannot get even a single container 
> allocated when it is submitted to a "labeled_queue", even though enough 
> capacity is available on the label/partition configured for the queue. Only 
> the AM container runs. 
> {code:java}
> {
>     "name": "sleeper-service",
>     "version": "1.0.0",
>     "queue": "labeled_queue",
>     "components": [
>         {
>             "name": "sleeper",
>             "number_of_containers": 2,
>             "launch_command": "sleep 90000",
>             "resource": {
>                 "cpus": 1,
>                 "memory": "4096"
>             },
>             "placement_policy": {
>                 "constraints": [
>                     {
>                         "type": "ANTI_AFFINITY",
>                         "scope": "NODE",
>                         "target_tags": [
>                             "sleeper"
>                         ]
>                     }
>                 ]
>             }
>         }
>     ]
> }
> {code}
> It runs fine if I specify the node_partition explicitly in the constraints 
> like below. 
> {code:java}
> {
>     "name": "sleeper-service",
>     "version": "1.0.0",
>     "queue": "labeled_queue",
>     "components": [
>         {
>             "name": "sleeper",
>             "number_of_containers": 2,
>             "launch_command": "sleep 90000",
>             "resource": {
>                 "cpus": 1,
>                 "memory": "4096"
>             },
>             "placement_policy": {
>                 "constraints": [
>                     {
>                         "type": "ANTI_AFFINITY",
>                         "scope": "NODE",
>                         "target_tags": [
>                             "sleeper"
>                         ],
>                         "node_partitions": [
>                             "label"
>                         ]
>                     }
>                 ]
>             }
>         }
>     ]
> }
> {code} 
> The problem seems to be that only the default partition "" is considered 
> when no node_partition constraint is specified, as seen in the RM log below. 
> {code:java}
> 2019-01-17 16:51:59,921 INFO placement.SingleConstraintAppPlacementAllocator 
> (SingleConstraintAppPlacementAllocator.java:validateAndSetSchedulingRequest(367))
>  - Successfully added SchedulingRequest to 
> app=appattempt_1547734161165_0010_000001 targetAllocationTags=[sleeper]. 
> nodePartition= 
> {code} 
> However, I think it makes more sense to consider "*" or the 
> {{default-node-label-expression}} of the queue, if configured, when no 
> node_partition is specified in the placement constraint. Not specifying 
> any node_partition should ideally mean we don't enforce placement constraints 
> on any node_partition. Instead, we are currently enforcing the default 
> partition.
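
The fallback order proposed in the description above can be sketched roughly as follows. This is only an illustration and not the code in the attached patches: the class PartitionFallbackSketch, the method resolvePartition and both of its parameters are hypothetical names, and the sketch assumes that an unset node_partition is represented as null so that it can be told apart from an explicitly requested default partition "".

```java
public class PartitionFallbackSketch {
    // The default (unlabeled) partition is the empty string, as in the RM log.
    static final String DEFAULT_PARTITION = "";
    // "*" stands for "any partition", as suggested in the description.
    static final String ANY_PARTITION = "*";

    /**
     * Hypothetical helper: picks the partition to use for a placement
     * constraint. Assumes null means "node_partition was not specified".
     */
    static String resolvePartition(String constraintPartition,
                                   String queueDefaultLabelExpression) {
        // 1. An explicitly specified partition always wins, including "".
        if (constraintPartition != null) {
            return constraintPartition;
        }
        // 2. Otherwise use the queue's default-node-label-expression, if set.
        if (queueDefaultLabelExpression != null
                && !queueDefaultLabelExpression.isEmpty()) {
            return queueDefaultLabelExpression;
        }
        // 3. Otherwise do not restrict the request to a single partition.
        return ANY_PARTITION;
    }

    public static void main(String[] args) {
        // Constraint without node_partition on a queue whose default label is "label".
        System.out.println(resolvePartition(null, "label"));
        // Constraint without node_partition on a queue with no default label.
        System.out.println(resolvePartition(null, null));
        // Constraint that explicitly names a partition.
        System.out.println(resolvePartition("label", "other"));
    }
}
```

With this ordering, the first sleeper-service spec above would be matched against the partition configured for "labeled_queue" instead of the default partition. Whether "*" can really be honoured is subject to the ANY partition limitations mentioned in the comment.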



