[ https://issues.apache.org/jira/browse/YARN-9209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749643#comment-16749643 ]
Feng Yuan commented on YARN-9209:
---------------------------------

Hi [~tarunparimi] [~cheersyang], is your fix based on 3.1.0 or on another branch?

> When nodePartition is not set in Placement Constraints, containers are allocated only in default partition
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9209
>                 URL: https://issues.apache.org/jira/browse/YARN-9209
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, scheduler
>    Affects Versions: 3.1.0
>            Reporter: Tarun Parimi
>            Priority: Major
>
> When an application sets a placement constraint without specifying a
> nodePartition, the default partition is always chosen as the constraint when
> allocating containers. This can be a problem when an application is
> submitted to a queue which doesn't have enough capacity available on the
> default partition.
> This is a common scenario when node labels are configured for a particular
> queue. The sample sleeper service below cannot get even a single container
> allocated when it is submitted to a "labeled_queue", even though enough
> capacity is available on the label/partition configured for the queue. Only
> the AM container runs.
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1.0.0",
>   "queue": "labeled_queue",
>   "components": [
>     {
>       "name": "sleeper",
>       "number_of_containers": 2,
>       "launch_command": "sleep 90000",
>       "resource": {
>         "cpus": 1,
>         "memory": "4096"
>       },
>       "placement_policy": {
>         "constraints": [
>           {
>             "type": "ANTI_AFFINITY",
>             "scope": "NODE",
>             "target_tags": [
>               "sleeper"
>             ]
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
> It runs fine if I specify the node_partition explicitly in the constraints,
> as shown below.
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1.0.0",
>   "queue": "labeled_queue",
>   "components": [
>     {
>       "name": "sleeper",
>       "number_of_containers": 2,
>       "launch_command": "sleep 90000",
>       "resource": {
>         "cpus": 1,
>         "memory": "4096"
>       },
>       "placement_policy": {
>         "constraints": [
>           {
>             "type": "ANTI_AFFINITY",
>             "scope": "NODE",
>             "target_tags": [
>               "sleeper"
>             ],
>             "node_partitions": [
>               "label"
>             ]
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
> The problem seems to be that only the default partition "" is considered
> when the node_partition constraint is not specified, as seen in the RM log
> below.
> {code:java}
> 2019-01-17 16:51:59,921 INFO placement.SingleConstraintAppPlacementAllocator
> (SingleConstraintAppPlacementAllocator.java:validateAndSetSchedulingRequest(367))
> - Successfully added SchedulingRequest to
> app=appattempt_1547734161165_0010_000001 targetAllocationTags=[sleeper].
> nodePartition=
> {code}
> However, I think it makes more sense to consider "*" or the
> {{default-node-label-expression}} of the queue, if configured, when no
> node_partition is specified in the placement constraint. Not specifying
> any node_partition should ideally mean that we don't restrict placement
> to any particular node_partition. Currently, however, we enforce the
> default partition instead.
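The fallback proposed in the issue description can be sketched roughly as follows. This is only an illustration of the suggested behaviour, not the actual YARN code or the committed fix: the class name, method name, and constants are hypothetical, and a null argument is assumed to mean "no node_partition was given in the constraint".

```java
// Hypothetical sketch of the fallback suggested in YARN-9209.
// None of these names come from the Hadoop code base.
final class PartitionFallbackSketch {

    // "" is the default (unlabeled) partition, as shown in the RM log.
    static final String DEFAULT_PARTITION = "";
    // "*" is used here to mean "any partition", as suggested in the issue.
    static final String ANY_PARTITION = "*";

    /**
     * Picks the partition to use for a scheduling request.
     *
     * @param requestedPartition partition named in the placement constraint,
     *                           or null if the constraint did not set one
     * @param queueDefaultLabelExpression the queue's
     *                           default-node-label-expression, or null/empty
     *                           if none is configured
     */
    static String resolvePartition(String requestedPartition,
                                   String queueDefaultLabelExpression) {
        // An explicit partition in the constraint always wins.
        if (requestedPartition != null) {
            return requestedPartition;
        }
        // Otherwise prefer the queue's configured default label expression.
        if (queueDefaultLabelExpression != null
                && !queueDefaultLabelExpression.isEmpty()) {
            return queueDefaultLabelExpression;
        }
        // With nothing configured, do not restrict placement to a partition.
        return ANY_PARTITION;
    }
}
```

Under these assumptions, the first sleeper-service spec (no node_partitions) submitted to a queue whose default label expression is "label" would resolve to "label" rather than to the default partition "".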