[ https://issues.apache.org/jira/browse/YARN-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802654#comment-17802654 ]
Shilun Fan commented on YARN-10243:
-----------------------------------

Bulk update: moved all 3.4.0 non-blocker issues; please move this back if it is a blocker. Retargeted to 3.5.0.

> Rack-only localization constraint for MR AM is broken for CapacityScheduler
> ---------------------------------------------------------------------------
>
>                 Key: YARN-10243
>                 URL: https://issues.apache.org/jira/browse/YARN-10243
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 3.2.0
>            Reporter: Adam Antal
>            Assignee: Bilwa S T
>            Priority: Major
>
> Reproduction: Start an MR sleep job with strict locality configured for the AM
> ({{-Dmapreduce.job.am.strict-locality=/rack1}} for instance). If
> CapacityScheduler is used, the job will hang (stuck in SCHEDULED state).
> Root cause: if no other resources are requested (such as node locality or
> another constraint), the scheduling opportunities counter is never
> incremented, so the following piece of code always returns false (we
> always skip this constraint), resulting in an infinite loop:
> {code:java}
>     // If we are here, we do need containers on this rack for RACK_LOCAL req
>     if (type == NodeType.RACK_LOCAL) {
>       // 'Delay' rack-local just a little bit...
>       long missedOpportunities =
>           application.getSchedulingOpportunities(schedulerKey);
>       return getActualNodeLocalityDelay() < missedOpportunities;
>     }
> {code}
> Workaround: set {{yarn.scheduler.capacity.node-locality-delay}} to zero to
> force this rule to be processed immediately.
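
To illustrate the root cause described in the issue, here is a minimal standalone sketch. It is not the CapacityScheduler code. The class name, the method names, the heartbeat loop, and the delay value of 40 are all hypothetical and chosen only for illustration. The only part taken from the issue is the comparison itself (delay < missedOpportunities). The sketch shows that when the counter is never incremented, the check cannot pass, whereas an incrementing counter lets it pass after a bounded number of attempts.

```java
public class RackLocalDelaySketch {

    /**
     * Mirrors the comparison quoted in the issue: a RACK_LOCAL assignment is
     * allowed only once the missed opportunities exceed the locality delay.
     */
    static boolean canAssignRackLocal(long nodeLocalityDelay, long missedOpportunities) {
        return nodeLocalityDelay < missedOpportunities;
    }

    /**
     * Hypothetical driver: runs up to maxHeartbeats scheduling attempts and
     * returns the 1-based attempt on which the check first passes, or -1 if
     * it never passes. The counterIncrements flag models whether the
     * scheduling opportunities counter is bumped after a skipped attempt.
     */
    static int firstPassingHeartbeat(long delay, boolean counterIncrements, int maxHeartbeats) {
        long missed = 0;
        for (int hb = 1; hb <= maxHeartbeats; hb++) {
            if (canAssignRackLocal(delay, missed)) {
                return hb;
            }
            if (counterIncrements) {
                missed++; // the step that, per the issue, never happens for rack-only requests
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long delay = 40; // illustrative value, not read from any real configuration

        // Counter stuck at its initial value: the check fails on every attempt.
        System.out.println("counter stuck:        " + firstPassingHeartbeat(delay, false, 1000));

        // Counter incremented after each skip: the check eventually passes.
        System.out.println("counter incrementing: " + firstPassingHeartbeat(delay, true, 1000));
    }
}
```

In this model the first call returns -1 no matter how large maxHeartbeats is, which corresponds to the job staying in SCHEDULED state. The sketch deliberately does not model the workaround, because how a zero delay interacts with the rest of the scheduler is outside what the issue text shows.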