[ https://issues.apache.org/jira/browse/YARN-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958056#comment-15958056 ]
Konstantinos Karanasos commented on YARN-6344:
----------------------------------------------

Thanks for the feedback, [~jlowe]. Indeed, setting the additional rack delay
to 0 has a double effect: it adds no additional rack delay _and_ it does _not_
fall back to the old behavior, whereas -1 simply falls back to the old
behavior. If we all agree, I wouldn't mind removing the old behavior
completely. But as you say, I don't know the exact reason it is there, so
keeping the option to fall back to the old behavior seems the less aggressive
change.

> Rethinking OFF_SWITCH locality in CapacityScheduler
> ---------------------------------------------------
>
>                 Key: YARN-6344
>                 URL: https://issues.apache.org/jira/browse/YARN-6344
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Konstantinos Karanasos
>            Assignee: Konstantinos Karanasos
>         Attachments: YARN-6344.001.patch, YARN-6344.002.patch,
> YARN-6344.003.patch, YARN-6344.004.patch
>
>
> When relaxing locality from node to rack, the {{node-locality-delay}}
> parameter is used: when the scheduling opportunities for a scheduler key
> exceed the value of this parameter, we relax locality and try to assign the
> container to a node in the corresponding rack.
> On the other hand, when relaxing locality to off-switch (i.e., assigning the
> container anywhere in the cluster), we use a {{localityWaitFactor}}, which
> is computed as the number of outstanding requests for a specific scheduler
> key divided by the size of the cluster.
> For applications that request containers in big batches (e.g., traditional
> MR jobs) on relatively small clusters, the localityWaitFactor does not
> affect relaxing locality much.
> However, for applications that request containers in small batches, this
> factor takes a very small value, which leads to assigning off-switch
> containers too soon. This situation is even more pronounced in big clusters.
> For example, if an application requests only one container per request, the
> locality will be relaxed after a single missed scheduling opportunity.
> The purpose of this JIRA is to rethink the way we are relaxing locality for
> off-switch assignments.
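To make the difference between -1 and 0 concrete, here is a rough Python sketch of the OFF_SWITCH decision as described in this thread. It is a simplified, hypothetical model, not the CapacityScheduler code: the function and parameter names are made up for illustration, the cap of the wait factor at 1.0 is an assumption, and the old rule is assumed to allow off-switch once (outstanding requests x localityWaitFactor) drops below the number of missed scheduling opportunities.

```python
# Hypothetical, simplified model of the OFF_SWITCH relaxation decision discussed
# in YARN-6344. Names are illustrative only; this is NOT the CapacityScheduler API.


def locality_wait_factor(outstanding_requests: int, cluster_nodes: int) -> float:
    """Old behavior: outstanding requests divided by cluster size.

    The cap at 1.0 is an assumption made for this sketch.
    """
    return min(outstanding_requests / cluster_nodes, 1.0)


def can_assign_off_switch(
    missed_opportunities: int,
    outstanding_requests: int,
    cluster_nodes: int,
    node_locality_delay: int,
    additional_rack_delay: int = -1,
) -> bool:
    """Return True if a container may be placed anywhere in the cluster."""
    if additional_rack_delay < 0:
        # Negative (e.g. -1): fall back to the old localityWaitFactor rule
        # (assumed threshold form: outstanding * factor < missed opportunities).
        factor = locality_wait_factor(outstanding_requests, cluster_nodes)
        return outstanding_requests * factor < missed_opportunities
    # >= 0: wait for node_locality_delay plus the additional rack delay.
    # A value of 0 adds no extra delay but still does NOT use the old rule.
    return missed_opportunities > node_locality_delay + additional_rack_delay


if __name__ == "__main__":
    # One container per request on a 100-node cluster (40 is just an example delay).
    # Old rule: off-switch is allowed after a single missed opportunity.
    print(can_assign_off_switch(1, 1, 100, node_locality_delay=40))  # True
    # Additional rack delay of 0: wait until missed opportunities exceed 40.
    print(can_assign_off_switch(1, 1, 100, node_locality_delay=40, additional_rack_delay=0))  # False
    print(can_assign_off_switch(41, 1, 100, node_locality_delay=40, additional_rack_delay=0))  # True
```

In this model, a single outstanding request on a 100-node cluster gives a factor of 0.01, so the old rule relaxes to off-switch after one missed opportunity, matching the example in the description. Setting the additional delay to 0 instead makes off-switch wait for the node-locality delay, which is the "double effect" of 0 versus -1.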