[ https://issues.apache.org/jira/browse/YARN-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926830#comment-15926830 ]
Konstantinos Karanasos edited comment on YARN-6344 at 3/15/17 9:41 PM:
-----------------------------------------------------------------------

The described issue was observed together with [~asuresh] and [~curino] while examining some of our internal applications that request containers in small batches. We also brought it to the attention of [~jlowe].

One way to solve the problem is to introduce an additional parameter, namely {{rack-locality-delay}}, which would work similarly to {{node-locality-delay}} and would therefore bypass the locality wait factor. This way, we would attempt to assign off-switch containers after {{rack-locality-delay}} missed opportunities. Thoughts? [~arun.sur...@gmail.com]

> Rethinking OFF_SWITCH locality in CapacityScheduler
> ---------------------------------------------------
>
>                 Key: YARN-6344
>                 URL: https://issues.apache.org/jira/browse/YARN-6344
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Konstantinos Karanasos
>
> When relaxing locality from node to rack, the {{node-locality-delay}} parameter is used: when the scheduling opportunities for a scheduler key exceed the value of this parameter, we relax locality and try to assign the container to a node in the corresponding rack.
> On the other hand, when relaxing locality to off-switch (i.e., assigning the container anywhere in the cluster), we use a {{localityWaitFactor}}, which is computed as the number of outstanding requests for a specific scheduler key divided by the size of the cluster.
> In the case of applications that request containers in big batches (e.g., traditional MR jobs), and for relatively small clusters, the {{localityWaitFactor}} does not affect relaxing locality much.
> However, in the case of applications that request containers in small batches, this factor takes a very small value, which leads to assigning off-switch containers too soon. The situation is even more pronounced in big clusters. For example, if an application requests only one container per request, locality will be relaxed after a single missed scheduling opportunity.
> The purpose of this JIRA is to rethink the way we relax locality for off-switch assignments.
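The interaction described above can be sketched as follows. This is a simplified, illustrative model, not the actual CapacityScheduler code; the {{offSwitchThreshold}} and {{rackLocalityDelayThreshold}} method names are made up for this sketch, and {{rack-locality-delay}} is the parameter proposed in the comment, not an existing configuration option:

```java
// Illustrative sketch of the off-switch relaxation problem (not actual
// CapacityScheduler code). The current behavior derives a missed-opportunity
// threshold from a localityWaitFactor that shrinks with small request batches.
public class LocalityDelaySketch {

    // Simplified model of the current behavior: the wait factor is the number
    // of outstanding requests divided by cluster size (capped at 1), and the
    // off-switch threshold is that factor scaled back by the cluster size.
    static int offSwitchThreshold(int outstandingRequests, int clusterNodes) {
        float localityWaitFactor =
            Math.min(1.0f, (float) outstandingRequests / clusterNodes);
        return (int) (clusterNodes * localityWaitFactor);
    }

    // Proposed alternative: a fixed rack-locality-delay threshold, analogous
    // to node-locality-delay, independent of batch size and cluster size.
    static int rackLocalityDelayThreshold(int rackLocalityDelay) {
        return rackLocalityDelay;
    }

    public static void main(String[] args) {
        // Big MR-style batch on a small cluster: factor is capped at 1,
        // so the scheduler waits on the order of the cluster size.
        System.out.println(offSwitchThreshold(500, 100)); // 100

        // Small batch (one container) on a big cluster: the factor is tiny,
        // so locality relaxes to OFF_SWITCH after a single missed opportunity.
        System.out.println(offSwitchThreshold(1, 1000)); // 1

        // With a fixed delay, the wait is the same in both scenarios.
        System.out.println(rackLocalityDelayThreshold(40)); // 40
    }
}
```

The sketch shows why one container per request relaxes to off-switch after a single miss under the current computation, while a fixed delay would behave consistently regardless of batch or cluster size.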