[ https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273104#comment-15273104 ]
Arun Suresh commented on YARN-2888:
-----------------------------------

Thanks for the review [~kkaranasos]. I agree with most of your comments and have addressed them in the latest patch. For the rest:

bq. Rename ContainerQueuingLimit* to NMQueuingLimit*?
Hmm... I prefer to keep it as ContainerQueuingLimit. It is a struct that is part of the NM heartbeat response, which already establishes the 'NM' aspect, and 'ContainerQueuing' more explicitly expresses the fact that we are queuing containers.

bq. Why is it needed to change the return type of getContainerManager() to ContainerManager?
With this patch, we need to set the queuing limit etc. on the ContainerManager. One option is to introduce the setter (and related) methods into the protocol, where I don't think they belong, since the limit is a property of the ContainerManager entity, not of the protocol. Another option is to cast the returned object to QueuingContainerManagerImpl, which does not seem clean either. Given all this, and considering that we have multiple implementations of ContainerManager, changing the return type seemed the cleanest approach.

bq. In pruneOpportunisticContainerQueue(), let's use the same logic/code as in the stopContainerInternal()..
I feel this code path is a bit simpler, so I'd prefer to leave it as it is.
But yes, I have changed the variable names and the method name for better clarity.

In {{QueueLimitCalculator}}:
* I've removed the median.
* The calculations are now independent of the size of k.

> Corrective mechanisms for rebalancing NM container queues
> ---------------------------------------------------------
>
>                 Key: YARN-2888
>                 URL: https://issues.apache.org/jira/browse/YARN-2888
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Konstantinos Karanasos
>            Assignee: Arun Suresh
>         Attachments: YARN-2888-yarn-2877.001.patch,
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch,
> YARN-2888.005.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of
> the scheduling decisions or due to having a stale image of the system) may
> lead to an imbalance in the waiting times of the NM container queues. This
> can in turn have an impact in job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever
> needed) container requests from overloaded queues, adding them to less-loaded
> ones.
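To make the getContainerManager() return-type trade-off concrete, here is a minimal Java sketch under simplified assumptions. The types below are stand-ins, not the real Hadoop classes: ContainerManagementProtocol is reduced to a single method, and updateQueuingLimit(), QueuingContainerManager, DefaultContainerManager, Context and onHeartbeatResponse() are hypothetical names used only for illustration. It shows that when the accessor returns the NM-side ContainerManager interface, heartbeat handling can push the RM-provided limit without adding a setter to the protocol and without downcasting to one specific implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ContainerManagerDesignSketch {

    /** Simplified stand-in for the RPC-facing protocol (hypothetical, one call only). */
    interface ContainerManagementProtocol {
        String startContainer(String containerId);
    }

    /**
     * Hypothetical NM-side interface: extends the protocol with node-local
     * operations, such as applying the queuing limit from the RM heartbeat.
     */
    interface ContainerManager extends ContainerManagementProtocol {
        void updateQueuingLimit(int maxQueueLength);
    }

    /** Hypothetical queuing implementation that records the limit. */
    static class QueuingContainerManager implements ContainerManager {
        // -1 means "no limit received yet" (an assumed convention for this sketch).
        // AtomicInteger because the heartbeat thread writes while other threads read.
        private final AtomicInteger maxQueueLength = new AtomicInteger(-1);

        @Override
        public String startContainer(String containerId) {
            return "started " + containerId;
        }

        @Override
        public void updateQueuingLimit(int limit) {
            maxQueueLength.set(limit);
        }

        int getMaxQueueLength() {
            return maxQueueLength.get();
        }
    }

    /** Hypothetical non-queuing implementation: there is no queue, so the limit is ignored. */
    static class DefaultContainerManager implements ContainerManager {
        @Override
        public String startContainer(String containerId) {
            return "started " + containerId;
        }

        @Override
        public void updateQueuingLimit(int limit) {
            // Nothing to limit.
        }
    }

    /** Stand-in for the NM context: returns ContainerManager, not the bare protocol. */
    static class Context {
        private final ContainerManager containerManager;

        Context(ContainerManager containerManager) {
            this.containerManager = containerManager;
        }

        ContainerManager getContainerManager() {
            return containerManager;
        }
    }

    /** Heartbeat handling can apply the limit with no cast, whatever the implementation. */
    static void onHeartbeatResponse(Context ctx, int limitFromRm) {
        ctx.getContainerManager().updateQueuingLimit(limitFromRm);
    }

    public static void main(String[] args) {
        QueuingContainerManager cm = new QueuingContainerManager();
        onHeartbeatResponse(new Context(cm), 5);
        System.out.println(cm.getMaxQueueLength()); // prints 5

        // Works unchanged for an implementation that does not queue at all.
        onHeartbeatResponse(new Context(new DefaultContainerManager()), 5);
    }
}
```

Because the non-queuing implementation can simply ignore the limit, the method can live on the shared NM-side interface while the wire protocol stays untouched.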
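For the {{QueueLimitCalculator}} changes and the corrective mechanism described in the issue, a rough Java sketch may help. It assumes the limit is the mean plus sigma standard deviations of the queue lengths reported by every NM, clamped to a configured range. It also assumes that an overloaded NM prunes its queue by removing the most recently queued opportunistic containers, so that they can be re-requested on less-loaded nodes. The formula, the newest-first pruning order, and all names (computeQueueLimit, pruneQueue, sigma, minLimit, maxLimit) are illustrative assumptions, not necessarily what the patch does. Consistent with dropping the median, the sketch uses only the mean and the standard deviation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class QueueLimitSketch {

    /**
     * Hypothetical limit: mean + sigma * (population) standard deviation of the
     * queue lengths reported by all NMs, rounded up and clamped to [minLimit, maxLimit].
     */
    static int computeQueueLimit(int[] queueLengths, double sigma, int minLimit, int maxLimit) {
        if (queueLengths.length == 0) {
            return minLimit; // no reports yet: fall back to the lower bound
        }
        double sum = 0;
        for (int q : queueLengths) {
            sum += q;
        }
        double mean = sum / queueLengths.length;
        double sqDiff = 0;
        for (int q : queueLengths) {
            sqDiff += (q - mean) * (q - mean);
        }
        double stdev = Math.sqrt(sqDiff / queueLengths.length);
        int limit = (int) Math.ceil(mean + sigma * stdev);
        return Math.max(minLimit, Math.min(maxLimit, limit));
    }

    /**
     * Hypothetical NM-side pruning: keep the oldest `limit` queued containers and
     * remove the rest, newest first. The removed IDs are returned so they can be
     * killed locally and re-requested on less-loaded nodes.
     */
    static List<String> pruneQueue(Deque<String> queue, int limit) {
        int target = Math.max(0, limit); // guard against a negative limit
        List<String> removed = new ArrayList<>();
        while (queue.size() > target) {
            removed.add(queue.pollLast());
        }
        return removed;
    }

    public static void main(String[] args) {
        // Mean 5, population standard deviation 2, so with sigma = 1 the limit is 7.
        int limit = computeQueueLimit(new int[] {2, 4, 4, 4, 5, 5, 7, 9}, 1.0, 1, 10);
        System.out.println(limit); // prints 7

        // An NM holding 5 queued containers against a limit of 3 sheds the 2 newest.
        Deque<String> queue = new ArrayDeque<>(List.of("c1", "c2", "c3", "c4", "c5"));
        System.out.println(pruneQueue(queue, 3)); // prints [c5, c4]
        System.out.println(queue); // prints [c1, c2, c3]
    }
}
```

Clamping keeps a single badly skewed heartbeat from pushing the limit to an extreme, and pruning newest-first preserves the containers that have already waited longest.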