[ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273104#comment-15273104
 ] 

Arun Suresh commented on YARN-2888:
-----------------------------------

Thanks for the review [~kkaranasos].

I agree with most of your comments and I have addressed them in the latest 
patch. For the rest...

bq. Rename ContainerQueuingLimit* to NMQueuingLimit*?
I prefer to keep it as ContainerQueuingLimit. It is a struct that is part of 
the NM heartbeat response, which already establishes the 'NM' aspect of it, 
and 'ContainerQueuing' more explicitly expresses the fact that we are queuing 
containers.

bq. Why is it needed to change the return type of getContainerManager() to 
ContainerManager?
With this patch, we need to set the queuing limit etc. on the ContainerManager. 
One option is to add the setter to the protocol, where I don't think it 
belongs, since it is a property of the ContainerManager entity, not of the 
protocol. Another option is to cast the returned object to 
QueuingContainerManagerImpl, which does not seem clean either. Given all this, 
and considering that we have multiple implementations of the ContainerManager, 
changing the return type seemed cleaner.
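
To illustrate the trade-off, here is a minimal sketch, not the code from the 
patch. Only the names ContainerManager, QueuingContainerManagerImpl and 
getContainerManager() come from this discussion; the protocol interface, the 
context class and the updateQueuingLimit() setter are hypothetical names 
assumed for illustration.

```java
// Sketch only: everything suffixed "Sketch", and updateQueuingLimit(),
// are assumed names, not the actual YARN API.

// Client-facing protocol: NM-internal setters do not belong here.
interface ContainerManagementProtocolSketch {
    void startContainer(String containerId);
}

// NM-side entity: extends the protocol and adds NM-internal configuration.
interface ContainerManager extends ContainerManagementProtocolSketch {
    // Hypothetical setter; a no-op by default so that implementations
    // without a queue can simply ignore it.
    default void updateQueuingLimit(int maxQueueLength) { }
}

// Plain implementation that does not queue containers.
class ContainerManagerImpl implements ContainerManager {
    @Override
    public void startContainer(String containerId) { }
}

// Queuing implementation that actually honours the limit.
class QueuingContainerManagerImpl extends ContainerManagerImpl {
    private int maxQueueLength = Integer.MAX_VALUE;

    @Override
    public void updateQueuingLimit(int maxQueueLength) {
        this.maxQueueLength = maxQueueLength;
    }

    int getMaxQueueLength() {
        return maxQueueLength;
    }
}

// Stand-in for the NM context: returning ContainerManager (rather than the
// protocol type) lets callers set the limit without a downcast.
class NodeContextSketch {
    private final ContainerManager containerManager;

    NodeContextSketch(ContainerManager containerManager) {
        this.containerManager = containerManager;
    }

    ContainerManager getContainerManager() {
        return containerManager;
    }
}
```

With this shape, a heartbeat handler could call 
getContainerManager().updateQueuingLimit(...) regardless of which 
implementation is configured, and no cast to QueuingContainerManagerImpl is 
needed.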

bq. In pruneOpportunisticContainerQueue(), let's use the same logic/code as in 
the stopContainerInternal()..
I feel this code path is a bit simpler, so I'd prefer to leave it as it 
is. But yes, I have changed the variable names and the method name for better 
clarity.
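
For readers without the patch at hand, pruning a queue down to a limit can be 
as small as the sketch below. This is an assumption-laden illustration, not 
the body of pruneOpportunisticContainerQueue(): the class and method names, 
the use of container ids as plain strings, the choice to drop the most 
recently queued entries first, and the treatment of a negative limit as 
"no limit" are all assumed.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of YARN.
class QueuePruningSketch {

    // Removes entries from the tail of the queue (the most recently queued
    // ones) until at most maxQueueLength remain, and returns the removed
    // entries in the order they were removed.
    static List<String> pruneToLimit(Deque<String> queued, int maxQueueLength) {
        List<String> removed = new ArrayList<>();
        if (maxQueueLength < 0) {
            return removed; // assumption: a negative limit means "no limit"
        }
        while (queued.size() > maxQueueLength) {
            removed.add(queued.removeLast());
        }
        return removed;
    }
}
```

The removed entries are returned so that the caller can decide what to do 
with them, for example report them back so they can be placed elsewhere.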

In {{QueueLimitCalculator}}:
* I've removed the median
* The calculations are now independent of the size of k
 

> Corrective mechanisms for rebalancing NM container queues
> ---------------------------------------------------------
>
>                 Key: YARN-2888
>                 URL: https://issues.apache.org/jira/browse/YARN-2888
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Konstantinos Karanasos
>            Assignee: Arun Suresh
>         Attachments: YARN-2888-yarn-2877.001.patch, 
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch, 
> YARN-2888.005.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of 
> the scheduling decisions or due to having a stale image of the system) may 
> lead to an imbalance in the waiting times of the NM container queues. This 
> can in turn have an impact in job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to less-loaded 
> ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
