[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281058#comment-14281058 ]

Craig Welch commented on YARN-1680:
-----------------------------------

Thanks for the update, [~airbots], a couple thoughts:

I created [YARN-2848] in the hopes that it would help us build a solution 
which could share functionality between various items with similar needs, so 
that the solution we come up with is built with that in mind.  That said, I 
think we will need to build the solutions independently, and there's no need to 
do them all at the same time.

-re Every time the app asks for a blacklist addition, we check whether the 
nodes in the addition are in the cluster blacklist or not (O(m), where m is the 
number of nodes in the blacklist addition). If so, remove the node from the 
addition.

Unfortunately, I don't think this can be solved with checks during addition 
and removal.  I believe we will need to keep a persistent picture of all 
blacklisted nodes for an application regardless of their cluster state, 
because the two can vary independently and changes after a blacklist request 
may invalidate things.  For example: the cluster blacklists a node just before 
the app does, so the app's blacklist request is discarded; the cluster later 
reinstates the node, but the app still cannot use it for reasons unrelated to 
the node's cluster availability - and we would still, incorrectly, include 
that node in the headroom...

I also think that, as suggested in [YARN-2848], the only approach I see working 
for all states is one where the cluster maintains a last-change indicator of 
some sort for its node composition.  The application holds this indicator and, 
when it has advanced past the application's last calculation of "app cluster 
resource" (in this case, the one which omits blacklisted nodes), the 
application re-evaluates its state to determine a new "app cluster resource", 
which it then uses until a re-evaluation is required again (see the sketch 
below).  This should give the application accurate headroom information 
regardless of the timing of changes, and it allows for the more complex 
evaluations which may be needed (rack blacklisting, etc.) while minimizing the 
frequency of those evaluations.  I don't think it is necessarily required for 
blacklisting, but it's worth noting that this could include offloading some of 
the calculation to the application master (via more informational APIs / 
library functions for calculation) to distribute the cost outward.  Again, not 
necessarily for this case, but I wanted to mention it as I think it is an 
option now or later on.
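As a rough illustration of the last-change-indicator idea (again, hypothetical names, not a patch): the cluster bumps a version number whenever its node composition changes, and the application recomputes its "app cluster resource" lazily, only when that version has moved past its last calculation:

{code:java}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch only: cache the app's usable cluster resource alongside the
 * cluster-composition version it was computed at, and re-evaluate only
 * when the version has advanced.
 */
public class AppClusterResource {
  // Bumped by the cluster on any node add/remove/activate/deactivate.
  private final AtomicLong clusterVersion;

  private long computedAtVersion = -1; // version of the cached value
  private long cachedResourceMb = 0;   // cached "app cluster resource"

  public AppClusterResource(AtomicLong clusterVersion) {
    this.clusterVersion = clusterVersion;
  }

  /** Returns the app's usable cluster resource, recomputing lazily. */
  public synchronized long get(Map<String, Long> activeNodesMb,
                               Set<String> appBlacklist) {
    long current = clusterVersion.get();
    if (current != computedAtVersion) {
      // Node composition changed since the last calculation: re-evaluate,
      // omitting blacklisted nodes (rack-level rules could slot in here too).
      long total = 0;
      for (Map.Entry<String, Long> e : activeNodesMb.entrySet()) {
        if (!appBlacklist.contains(e.getKey())) {
          total += e.getValue();
        }
      }
      cachedResourceMb = total;
      computedAtVersion = current;
    }
    return cachedResourceMb;
  }
}
{code}

Since the scan only runs when the version actually changes, the potentially expensive re-evaluation is decoupled from the rate of headroom requests, which is what keeps the more complex checks affordable.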

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1680
>                 URL: https://issues.apache.org/jira/browse/YARN-1680
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.2.0, 2.3.0
>         Environment: SuSE 11 SP2 + Hadoop-2.3 
>            Reporter: Rohith
>            Assignee: Chen He
>         Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB. 
> Cluster slow start is set to 1.
> A job is running and its reducer tasks occupy 29GB of the cluster. One 
> NodeManager (NM-4) becomes unstable (3 map tasks got killed), so the 
> MRAppMaster blacklists the unstable NodeManager (NM-4). All reducer tasks 
> are now running in the cluster.
> The MRAppMaster does not preempt the reducers, because the headroom used in 
> the reducer-preemption calculation includes the blacklisted node's memory. 
> This makes jobs hang forever (the ResourceManager does not assign any new 
> containers on blacklisted nodes, but the availableResources it returns 
> still counts the cluster's free memory, including those nodes).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)