[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281058#comment-14281058 ]
Craig Welch commented on YARN-1680:
-----------------------------------

Thanks for the update, [~airbots], a couple of thoughts:

I created [YARN-2848] in the hope that it would help us build a solution which could share functionality between various items with similar needs, so that the solution we come up with is built with that in mind. That said, I think we will need to build the solutions independently, and there's no need to do them all at the same time.

-re "Every time the app asks for a blacklist addition, we check whether the nodes in the addition are in the cluster blacklist or not (O(m), where m is the number of nodes in the blacklist addition). If so, remove the node from the addition."

Unfortunately, I don't think this can be solved with checks during addition and removal. I believe we will need to keep a persistent picture of all blacklisted nodes for an application regardless of their cluster state, because the two can vary independently, and changes after a blacklist request may invalidate things. For example: the cluster blacklists a node just before the app does, so the app's blacklist request is discarded; the cluster then reinstates the node, but the app still cannot use it for reasons unrelated to the node's cluster availability - we would still include that node in headroom incorrectly.

I also think that, as suggested in [YARN-2848], the only approach I see working for all states is one where there is a last-change indicator of some sort for the cluster in terms of its node composition. This indicator is held by the application and, when it has moved past the application's last calculation of "app cluster resource" (in this case, the one which omits blacklisted nodes), the application re-evaluates state to determine a new "app cluster resource", which it then uses (until a re-evaluation is required again).
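The persistent-blacklist plus last-change-indicator idea above could be sketched roughly as follows. This is a minimal illustration only; `BlacklistAwareHeadroom`, the `clusterVersion` parameter, and the node map are hypothetical names for this sketch, not actual YARN classes:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the application keeps its own persistent blacklist
// (independent of cluster node state) and a cached "app cluster resource"
// that is recomputed only when the cluster's change counter has advanced
// past the value seen at the last computation.
public class BlacklistAwareHeadroom {
    // Persists across cluster node add/remove/blacklist events.
    private final Set<String> appBlacklist = new HashSet<>();
    private long lastComputedClusterVersion = -1;
    private long cachedAppClusterResourceMb = 0;

    /** Record blacklist additions/removals; no cluster-state check needed here. */
    public void updateBlacklist(Set<String> additions, Set<String> removals) {
        appBlacklist.addAll(additions);
        appBlacklist.removeAll(removals);
        lastComputedClusterVersion = -1;  // force re-evaluation on next query
    }

    /**
     * Return the app-visible cluster resource in MB, excluding blacklisted
     * nodes. Recomputed only when the cluster's version differs from the one
     * seen at the last calculation, so the ordering of cluster-side and
     * app-side blacklist changes cannot leave a stale value behind.
     */
    public long getAppClusterResourceMb(long clusterVersion,
                                        Map<String, Long> activeNodesMb) {
        if (clusterVersion != lastComputedClusterVersion) {
            long total = 0;
            for (Map.Entry<String, Long> node : activeNodesMb.entrySet()) {
                if (!appBlacklist.contains(node.getKey())) {
                    total += node.getValue();
                }
            }
            cachedAppClusterResourceMb = total;
            lastComputedClusterVersion = clusterVersion;
        }
        return cachedAppClusterResourceMb;
    }
}
```

Under this scheme the O(m) check at addition time disappears entirely: blacklist updates are cheap set operations, and the full pass over the node map happens only when the cached value is actually stale.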
This should enable the application to have accurate headroom information regardless of the timing of changes, and it allows for the more complex evaluations which may be needed (rack blacklisting, etc.) while minimizing the frequency of those evaluations.

I don't think it is necessarily required for blacklisting, but it's worth noting that this could include offloading some of the calculation to the application master (via more informational APIs / library functions for calculation) to distribute the cost outward. Again, not necessarily for this case, but I wanted to mention it as I think it is an option now or later on.

> availableResources sent to applicationMaster in heartbeat should exclude
> blacklistedNodes free memory.
> ------------------------------------------------------------------------
>
>                 Key: YARN-1680
>                 URL: https://issues.apache.org/jira/browse/YARN-1680
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.2.0, 2.3.0
>         Environment: SuSE 11 SP2 + Hadoop-2.3
>            Reporter: Rohith
>            Assignee: Chen He
>         Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1.
> A job is running with reducer tasks occupying 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks were killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, headRoom includes the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but returns an availableResources value based on total cluster free memory).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)