[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530915#comment-14530915 ]
Vinod Kumar Vavilapalli commented on YARN-1680: ----------------------------------------------- bq. Further, it seems to me that nodelable specific calculations are very much of the same cloth as this / the same type of problem and cost, and they are in the scheduler. So why not this also? This is the question I was answering before. The Node-labels story is still developing, but we identified two types: partitions and constraints. The code that we have today largely caters to partitions. Partitions have implications on queue capacities etc and so RM making decisions is the right thing to do. OTOH, blacklisting / hard-locality are app-decisions. From the platform's perspective, those nodes, free or otherwise, are actually available for apps to use. > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > ------------------------------------------------------------------------------------------------------ > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler > Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 > Reporter: Rohith > Assignee: Craig Welch > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each.Total cluster capacity is 32GB.Cluster > slow start is set to 1. > Job is running reducer task occupied 29GB of cluster.One NodeManager(NM-4) is > become unstable(3 Map got killed), MRAppMaster blacklisted unstable > NodeManager(NM-4). All reducer task are running in cluster now. > MRAppMaster does not preempt the reducers because for Reducer preemption > calculation, headRoom is considering blacklisted nodes memory. This makes > jobs to hang forever(ResourceManager does not assing any new containers on > blacklisted nodes but returns availableResouce considers cluster free > memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)