[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529224#comment-14529224 ]
Craig Welch commented on YARN-1680:
-----------------------------------
I've been looking over [~airbots] prior patches, the discussion, etc., and this was what I was going to suggest as an approach. As I mentioned before, I think that accuracy will unfortunately require holding on to the blacklist in the scheduler app. I think this is OK because these should be relatively small, but it is still a drawback. We could impose a limit on size as a mitigating factor, but that could affect accuracy in some cases as well. In any event, this is the approach I'm suggesting:
- Retain a node/rack blacklist in the scheduler application, based on additions/removals from the application master
- Add a "last change" timestamp or incrementing counter to track node addition/removal at the cluster level (which is what exists for "cluster black/white" listing afaict), updated when those events occur
- Add a "last change" timestamp/counter to the application to track blacklist changes
- Have "last updated" values on the application to track the above two "last change" values, updated when blacklist values are recalculated
- On headroom calculation, the app checks whether it has any entries in the blacklist or a "blacklist deduction" value in its ResourceUsage entry (see below), to determine if the blacklist must be taken into account
- If the blacklist must be taken into account, check the "last updated" values for both cluster and app blacklist changes; if and only if either is stale (last updated != last change), recalculate the blacklist deduction
- When calculating the blacklist deduction, use [~airbots] basic logic from the existing patches. Place the deduction value into a new enumeration index type in ResourceUsage.
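The staleness-check idea above could be sketched roughly as follows. This is only an illustration of the approach, not code from any attached patch; field and method names (clusterLastChange, deductionForHeadroom, etc.) are invented for the example, and the cached deduction stands in for the proposed ResourceUsage enumeration index.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: per-application blacklist plus change/updated counters,
// recalculating the headroom deduction only when something is stale.
class BlacklistHeadroom {
    long clusterLastChange;        // bumped on cluster-level node add/remove
    long appBlacklistLastChange;   // bumped on AM blacklist add/remove
    long clusterLastUpdated;       // snapshots taken at the last recalculation
    long appBlacklistLastUpdated;
    final Set<String> blacklist = new HashSet<>();
    long blacklistDeductionMB;     // cached deduction (would live in ResourceUsage)

    long deductionForHeadroom(Map<String, Long> nodeFreeMB) {
        // Skip the work entirely if there is nothing to deduct.
        if (blacklist.isEmpty() && blacklistDeductionMB == 0) {
            return 0;
        }
        // Recalculate if and only if either "last updated" value is stale.
        if (clusterLastUpdated != clusterLastChange
                || appBlacklistLastUpdated != appBlacklistLastChange) {
            long sum = 0;
            for (String node : blacklist) {
                sum += nodeFreeMB.getOrDefault(node, 0L);
            }
            blacklistDeductionMB = sum;
            clusterLastUpdated = clusterLastChange;
            appBlacklistLastUpdated = appBlacklistLastChange;
        }
        return blacklistDeductionMB;
    }
}
```

On repeated headroom calculations with no cluster or blacklist changes, the cached value is returned without touching the per-node free-memory map, which is the point of tracking the counters.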
NodeLabels could be taken into account as well. There is some logic about "label(s) of interest" on the application, in addition to a "no label" value which is generally applicable; a value for the "label(s) of interest" could be generated whenever the headroom is handed out by the provider, adding a step which applies the proper blacklist deduction if present. Thoughts on the approach?

> availableResources sent to applicationMaster in heartbeat should exclude
> blacklistedNodes free memory.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1680
>                 URL: https://issues.apache.org/jira/browse/YARN-1680
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 2.2.0, 2.3.0
>         Environment: SuSE 11 SP2 + Hadoop-2.3
>            Reporter: Rohith
>            Assignee: Craig Welch
>         Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1.
> A job is running; reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 maps got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, headRoom includes blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResource it returns considers cluster free memory).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
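To make the arithmetic behind the quoted report concrete: with 32GB total, 29GB used, the naive headroom is 3GB, so the AM never preempts. The numbers come from the report; the calculation itself is an illustration of the proposed fix, not the scheduler's actual code, and it assumes for simplicity that the remaining free memory sits on the blacklisted NM-4.

```java
// Illustration of the headroom bug from YARN-1680's report:
// 4 NodeManagers x 8GB = 32GB total; reducers occupy 29GB; NM-4 is blacklisted.
class HeadroomExample {
    // What the RM reports today: cluster free memory, blacklisted nodes included.
    static long naiveHeadroomMB(long clusterMB, long usedMB) {
        return clusterMB - usedMB;
    }

    // What this issue proposes: also deduct free memory on blacklisted nodes.
    static long blacklistAwareHeadroomMB(long clusterMB, long usedMB,
                                         long blacklistedFreeMB) {
        return clusterMB - usedMB - blacklistedFreeMB;
    }
}
```

With the deduction applied, the headroom drops to zero, which is the signal the MRAppMaster needs to start preempting reducers instead of waiting forever.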