[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533632#comment-14533632 ]
Wangda Tan commented on YARN-1680:
----------------------------------

Had some offline discussion with [~jianhe] and [~cwelch]. Some takeaways from my side:
- If we want an accurate headroom for an application that has blacklisted nodes, it seems unavoidable to compute sum(app.blacklisted_nodes.avail) while calculating headroom for the app. This requires that when a node heartbeats with changed available resources, all apps that blacklisted that node be notified; when many applications blacklist a large number of nodes, a performance regression could happen.
- If we use sum(app.blacklisted_nodes.total) instead of sum(app.blacklisted_nodes.avail), the headroom for the app could be under-estimated. On a large, highly utilized cluster (e.g. >99%), an app with blacklisted nodes could then always receive 0 headroom.

Some fallback strategies:
# Only do the accurate headroom calculation when there are not too many blacklisted nodes, and not too many apps with blacklisted nodes.
# Tolerate under-estimation of headroom.

Some alternatives:
- MAPREDUCE-6302 targets preempting reducers even when we report an inaccurate headroom to apps. That approach looks good to me.
- Move the headroom calculation to the application side. I don't think we can do this, at least for now: the application only receives an updated NodeReport when a node changes health status, not on every regular heartbeat, and we cannot send that much data to the AM during heartbeats.

> availableResources sent to applicationMaster in heartbeat should exclude
> blacklistedNodes free memory.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1680
>                 URL: https://issues.apache.org/jira/browse/YARN-1680
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 2.2.0, 2.3.0
>         Environment: SuSE 11 SP2 + Hadoop-2.3
>            Reporter: Rohith
>            Assignee: Craig Welch
>         Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch
>
> There are 4 NodeManagers with 8 GB each; total cluster capacity is 32 GB. Cluster slow start is set to 1.
> A job is running whose reducer tasks occupy 29 GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), so the MRAppMaster blacklisted it. All reducer tasks are now running in the cluster.
> The MRAppMaster does not preempt the reducers because the headroom used in the reducer-preemption calculation still includes the blacklisted node's memory. This makes the job hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResource it returns still counts cluster-wide free memory).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
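The two adjustment strategies discussed in the comment above can be sketched in a few lines. This is an illustrative sketch only; the method names, the use of raw MB values instead of YARN's Resource type, and the clamping at zero are assumptions, not actual CapacityScheduler code.

```java
import java.util.stream.LongStream;

public class HeadroomAdjustment {
    // Accurate strategy: subtract what is actually free on the app's
    // blacklisted nodes. Requires notifying every app that blacklisted a
    // node whenever that node heartbeats with changed available resources.
    static long accurate(long clusterAvailMB, long[] blacklistedAvailMB) {
        return Math.max(0, clusterAvailMB - LongStream.of(blacklistedAvailMB).sum());
    }

    // Cheap strategy: subtract the blacklisted nodes' *total* capacity.
    // No per-heartbeat tracking needed, but headroom is under-estimated:
    // on a highly utilized cluster it tends to clamp to 0.
    static long cheap(long clusterAvailMB, long[] blacklistedTotalMB) {
        return Math.max(0, clusterAvailMB - LongStream.of(blacklistedTotalMB).sum());
    }

    public static void main(String[] args) {
        // One blacklisted 8 GB node with 1 GB free, 3 GB free cluster-wide:
        System.out.println(accurate(3072, new long[]{1024})); // 2048
        System.out.println(cheap(3072, new long[]{8192}));    // 0
    }
}
```

The example shows the trade-off: the accurate variant still reports 2 GB of usable headroom, while the cheap variant reports 0 because the blacklisted node's full 8 GB is deducted.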
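The reported hang reduces to simple arithmetic. The numbers below come from the issue description (4 x 8 GB nodes, 29 GB held by reducers); the assumption that all remaining free memory sits on blacklisted NM-4, and the effectiveHeadroom helper itself, are hypothetical illustrations rather than MRAppMaster code.

```java
public class HangScenario {
    // Effective headroom = reported headroom minus free memory on
    // blacklisted nodes, clamped at zero (illustrative helper).
    static long effectiveHeadroom(long clusterTotalMB, long occupiedMB,
                                  long blacklistedFreeMB) {
        long reported = clusterTotalMB - occupiedMB;
        return Math.max(0, reported - blacklistedFreeMB);
    }

    public static void main(String[] args) {
        long total = 4 * 8192;  // 4 NodeManagers x 8 GB = 32 GB
        long used  = 29 * 1024; // reducers occupy 29 GB
        long reported = total - used; // 3 GB: AM sees headroom, never preempts
        // Assume all remaining free memory is on blacklisted NM-4:
        long effective = effectiveHeadroom(total, used, reported); // 0
        System.out.println("reported=" + reported + " effective=" + effective);
    }
}
```

With the reported 3 GB headroom the AM believes a map could still be scheduled and never preempts a reducer; with the blacklisted node's free memory excluded, headroom is 0 and reducer preemption would kick in.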