[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803100#comment-14803100
 ] 

Karthik Kambatla commented on YARN-3446:
----------------------------------------

Thanks for rebasing the patch, [~zxu]. Comments:

FSAppAttempt:
# How about using a helper method {{subtractResourcesOnBlacklistedNodes}} 
instead of adding all the logic to {{getHeadroom}} itself?
# Is the optimization to fetch the blacklist only when it has changed necessary? 
It looks like we optimize the fetch, but not the iteration over it. I think we 
should either go all the way and also iterate over the blacklisted nodes only 
when the blacklist has changed, or leave out the optimization until we see a 
need for it. 
# To get the blacklist, can't we just use {{AppSchedulingInfo#getBlacklist}} 
(needs synchronization) or {{AppSchedulingInfo#getBlacklistCopy}}? Do we need 
the methods in the scheduler? 

If we make these changes, we might not need all the changes in the rest of the 
files.
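
As a rough illustration of the helper suggested in the first point, something 
along these lines might work. This is only a sketch under assumptions: {{Res}} 
is a hypothetical stand-in for YARN's {{Resource}}, the per-node availability 
map is a hypothetical substitute for a scheduler node lookup, and clamping the 
result at zero is my assumption rather than anything in the patch.

```java
import java.util.Map;
import java.util.Set;

class HeadroomSketch {
    /** Hypothetical stand-in for YARN's Resource: memory in MB plus vcores. */
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /**
     * Returns the headroom with the available resources of blacklisted nodes
     * removed. The inputs are not modified.
     *
     * @param headroom        headroom computed without regard to the blacklist
     * @param blacklist       node names the application has blacklisted
     * @param availableByNode hypothetical map from node name to its available resources
     */
    static Res subtractResourcesOnBlacklistedNodes(
            Res headroom, Set<String> blacklist, Map<String, Res> availableByNode) {
        long memoryMb = headroom.memoryMb;
        int vcores = headroom.vcores;
        for (String node : blacklist) {
            Res available = availableByNode.get(node);
            if (available == null) {
                continue; // blacklisted node is not known to the scheduler
            }
            memoryMb -= available.memoryMb;
            vcores -= available.vcores;
        }
        // Assumption: headroom is never reported as negative.
        return new Res(Math.max(0L, memoryMb), Math.max(0, vcores));
    }

    public static void main(String[] args) {
        Res headroom = new Res(4096, 4);
        Map<String, Res> availableByNode =
            Map.of("node1", new Res(3072, 3), "node2", new Res(1024, 1));
        Res adjusted =
            subtractResourcesOnBlacklistedNodes(headroom, Set.of("node1"), availableByNode);
        System.out.println(adjusted.memoryMb + " MB, " + adjusted.vcores + " vcores");
    }
}
```

With a helper like this, {{getHeadroom}} would only need one extra call on its 
result, and the blacklist could come from a copy so that no lock is held while 
iterating.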


> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -------------------------------------------------------------------------
>
>                 Key: YARN-3446
>                 URL: https://issues.apache.org/jira/browse/YARN-3446
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because the headroom used in the 
> reducer preemption calculation includes blacklisted nodes. This makes jobs 
> hang forever (ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the available resources of blacklisted nodes).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
