[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785669#comment-13785669 ]
Sandy Ryza commented on YARN-1265:
----------------------------------

Uploaded a patch that, instead of the above, changes the Fair Scheduler's behavior to mimic the Capacity Scheduler.

> Fair Scheduler chokes on unhealthy node reconnect
> -------------------------------------------------
>
>                 Key: YARN-1265
>                 URL: https://issues.apache.org/jira/browse/YARN-1265
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, scheduler
>    Affects Versions: 2.1.1-beta
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-1265-1.patch, YARN-1265.patch
>
>
> Only nodes in the RUNNING state are tracked by schedulers. When a node
> reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if
> it's not in the RUNNING state. The FairScheduler doesn't guard against this.
> I think the best way to fix this is to check to see whether a node is RUNNING
> before telling the scheduler to remove it.
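
For illustration only, the guard proposed in the issue description (check that a node is RUNNING before asking the scheduler to remove it) could be sketched roughly as below. This is not code from either attached patch, and the comment above indicates the uploaded patch takes a different route. ReconnectGuardSketch, ToyScheduler, NodeState and handleReconnect are hypothetical names invented for this sketch; they are not Hadoop classes or methods.

```java
import java.util.HashSet;
import java.util.Set;

public class ReconnectGuardSketch {

    // Simplified, hypothetical node states; the real RM state machine has more.
    enum NodeState { RUNNING, UNHEALTHY }

    // Hypothetical stand-in for a scheduler that tracks only RUNNING nodes.
    static class ToyScheduler {
        private final Set<String> trackedNodes = new HashSet<>();

        void addNode(String nodeId) {
            trackedNodes.add(nodeId);
        }

        // Fails on a node it never tracked, standing in for the unguarded
        // behavior the issue describes.
        void removeNode(String nodeId) {
            if (!trackedNodes.remove(nodeId)) {
                throw new IllegalStateException("Node not tracked: " + nodeId);
            }
        }

        boolean isTracked(String nodeId) {
            return trackedNodes.contains(nodeId);
        }
    }

    // Hypothetical reconnect handler: only ask the scheduler to remove the
    // node if it was RUNNING, since only RUNNING nodes are tracked.
    // Returns true when a removal was issued.
    static boolean handleReconnect(ToyScheduler scheduler, String nodeId, NodeState state) {
        if (state != NodeState.RUNNING) {
            return false; // scheduler never knew about this node; nothing to remove
        }
        scheduler.removeNode(nodeId);
        return true;
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.addNode("node-a");

        // A RUNNING node is tracked, so the removal goes through.
        System.out.println(handleReconnect(scheduler, "node-a", NodeState.RUNNING));   // prints "true"

        // An UNHEALTHY node was never tracked; the guard skips the removal.
        System.out.println(handleReconnect(scheduler, "node-b", NodeState.UNHEALTHY)); // prints "false"
    }
}
```

Without the state check, the second call would reach removeNode for an untracked node and throw, which is the kind of failure the issue title refers to.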