[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandy Ryza updated YARN-1265:
-----------------------------
    Attachment: YARN-1265.patch

> Fair Scheduler chokes on unhealthy node reconnect
> -------------------------------------------------
>
>                 Key: YARN-1265
>                 URL: https://issues.apache.org/jira/browse/YARN-1265
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, scheduler
>   Affects Versions: 2.1.1-beta
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-1265.patch
>
>
> Only nodes in the RUNNING state are tracked by schedulers. When a node
> reconnects, RMNodeImpl.ReconnectNodeTransition tells the scheduler to remove
> it even if it is not in the RUNNING state, so the scheduler is asked to
> remove a node it never tracked. The FairScheduler doesn't guard against this.
> I think the best way to fix this is to check whether a node is RUNNING
> before telling the scheduler to remove it.
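
The proposed guard can be sketched in isolation. This is a minimal illustration, not the attached patch and not the real ResourceManager code: ToyScheduler, NodeState, and onReconnect are hypothetical stand-ins, and the exception thrown by removeNode only models a scheduler that fails when asked to remove a node it is not tracking.

```java
import java.util.HashSet;
import java.util.Set;

public class ReconnectGuardSketch {

    // Hypothetical, simplified node states; the real state machine has more.
    public enum NodeState { RUNNING, UNHEALTHY }

    // Toy scheduler that tracks only nodes it was told about (i.e. RUNNING ones).
    public static class ToyScheduler {
        private final Set<String> trackedNodes = new HashSet<>();

        public void addNode(String nodeId) {
            trackedNodes.add(nodeId);
        }

        // Unguarded removal: fails if the node was never tracked.
        public void removeNode(String nodeId) {
            if (!trackedNodes.remove(nodeId)) {
                throw new IllegalStateException("Unknown node: " + nodeId);
            }
        }

        public boolean isTracking(String nodeId) {
            return trackedNodes.contains(nodeId);
        }
    }

    // Hypothetical reconnect handler: only ask the scheduler to remove the
    // node if it was RUNNING, since only RUNNING nodes are tracked.
    // Returns true if a removal was requested.
    public static boolean onReconnect(ToyScheduler scheduler, String nodeId, NodeState state) {
        if (state == NodeState.RUNNING) {
            scheduler.removeNode(nodeId);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.addNode("node-a"); // node-a is RUNNING and tracked

        // A RUNNING node reconnects: removal is requested and succeeds.
        System.out.println(onReconnect(scheduler, "node-a", NodeState.RUNNING));

        // An UNHEALTHY node reconnects: it was never tracked, so the guard
        // skips the removal instead of letting the scheduler fail.
        System.out.println(onReconnect(scheduler, "node-b", NodeState.UNHEALTHY));
    }
}
```

Placing the check on the caller's side, as the description suggests, keeps the rule "only RUNNING nodes are tracked" in one place. A defensive check inside each scheduler's removal path would be an alternative, but it would have to be repeated per scheduler.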