[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect

Sandy Ryza (JIRA) Thu, 03 Oct 2013 16:17:20 -0700

    [ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785669#comment-13785669 ]


Sandy Ryza commented on YARN-1265:
----------------------------------

Uploaded a patch that, instead of adding the RUNNING-state check proposed in 
the issue description, changes the Fair Scheduler's behavior to mimic the 
Capacity Scheduler.

> Fair Scheduler chokes on unhealthy node reconnect
> -------------------------------------------------
>
>                 Key: YARN-1265
>                 URL: https://issues.apache.org/jira/browse/YARN-1265
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, scheduler
>    Affects Versions: 2.1.1-beta
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-1265-1.patch, YARN-1265.patch
>
>
> Only nodes in the RUNNING state are tracked by schedulers.  When a node 
> reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if 
> it's not in the RUNNING state.  The FairScheduler doesn't guard against this.
> I think the best way to fix this is to check whether a node is RUNNING 
> before telling the scheduler to remove it.
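
For illustration only, here is a toy sketch of the failure and of the two 
fixes mentioned in this thread. It does not use the real YARN classes: 
ToyScheduler, NodeState, ReconnectSketch and all method names are 
hypothetical, and the scheduler-side guard assumes that "mimic the Capacity 
Scheduler" means ignoring removal requests for nodes the scheduler does not 
track. The actual patch may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these types are the real YARN classes.
enum NodeState { RUNNING, UNHEALTHY }

class ToyScheduler {
    // Only RUNNING nodes are ever added here, as in the issue description.
    private final Map<String, Integer> nodeMemoryMb = new HashMap<>();
    private int clusterMemoryMb = 0;

    void addNode(String nodeId, int memoryMb) {
        nodeMemoryMb.put(nodeId, memoryMb);
        clusterMemoryMb += memoryMb;
    }

    // Unguarded removal: an untracked node makes remove() return null,
    // and unboxing that null throws a NullPointerException.
    void removeNodeUnguarded(String nodeId) {
        int memoryMb = nodeMemoryMb.remove(nodeId);
        clusterMemoryMb -= memoryMb;
    }

    // Scheduler-side guard (assumed meaning of "mimic the Capacity
    // Scheduler"): silently ignore nodes that are not tracked.
    void removeNodeGuarded(String nodeId) {
        Integer memoryMb = nodeMemoryMb.remove(nodeId);
        if (memoryMb == null) {
            return;
        }
        clusterMemoryMb -= memoryMb;
    }

    int getClusterMemoryMb() {
        return clusterMemoryMb;
    }
}

class ReconnectSketch {
    // Caller-side fix from the description: only ask the scheduler to
    // remove a node that was RUNNING, and therefore tracked.
    static void onReconnect(ToyScheduler scheduler, String nodeId,
                            NodeState previousState) {
        if (previousState == NodeState.RUNNING) {
            scheduler.removeNodeUnguarded(nodeId);
        }
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.addNode("node-1", 4096);

        // An unhealthy node was never tracked; both fixes leave the
        // scheduler's totals untouched instead of throwing.
        onReconnect(scheduler, "node-2", NodeState.UNHEALTHY);
        scheduler.removeNodeGuarded("node-2");

        System.out.println("cluster memory (MB): "
            + scheduler.getClusterMemoryMb());
    }
}
```

Either guard alone avoids the exception in this toy model. The caller-side 
check keeps the scheduler strict, while the scheduler-side guard makes removal 
tolerant of repeated or spurious requests.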



--
This message was sent by Atlassian JIRA
(v6.1#6144)
