[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED

Rohith (JIRA) Mon, 27 Apr 2015 10:34:45 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514499#comment-14514499
 ]


Rohith commented on YARN-3535:
------------------------------

Adding the RR (ResourceRequest) back to the scheduler makes more sense to me.

Since the RM infers whether NM restart is enabled from the running
applications reported in the registration call, it is difficult to distinguish
between an NM with restart enabled that reports 0 applications and an NM with
restart disabled, which always reports 0 applications after a restart. Why
can't the NM register with an additional flag indicating to the RM that NM
restart is enabled? Any thoughts?
I had created YARN-3286 to refactor the code in
RMNodeImpl#ReconnectedNodeTransition, but it did not progress because it
changed the behavior of killing running containers on NM restart.
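
To illustrate the ambiguity and the flag idea, here is a rough sketch. It is
not the actual YARN code: the Registration class, the workPreservingRestart
field, and both method names are hypothetical stand-ins I made up for this
example. It only shows why "0 applications reported" cannot tell the two cases
apart, and how an explicit flag (falling back to the heuristic for NMs that do
not send it) might resolve that.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: none of these types or names are real YARN APIs.
final class NmRegistrationSketch {

    /** Stand-in for an NM registration request. */
    static final class Registration {
        final List<String> runningApps;
        // Assumed new field: null means the NM did not send the flag.
        final Boolean workPreservingRestart;

        Registration(List<String> runningApps, Boolean workPreservingRestart) {
            this.runningApps = runningApps;
            this.workPreservingRestart = workPreservingRestart;
        }
    }

    /** Heuristic described above: infer restart support from reported apps. */
    static boolean inferredFromApps(Registration r) {
        return !r.runningApps.isEmpty();
    }

    /** Proposed: trust the explicit flag if present, else use the heuristic. */
    static boolean restartEnabled(Registration r) {
        if (r.workPreservingRestart != null) {
            return r.workPreservingRestart;
        }
        return inferredFromApps(r);
    }

    public static void main(String[] args) {
        List<String> noApps = Collections.emptyList();
        // Restart enabled, but the NM happens to have no running applications.
        Registration idleWithRestart = new Registration(noApps, Boolean.TRUE);
        // Restart disabled: the NM always reports no applications on restart.
        Registration noRestart = new Registration(noApps, Boolean.FALSE);

        // The heuristic gives the same answer for both registrations.
        System.out.println(inferredFromApps(idleWithRestart));
        System.out.println(inferredFromApps(noRestart));
        // The explicit flag tells them apart.
        System.out.println(restartEnabled(idleWithRestart));
        System.out.println(restartEnabled(noRestart));
    }
}
```

The fallback for a missing flag is there only so that older NMs would keep
their present behavior; whether that is desirable is open for discussion.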

>  ResourceRequest should be restored back to scheduler when RMContainer is 
> killed at ALLOCATED
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-3535
>                 URL: https://issues.apache.org/jira/browse/YARN-3535
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>            Assignee: Peng Zhang
>         Attachments: syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NMs, the AM failed to start a container on an NM,
> and the job then hung.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
