[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514499#comment-14514499 ]
Rohith commented on YARN-3535:
------------------------------

Adding the RR back to the scheduler makes more sense to me.

Since the RM determines whether NM restart is enabled from the running applications reported in the registration call, it is difficult to distinguish between an NM with restart enabled that reports 0 applications and an NM with restart disabled, which always reports 0 applications after a restart. Why can't the NM register with an additional flag indicating to the RM that NM restart is enabled? Any thoughts?

I created YARN-3286 to refactor the code for RMNodeImpl#ReconnectedNodeTransition, but it did not progress since it was changing the behavior of killing running containers on NM restart.

> ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-3535
>                 URL: https://issues.apache.org/jira/browse/YARN-3535
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>            Assignee: Peng Zhang
>         Attachments: syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NM, the AM failed to start a container on the NM.
> The job then hung there.
> AM logs are attached.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)