[ https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048291#comment-14048291 ]
Jian He commented on YARN-1367:
-------------------------------

In that case, the RM should still be able to kill unknown containers. I think the point is that in the future we will support only work-preserving restart, and the newly added command will be useless at that point. This config is only a temporary solution for testing and stabilizing.

> After restart NM should resync with the RM without killing containers
> ---------------------------------------------------------------------
>
>                 Key: YARN-1367
>                 URL: https://issues.apache.org/jira/browse/YARN-1367
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1367.001.patch, YARN-1367.002.patch,
> YARN-1367.prototype.patch
>
>
> After RM restart, the RM sends a resync response to NMs that heartbeat to it.
> Upon receiving the resync response, the NM kills all containers and
> re-registers with the RM. The NM should be changed to not kill the containers
> and instead inform the RM about all currently running containers, including
> their allocations etc. After the re-register, the NM should send all pending
> container completions to the RM as usual.

--
This message was sent by Atlassian JIRA
(v6.2#6252)