[ 
https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048208#comment-14048208
 ] 

Anubhav Dhoot commented on YARN-1367:
-------------------------------------

I had it that way, but after discussion it seemed that depending on a config 
setting might make it cumbersome. I am worried about what happens when the 
setting is mismatched between the RM and the NM. For example, if the NM does 
not kill containers (setting on) and the RM is not expecting containers to be 
preserved (setting off), then the containers could keep running without the RM 
accounting for them.
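
To make the mismatch concrete, here is a minimal sketch of the scenario. It is 
a toy model only, not YARN code: the class name, the method and the two boolean 
flags are hypothetical stand-ins for whatever config the NM and the RM would 
each read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResyncMismatch {

    /**
     * Hypothetical model of a resync: returns the containers that are still
     * running on the node but are not tracked by the RM afterwards.
     */
    static List<String> unaccountedContainers(boolean nmPreservesContainers,
                                              boolean rmExpectsPreserved,
                                              List<String> runningBeforeResync) {
        // NM side: with the setting off, the NM kills everything on resync.
        List<String> stillRunning = nmPreservesContainers
                ? new ArrayList<>(runningBeforeResync)
                : new ArrayList<String>();

        // RM side (assumed behaviour): with the setting off, the RM treats the
        // re-registered node as empty and does not track reported containers.
        List<String> trackedByRm = rmExpectsPreserved
                ? new ArrayList<>(stillRunning)
                : new ArrayList<String>();

        // Whatever is running but not tracked is invisible to RM accounting.
        List<String> unaccounted = new ArrayList<>(stillRunning);
        unaccounted.removeAll(trackedByRm);
        return unaccounted;
    }

    public static void main(String[] args) {
        List<String> running = Arrays.asList("container_01", "container_02");
        // NM on, RM off: the mismatch described above.
        System.out.println(unaccountedContainers(true, false, running));
        // Both on, or both off: nothing is left unaccounted.
        System.out.println(unaccountedContainers(true, true, running));
        System.out.println(unaccountedContainers(false, false, running));
    }
}
```

Only the "NM on, RM off" combination leaves containers running that the RM 
does not know about, which is the case I am worried about.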

> After restart NM should resync with the RM without killing containers
> ---------------------------------------------------------------------
>
>                 Key: YARN-1367
>                 URL: https://issues.apache.org/jira/browse/YARN-1367
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1367.001.patch, YARN-1367.002.patch, 
> YARN-1367.prototype.patch
>
>
> After RM restart, the RM sends a resync response to NMs that heartbeat to it. 
> Upon receiving the resync response, the NM kills all containers and 
> re-registers with the RM. The NM should be changed to not kill the containers 
> and instead inform the RM about all currently running containers, including 
> their allocations etc. After the re-register, the NM should send all pending 
> container completions to the RM as usual.



--
This message was sent by Atlassian JIRA
(v6.2#6252)