[ 
https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5620:
------------------------------
    Attachment: YARN-5620.006.patch

Uploading a patch addressing most of [~vvasudev]'s and [~jianhe]'s suggestions.
Thanks for the comments!

[~vvasudev],

bq. Should there be a guard against calling reinit if a reinit is already in 
progress? Could we end up with the ReInitContext in an odd state?
There is already a guard in the ContainerManager API, but as per your 
suggestion I have included an additional check in the transition in the new 
patch.
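A minimal sketch of what such a guard looks like, assuming a hypothetical, simplified per-container class (the class, field and method names below are illustrative and are not the actual NodeManager code):

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for per-container reinit bookkeeping.
// Not the actual NodeManager classes; names are illustrative only.
final class ReInitGuardSketch {

    // Pending new launch command; non-null while a reinit is in progress.
    private String pendingLaunchCommand;

    // Starts a reinit, rejecting the request if one is already in progress so
    // the pending context cannot be overwritten halfway through.
    synchronized void startReInit(String newLaunchCommand) {
        Objects.requireNonNull(newLaunchCommand, "newLaunchCommand");
        if (pendingLaunchCommand != null) {
            throw new IllegalStateException("Container is already being re-initialized");
        }
        pendingLaunchCommand = newLaunchCommand;
    }

    // Called once the new process has started; clears the in-progress marker.
    synchronized void finishReInit() {
        pendingLaunchCommand = null;
    }

    synchronized boolean isReInitializing() {
        return pendingLaunchCommand != null;
    }
}
```

In this sketch, a second startReInit call before finishReInit throws IllegalStateException instead of silently replacing the pending context.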

bq. Instead of a launch event we should send a relaunch event; the relaunch 
takes care of trying to run in the same work dir as the earlier attempt, etc.
I actually tried using relaunch initially, but it looks like the pid has to be 
running for the relaunch to work correctly. It also looks like we would need an 
intermediate state there too, which would result in the same (or a larger) 
amount of code change. I would prefer to use launch itself, since I am more 
confident of how it works. I have also updated the test case to verify that the 
upgraded container can access and read files created by the previous process 
in the working directory.
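The property the updated test case checks can be sketched in isolation, assuming the "processes" are plain methods and a temp directory stands in for the container's working directory (all names here are hypothetical; this is not the actual test):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: the "processes" are plain methods, and the working
// directory is a temp dir standing in for the container's work dir.
final class WorkDirReuseSketch {

    // Stand-in for the original container process: leaves a file in its work dir.
    static void runOriginal(Path workDir) throws IOException {
        Files.writeString(workDir.resolve("state.txt"), "written-by-v1");
    }

    // Stand-in for the upgraded process, launched in the SAME work dir.
    static String runUpgraded(Path workDir) throws IOException {
        return Files.readString(workDir.resolve("state.txt"));
    }

    // Runs both in one directory and returns what the upgraded process read.
    static String demo() {
        Path workDir = null;
        try {
            workDir = Files.createTempDirectory("container_work_dir");
            runOriginal(workDir);
            return runUpgraded(workDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (workDir != null) {
                try {
                    Files.deleteIfExists(workDir.resolve("state.txt"));
                    Files.deleteIfExists(workDir);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the temp directory.
                }
            }
        }
    }
}
```

The key point is only that the new launch reuses the same directory; whether the real code achieves this via launch or relaunch is exactly the design question discussed above.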

bq. I think an explicit commit API (with auto-commit being the default option) 
should satisfy both use cases.
Thanks. I will update the patch with it once we agree that the reinit flow is 
fine.
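One possible shape for such an API, as a minimal sketch assuming a hypothetical per-container object that tracks a committed and a pending launch command (names are illustrative, not the eventual YARN API). With auto-commit on, an upgrade is committed immediately; with it off, the previous command is kept until commit or rollback:

```java
// Hypothetical sketch of an upgrade API with an explicit commit step.
// Auto-commit is the default, matching the proposal; names are illustrative.
final class UpgradeCommitSketch {

    private String committedCommand;   // last committed (known-good) launch command
    private String pendingCommand;     // upgraded command awaiting commit, or null
    private final boolean autoCommit;

    UpgradeCommitSketch(String initialCommand) {
        this(initialCommand, true); // auto-commit by default
    }

    UpgradeCommitSketch(String initialCommand, boolean autoCommit) {
        this.committedCommand = initialCommand;
        this.autoCommit = autoCommit;
    }

    void upgrade(String newCommand) {
        if (autoCommit) {
            committedCommand = newCommand; // nothing left to roll back to
            pendingCommand = null;
        } else {
            pendingCommand = newCommand;   // keep the old command for rollback
        }
    }

    void commit() {
        requirePending();
        committedCommand = pendingCommand;
        pendingCommand = null;
    }

    // Discards the pending upgrade and returns the command to restart with.
    String rollback() {
        requirePending();
        pendingCommand = null;
        return committedCommand;
    }

    boolean canRollback() {
        return pendingCommand != null;
    }

    String currentCommand() {
        return pendingCommand != null ? pendingCommand : committedCommand;
    }

    private void requirePending() {
        if (pendingCommand == null) {
            throw new IllegalStateException("No uncommitted upgrade");
        }
    }
}
```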

[~jianhe],

bq. While the AM issues the upgrade command, the container could exit with 
success or failure. In this case, should we still continue the upgrade process?
I am nullifying the reInitContext in the event of an explicit kill, or if the 
process completes successfully during the reInit, so the upgrade is cancelled 
in those cases. Please take a look at the latest patch and let me know if you 
think I've covered all cases.
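A minimal sketch of this cancellation rule, assuming a hypothetical container object in which a non-null reInitContext means an upgrade is pending (only the two cases named above are modelled; how other exits are handled is left to the patch):

```java
// Hypothetical sketch: the pending reinit context is cleared when the container
// is explicitly killed or its process completes successfully mid-reinit.
final class ReInitCancellationSketch {

    private String reInitContext;  // pending new launch command, or null
    private boolean running = true;

    void requestReInit(String newLaunchCommand) {
        reInitContext = newLaunchCommand;
    }

    // Explicit kill during the reinit: the upgrade is cancelled.
    void onKill() {
        running = false;
        reInitContext = null;
    }

    // Process completed successfully during the reinit: the upgrade is cancelled.
    void onProcessExitedSuccessfully() {
        running = false;
        reInitContext = null;
    }

    boolean isUpgradePending() {
        return reInitContext != null;
    }

    boolean isRunning() {
        return running;
    }
}
```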
 

> Core changes in NodeManager to support for upgrade and rollback of Containers
> -----------------------------------------------------------------------------
>
>                 Key: YARN-5620
>                 URL: https://issues.apache.org/jira/browse/YARN-5620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5620.001.patch, YARN-5620.002.patch, 
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch, 
> YARN-5620.006.patch
>
>
> This JIRA proposes modifying the ContainerManager (and other core classes) to 
> support upgrading a running container with a new {{ContainerLaunchContext}}, 
> as well as the ability to roll back the upgrade if the container is unable 
> to restart using the new launch context. 
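The upgrade-then-rollback behaviour described in the summary can be sketched as follows, assuming a hypothetical launcher hook (a Predicate standing in for actually starting a process and reporting whether it came up); none of these names are part of the real NodeManager:

```java
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical sketch of "upgrade, and roll back if the new launch context
// cannot start". The launcher predicate stands in for actually starting a
// process and reports whether it came up.
final class UpgradeWithRollbackSketch {

    private final Predicate<String> launcher;
    private String activeCommand;

    UpgradeWithRollbackSketch(String initialCommand, Predicate<String> launcher) {
        this.activeCommand = Objects.requireNonNull(initialCommand);
        this.launcher = Objects.requireNonNull(launcher);
    }

    // Returns true if the upgrade took effect, false if it was rolled back.
    boolean upgrade(String newCommand) {
        String previous = activeCommand;
        if (launcher.test(newCommand)) {
            activeCommand = newCommand;
            return true;
        }
        // The new context failed to start: restart with the previous one.
        if (!launcher.test(previous)) {
            throw new IllegalStateException("Rollback to previous launch context failed");
        }
        return false; // activeCommand still holds the previous command
    }

    String activeCommand() {
        return activeCommand;
    }
}
```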



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
