[ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun Suresh updated YARN-5620:
------------------------------
    Attachment: YARN-5620.006.patch

Uploading a patch addressing most of [~vvasudev]'s and [~jianhe]'s suggestions. Thanks for the comments!

[~vvasudev],

bq. Should there be a guard against calling reinit if a reinit is already in progress? Could we end up with the ReInitContext in odd state?

There is already a guard in the ContainerManager API, but I have added an extra check in the transition in the new patch, as per your suggestion.

bq. Instead of a launch event we should send a relaunch event - the relaunch takes care of trying to run in same work dir as the earlier attempt, etc

I actually tried using relaunch initially, but it looks like the pid has to be running for the relaunch to work correctly. It also looks like we would need an intermediate state there too, which would result in the same (or a larger) amount of code change. I would prefer to use launch itself, since I am more confident of how it works. I have also updated the testcase to verify that the upgraded container has access to, and is able to read, files created by the previous process in the working directory.

bq. think an explicit commit API(with auto-commit option being the default option) should satisfy both use cases.

Thanks. I will update the patch with it once we agree that the reinit flow is fine.

[~jianhe],

bq. While AM issues the upgrade command, the container could exit with success or failure. in this case, should we still continue the upgrade process ?

I am nullifying the reInitContext in the event of an explicit kill, or if the process completes successfully during the reinit, so the upgrade should be cancelled in those cases.

Do take a look at the latest patch and let me know if you think I've covered all cases.
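For readers following the thread, the guard and the cancellation behaviour described above can be sketched roughly as below. This is a simplified, hypothetical model and not code from the patch: the class `ReInitSketch`, its `State` enum, and the method names `reInit`, `onOldProcessStopped` and `onKillOrSuccessfulExit` are all invented for illustration, and launch contexts are reduced to plain strings. Only the `reInitContext` field name comes from the discussion.

```java
import java.util.Objects;

// Hypothetical, simplified model of the reinit flow discussed in this comment.
// None of these types are the real NodeManager classes.
final class ReInitSketch {

  enum State { RUNNING, REINITIALIZING, DONE }

  /** Remembers the old and new launch contexts while an upgrade is pending. */
  static final class ReInitContext {
    final String oldLaunchContext;
    final String newLaunchContext;

    ReInitContext(String oldLaunchContext, String newLaunchContext) {
      this.oldLaunchContext = Objects.requireNonNull(oldLaunchContext);
      this.newLaunchContext = Objects.requireNonNull(newLaunchContext);
    }
  }

  static final class Container {
    private State state = State.RUNNING;
    private String launchContext;
    private ReInitContext reInitContext; // null when no upgrade is pending

    Container(String launchContext) {
      this.launchContext = Objects.requireNonNull(launchContext);
    }

    /** Guard: reject a reinit unless the container is running with none pending. */
    synchronized boolean reInit(String newLaunchContext) {
      if (state != State.RUNNING || reInitContext != null) {
        return false;
      }
      reInitContext = new ReInitContext(launchContext, newLaunchContext);
      state = State.REINITIALIZING;
      return true;
    }

    /** Old process stopped as part of the reinit: launch with the new context. */
    synchronized void onOldProcessStopped() {
      if (state != State.REINITIALIZING || reInitContext == null) {
        return; // the upgrade was cancelled, or never requested
      }
      launchContext = reInitContext.newLaunchContext;
      reInitContext = null;
      state = State.RUNNING;
    }

    /** Explicit kill, or the process completed successfully: cancel any upgrade. */
    synchronized void onKillOrSuccessfulExit() {
      reInitContext = null;
      state = State.DONE;
    }

    synchronized State getState() { return state; }
    synchronized String getLaunchContext() { return launchContext; }
    synchronized boolean hasPendingReInit() { return reInitContext != null; }
  }
}
```

In this sketch a second `reInit` call is rejected while the first is pending, and a kill or successful exit clears the pending context so that a later `onOldProcessStopped` does nothing. Rollback and the explicit commit API mentioned above are left out on purpose.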
> Core changes in NodeManager to support for upgrade and rollback of Containers
> -----------------------------------------------------------------------------
>
>                 Key: YARN-5620
>                 URL: https://issues.apache.org/jira/browse/YARN-5620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5620.001.patch, YARN-5620.002.patch,
> YARN-5620.003.patch, YARN-5620.004.patch, YARN-5620.005.patch,
> YARN-5620.006.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to
> support upgrade of a running container with a new {{ContainerLaunchContext}}
> as well as the ability to rollback the upgrade if the container is not able
> to restart using the new launch Context.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org