[ https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun Suresh updated YARN-4876: ------------------------------ Attachment: YARN-4876.004.patch Thanks for the thoughtful review [~vvasudev]. Apologize for the delay in getting to it. Updating patch to address some of your comments. bq. 18)... Any reason for changing the cleanupContainer function? If storeAsCompleted is false, we should just not call cleanupContainer and avoid modifying the implementation of cleanupContainer? Actually this is required. The cleanup can be called during a restart (container is cleanedup and restarted) and plain old restart as well. bq. 17)... think this has to changed significantly - we can't delete everything in the container work directory. We should clean up the resources that were localized by the container. Hmmm.. we were thinking that if a container is stopped and restarted, its working director should be deleted before restart. Maybe we should probably move all contents to a versioned folder. Thoughts ? bq. 16)... Can you explain why we shouldn't try to recover launched containers? For the prototype, we thought we should just not deal with recovery. But it looks like we can just remove this check and everything should just be fine. [~marco.rabozzi].. Thoughts ? bq. 15)... Are we using destoryDelay as a proxy for whether restart is being allowed? Actually think of it this way. All containers are restartable, but it is not possible to reStart a container with destroyDelay == 0. So technically it isn't a proxy.. it is the only requirement. bq. 14)... The second transition looks wrong. Actually that second transition used to exist prior to this patch. Only the first one was added. bq. 6)... This looks like it might be a race condition. Should we synchronize on startedContainer before checking if it's null? So I have removed the synchronized block all together. It is fine if multiple threads try calling {{destroyContainer}} at the same time. It is an end state for the container and the container will be removed only once from the collection. I agree with the rest of your comments and have made required changes. > [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop > ------------------------------------------------------------------ > > Key: YARN-4876 > URL: https://issues.apache.org/jira/browse/YARN-4876 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Arun Suresh > Assignee: Marco Rabozzi > Attachments: YARN-4876-design-doc.pdf, YARN-4876.002.patch, > YARN-4876.003.patch, YARN-4876.004.patch, YARN-4876.01.patch > > > Introduce *initialize* and *destroy* container API into the > *ContainerManagementProtocol* and decouple the actual start of a container > from the initialization. This will allow AMs to re-start a container without > having to lose the allocation. > Additionally, if the localization of the container is associated to the > initialize (and the cleanup with the destroy), This can also be used by > applications to upgrade a Container by *re-initializing* with a new > *ContainerLaunchContext* -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org