[ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4876:
------------------------------
    Attachment: YARN-4876.004.patch

Thanks for the thoughtful review [~vvasudev]. Apologize for the delay in 
getting to it.

Updating patch to address some of your comments. 

bq. 18)... Any reason for changing the cleanupContainer function? If 
storeAsCompleted is false, we should just not call cleanupContainer and avoid 
modifying the implementation of cleanupContainer?
Actually this is required. The cleanup can be called during a restart 
(container is cleanedup and restarted) and plain old restart as well.

bq. 17)...  think this has to changed significantly - we can't delete 
everything in the container work directory. We should clean up the resources 
that were localized by the container.
Hmmm.. we were thinking that if a container is stopped and restarted, its 
working director should be deleted before restart. Maybe we should probably 
move all contents to a versioned folder. Thoughts ?

bq. 16)... Can you explain why we shouldn't try to recover launched containers?
For the prototype, we thought we should just not deal with recovery. But it 
looks like we can just remove this check and everything should just be fine. 
[~marco.rabozzi].. Thoughts ?

bq. 15)... Are we using destoryDelay as a proxy for whether restart is being 
allowed?
Actually think of it this way. All containers are restartable, but it is not 
possible to reStart a container with destroyDelay == 0. So technically it isn't 
a proxy.. it is the only requirement.

bq. 14)... The second transition looks wrong.
Actually that second transition used to exist prior to this patch. Only the 
first one was added.

bq. 6)... This looks like it might be a race condition. Should we synchronize 
on startedContainer before checking if it's null?
So I have removed the synchronized block all together. It is fine if multiple 
threads try calling {{destroyContainer}} at the same time. It is an end state 
for the container and the container will be removed only once from the 
collection.

I agree with the rest of your comments and have made required changes.



> [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop
> ------------------------------------------------------------------
>
>                 Key: YARN-4876
>                 URL: https://issues.apache.org/jira/browse/YARN-4876
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Marco Rabozzi
>         Attachments: YARN-4876-design-doc.pdf, YARN-4876.002.patch, 
> YARN-4876.003.patch, YARN-4876.004.patch, YARN-4876.01.patch
>
>
> Introduce *initialize* and *destroy* container API into the 
> *ContainerManagementProtocol* and decouple the actual start of a container 
> from the initialization. This will allow AMs to re-start a container without 
> having to lose the allocation.
> Additionally, if the localization of the container is associated to the 
> initialize (and the cleanup with the destroy), This can also be used by 
> applications to upgrade a Container by *re-initializing* with a new 
> *ContainerLaunchContext*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to