[jira] [Commented] (YARN-3998) Add support in the NodeManager to re-launch containers

Jason Lowe (JIRA) Mon, 22 Aug 2016 07:08:36 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430822#comment-15430822
 ]


Jason Lowe commented on YARN-3998:
----------------------------------

I believe the old software will ignore unrecognized keys in the state store, so 
we may be OK with a rolling downgrade as long as the resulting behavior is 
expected.  This feature adds the ability for the NM to re-launch containers, so 
if we downgrade we lose that ability.  That means that containers will just 
fail instead of being re-launched, but I think we're OK there.  We lose the 
optimization for a faster restart of the container, but AFAIK we don't outright 
lose containers like we could with the feature added in YARN-5049.  One issue 
is that I'm not sure the 
old software will clean out the new keys when a container completes, so we 
might leak some keys in the state store during a rolling downgrade.
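
To make the leak concern concrete, here is a rough sketch, not the actual NM 
state store code.  It assumes a sorted key/value store modeled by a TreeMap, 
and the key layout and suffix names (including "/remainingRetryAttempts") are 
hypothetical.  If the old software deletes only the suffixes it knows about, 
any key written by the new software is left behind; a prefix-based delete would 
not have that problem.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DowngradeLeakSketch {
    // Hypothetical suffixes known to the old software; the real layout may differ.
    static final List<String> OLD_KNOWN_SUFFIXES =
        Arrays.asList("/request", "/diagnostics", "/exitcode");

    static String prefix(String containerId) {
        return "containers/" + containerId;
    }

    // Old-style cleanup: removes only the keys the old software knows about.
    static void removeContainerOldStyle(Map<String, String> store, String containerId) {
        for (String suffix : OLD_KNOWN_SUFFIXES) {
            store.remove(prefix(containerId) + suffix);
        }
    }

    // Prefix-based cleanup: removes every key under the container, known or not.
    static void removeContainerByPrefix(TreeMap<String, String> store, String containerId) {
        String p = prefix(containerId) + "/";
        store.subMap(p, p + Character.MAX_VALUE).clear();
    }

    // A store as the new software might leave it for one container.
    static TreeMap<String, String> sampleStore() {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("containers/container_1/request", "launch-context");
        store.put("containers/container_1/diagnostics", "");
        store.put("containers/container_1/remainingRetryAttempts", "2");
        return store;
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = sampleStore();
        removeContainerOldStyle(store, "container_1");
        System.out.println("Keys left after old-style cleanup: " + store.keySet());
    }
}
```

Whether the old software actually behaves like removeContainerOldStyle is 
exactly the open question above.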

We could bump the minor version, although nothing reads it today.  It carries 
no meaning currently, but it could potentially provide clues to code that 
needs to do a schema migration across major versions.  In practice I 
suspect it still won't be used since any such migration will take into account 
all the keys it knows about, and at that point it doesn't need to look at the 
minor version.  I can't think of a case where we would need the minor version 
info to properly implement the migration cases rather than have the migration 
code auto-detect based on what keys it finds in the store.
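
As a minimal sketch of that reasoning (the class, method names, and key suffix 
here are hypothetical, not the NM's actual API): compatibility is decided on 
the major version alone, and a migration decides what to do from the keys it 
finds rather than from the minor version.

```java
import java.util.Set;

final class SchemaVersionSketch {
    final int major;
    final int minor;

    SchemaVersionSketch(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Only the major version matters; the minor version is never consulted.
    boolean isCompatibleTo(SchemaVersionSketch stored) {
        return this.major == stored.major;
    }

    // Auto-detection: infer whether re-launch state exists from the keys
    // present in the store, using a hypothetical key suffix.
    static boolean hasRetryState(Set<String> keys) {
        for (String key : keys) {
            if (key.endsWith("/remainingRetryAttempts")) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, bumping the minor version changes nothing observable: 
isCompatibleTo ignores it, and hasRetryState gets the same answer from the 
keys.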

It doesn't look to me like this needs to be flagged as an incompatible change 
unless I'm missing something with the semantics of the container re-launch.

> Add support in the NodeManager to re-launch containers
> ------------------------------------------------------
>
>                 Key: YARN-3998
>                 URL: https://issues.apache.org/jira/browse/YARN-3998
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>             Fix For: 2.9.0
>
>         Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, 
> YARN-3998.06.patch, YARN-3998.07.patch, YARN-3998.08.patch, YARN-3998.09.patch
>
>
> I'd like to add a field (retry-times) to ContainerLaunchContext. When the AM 
> launches containers, it could specify the value. The NM will then re-launch 
> the container 'retry-times' times when it fails to run (e.g. exit code is 
> not 0). This saves a lot of time: it avoids container localization, and the 
> RM does not need to re-schedule the container. Local files in the 
> container's working directory are also left for re-use (if the container has 
> downloaded some big files, it does not need to re-download them when running 
> again). We find this useful in systems like Storm.
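
The retry behavior described in the issue can be sketched roughly as follows. 
This is an illustration only, not the patch: runWithRetries and the launcher 
callback are hypothetical names, and the sketch assumes the retry budget is 
consumed only while the exit code is non-zero.

```java
import java.util.function.IntSupplier;

final class RelaunchSketch {
    // Runs the launcher once, then re-runs it while it exits non-zero and
    // retries remain. Returns the exit code of the last attempt.
    static int runWithRetries(IntSupplier launcher, int retryTimes) {
        int exitCode = launcher.getAsInt();
        int remaining = retryTimes;
        while (exitCode != 0 && remaining > 0) {
            remaining--;
            // Re-launch in place: no re-localization or re-scheduling step here.
            exitCode = launcher.getAsInt();
        }
        return exitCode;
    }
}
```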



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
