[ https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665043#comment-15665043 ]

Jason Lowe commented on YARN-5547:
----------------------------------

Thanks for updating the patch!

Is there a good reason to store the killed state when we aren't going to 
recover a container?  It seems unnecessary to me.  If for some reason we crash 
during recovery and try to recover again on the next startup, we will still 
not recognize the container and will simply try to kill it again.  Explicitly 
storing the state as killed doesn't seem to accomplish much.  Is there a 
recovery scenario where it's needed?

When the container does finally get killed and is removed from the state store, 
we will leak any keys that are not known to the current software version.  The 
state store's container removal code only deletes the keys it knows about.  We 
either need to track unknown keys associated with containers or do a scan to 
remove all keys under a container when we delete it (the latter could be 
expensive in terms of latency).  If we do go with the latter, we only need to 
do so for containers that were recovered, and it would be nice to avoid the 
performance penalty for containers that don't need it.
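To make the scan option concrete, here is a minimal Python sketch. It is not the actual NMLeveldbStateStore code. It models LevelDB's sorted key space with a dict and a bisect-based prefix scan. The key prefix `ContainerManager/containers/`, the suffix list, `ToyStateStore`, and the `recovered` set are all hypothetical names chosen for illustration. Containers created by the running process delete only their known keys, and only containers loaded during recovery pay for the prefix scan.

```python
import bisect

# Hypothetical key layout; the real store's prefixes/suffixes may differ.
CONTAINERS_PREFIX = "ContainerManager/containers/"
# Per-container suffixes this (hypothetical) software version knows about.
KNOWN_SUFFIXES = ("request", "launched", "killed", "exitcode", "diagnostics")


def keys_with_prefix(db, prefix):
    """Emulate a LevelDB iterator: seek to prefix, walk until it stops matching."""
    keys = sorted(db)  # LevelDB keeps keys in sorted order
    i = bisect.bisect_left(keys, prefix)
    matched = []
    while i < len(keys) and keys[i].startswith(prefix):
        matched.append(keys[i])
        i += 1
    return matched


class ToyStateStore:
    def __init__(self):
        self.db = {}            # stands in for the LevelDB instance
        self.recovered = set()  # container ids loaded during recovery

    def remove_container(self, container_id):
        # Trailing "/" keeps container_..._000001 from matching container_..._0000010.
        key_prefix = CONTAINERS_PREFIX + container_id + "/"
        if container_id in self.recovered:
            # May hold keys written by a newer version: scan the whole
            # subtree so nothing leaks (extra latency, recovered only).
            doomed = keys_with_prefix(self.db, key_prefix)
            self.recovered.discard(container_id)
        else:
            # Created by this process, so only known keys can exist.
            doomed = [key_prefix + s for s in KNOWN_SUFFIXES]
        for key in doomed:
            self.db.pop(key, None)  # deleting a missing key is a no-op
```

If a non-recovered container somehow carried an extra key, the known-keys path would leave it behind. That is exactly the leak described above. The scan closes the gap for recovered containers without slowing down the common case.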

> NMLeveldbStateStore should be more tolerant of unknown keys
> -----------------------------------------------------------
>
>                 Key: YARN-5547
>                 URL: https://issues.apache.org/jira/browse/YARN-5547
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Ajith S
>         Attachments: YARN-5547.01.patch, YARN-5547.02.patch, 
> YARN-5547.03.patch
>
>
> Whenever new keys are added to the NM state store, rolling downgrades break 
> because the older code throws when it encounters an unrecognized key.  If it 
> instead skipped unrecognized keys, it would be simpler to continue supporting 
> rolling downgrades.  We need to define the semantics of unrecognized keys when 
> containers and apps are cleaned up, e.g.: we may want to delete all keys 
> underneath an app or container directory when it is being removed from the 
> state store to prevent leaking unrecognized keys.
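Here is a minimal sketch of the tolerant recovery behavior described above. It is again in Python, with a dict standing in for LevelDB. The key names and the `load_containers` helper are hypothetical and are not the real NMLeveldbStateStore API. Instead of throwing on an unrecognized suffix, the loader logs the key, skips it, and records it against its container. That record would also give the removal path an explicit list of extra keys to delete, which corresponds to the "track unknown keys" option from the comment.

```python
import logging

log = logging.getLogger("toy-nm-state-store")

# Hypothetical key layout; the real store's prefixes/suffixes may differ.
CONTAINERS_PREFIX = "ContainerManager/containers/"


def load_containers(db):
    """Rebuild per-container state, skipping suffixes this version doesn't know."""
    containers = {}
    for key in sorted(db):
        if not key.startswith(CONTAINERS_PREFIX):
            continue
        rest = key[len(CONTAINERS_PREFIX):]
        container_id, sep, suffix = rest.partition("/")
        if not sep:
            continue  # bare container entry with no sub-key; nothing to load
        state = containers.setdefault(container_id, {"unknown_keys": []})
        if suffix == "request":
            state["request"] = db[key]
        elif suffix == "launched":
            state["launched"] = True
        elif suffix == "killed":
            state["killed"] = True
        elif suffix == "exitcode":
            state["exit_code"] = int(db[key])
        else:
            # Strict behavior would raise here and break rolling downgrades.
            # Tolerant behavior: log, remember the key for cleanup, move on.
            log.warning("Skipping unrecognized key %s", key)
            state["unknown_keys"].append(key)
    return containers
```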



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
