[ https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665043#comment-15665043 ]
Jason Lowe commented on YARN-5547:
----------------------------------

Thanks for updating the patch!

Is there a good reason to store the killed state when we aren't going to recover a container? It seems unnecessary to me. If for some reason we crash during recovery and try to recover again on the next startup, it will continue to not recognize the container and try to kill it again. Explicitly storing the state as killed doesn't seem to accomplish much. Is there a recovery scenario where it's needed?

When the container does finally get killed and is removed from the state store, we will leak any keys that are not known by the current software. The state store container removal code only deletes the keys it knows about. We either need to track unknown keys associated with containers or do a scan to remove all keys when we delete a container (the latter could be expensive in terms of latency). If we do go with the latter, we only need to do so for any containers that were recovered, and it would be nice to avoid the performance penalty for containers that don't need it.

> NMLeveldbStateStore should be more tolerant of unknown keys
> -----------------------------------------------------------
>
>                 Key: YARN-5547
>                 URL: https://issues.apache.org/jira/browse/YARN-5547
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Ajith S
>         Attachments: YARN-5547.01.patch, YARN-5547.02.patch, YARN-5547.03.patch
>
> Whenever new keys are added to the NM state store it will break rolling downgrades because the code will throw if it encounters an unrecognized key. If instead it skipped unrecognized keys it could be simpler to continue supporting rolling downgrades.
> We need to define the semantics of unrecognized keys when containers and apps are cleaned up, e.g., we may want to delete all keys underneath an app or container directory when it is being removed from the state store to prevent leaking unrecognized keys.
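
For illustration, the "scan to remove all keys" option from the comment above might look roughly like the sketch below. It is not taken from any of the attached patches. A `TreeMap` stands in for the sorted LevelDB key space, and the class name, method name, and key prefix are assumptions chosen for the example, not the actual NM state store code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy stand-in for a sorted key-value store; hypothetical, not NodeManager code. */
final class PrefixScanCleanup {
  // Assumed key layout for the example: "<prefix><containerId>/<suffix>".
  static final String CONTAINERS_PREFIX = "ContainerManager/containers/";

  /**
   * Removes every key stored under the container's prefix, whether or not
   * the running software recognizes the suffix. Returns the number removed.
   */
  static int removeContainerKeys(NavigableMap<String, byte[]> store,
                                 String containerId) {
    // The trailing "/" keeps container_1 from matching container_10.
    String prefix = CONTAINERS_PREFIX + containerId + "/";

    // Seek to the first key >= prefix, then walk forward while keys still
    // share the prefix. Keys are sorted, so the first mismatch ends the scan.
    List<String> doomed = new ArrayList<>();
    for (String key : store.tailMap(prefix, true).keySet()) {
      if (!key.startsWith(prefix)) {
        break;
      }
      doomed.add(key);
    }

    // Delete after the scan, similar in spirit to applying a write batch.
    for (String key : doomed) {
      store.remove(key);
    }
    return doomed.size();
  }
}
```

The scan touches every key under the container, which is the latency cost mentioned in the comment; restricting it to containers that were recovered from the store would keep the common removal path unchanged.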