[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649347#comment-14649347 ]
Jason Lowe commented on YARN-3998:
----------------------------------

I think it's more effort for YARN to support this than it is for apps to do this on their own. For apps this can be a simple shell script placed in front of the normal container launch command, which is straightforward to do. Also, if the app controls this directly, it is free to implement all sorts of interesting, app-specific retry scenarios, e.g. only retry on certain exit codes but not others, only retry if certain files have or have not been created in the local filesystem, etc.

However, I may be in the minority on this, and it would be interesting to hear what others think.

> Add retry-times to let NM re-launch container when it fails to run
> ------------------------------------------------------------------
>
>                 Key: YARN-3998
>                 URL: https://issues.apache.org/jira/browse/YARN-3998
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>
> I'd like to add a field (retry-times) to ContainerLaunchContext. When the AM
> launches containers, it could specify the value. The NM will then re-launch
> the container 'retry-times' times when it fails to run (e.g. exit code is
> not 0).
> This will save a lot of time. It avoids container localization, and the RM
> does not need to re-schedule the container. Local files in the container's
> working directory will also be left for re-use. (If the container has
> downloaded some big files, it does not need to re-download them when running
> again.)
> We find this useful in systems like Storm.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
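
As an illustration of the app-side approach suggested in the comment above, here is a minimal sketch of a wrapper that an application could place in front of its normal container launch command. It is written in Python rather than shell, and everything in it (the function name run_with_retries, its parameters, and the default values) is hypothetical and not part of YARN or of any patch on this issue. It only shows the "retry on certain exit codes but not others" idea.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_retries=3, retry_exit_codes=(1,), delay_seconds=0.0):
    """Run cmd, re-running it when it exits with a retryable code.

    Hypothetical helper for illustration only.
    cmd              -- the real launch command, as a list of arguments
    max_retries      -- how many times to re-run after the first failure
    retry_exit_codes -- exit codes that are considered worth retrying
    Returns a tuple (final_exit_code, number_of_runs).
    """
    runs = 0
    while True:
        exit_code = subprocess.call(cmd)
        runs += 1
        # Stop on success, on an exit code the app does not want to retry,
        # or once the retry budget is used up.
        if exit_code == 0:
            return exit_code, runs
        if exit_code not in retry_exit_codes:
            return exit_code, runs
        if runs > max_retries:
            return exit_code, runs
        time.sleep(delay_seconds)


def always_fails_command():
    # Stand-in for a real launch command; exits with code 1 every time.
    return [sys.executable, "-c", "import sys; sys.exit(1)"]
```

For example, run_with_retries(always_fails_command(), max_retries=2) would run the command three times in total and then return its last exit code. Because the wrapper runs inside the same container, files already present in the working directory stay in place between attempts. A file-based condition (only retry if a given file does or does not exist) could be added as an extra check before the sleep.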