Wangda Tan created YARN-8080:
--------------------------------

Summary: YARN native service should support component restart policy
Key: YARN-8080
URL: https://issues.apache.org/jira/browse/YARN-8080
Project: Hadoop YARN
Issue Type: Task
Reporter: Wangda Tan
Assignee: Wangda Tan
Attachments: YARN-8080.001.patch
The existing native service assumes that a service is long-running and never finishes, so containers are restarted even when they exit with code 0. To support broader use cases, we need to let users specify a restart policy for each component. The proposed policies are:

1) Always: the framework always restarts containers, regardless of their exit status. This is the existing/default behavior.
2) Never: containers are never restarted after they finish. This supports job-like workloads (for example, a Tensorflow training job): if a task exits with code 0, it should not be restarted. It can also be used by services that are not restartable/recoverable.
3) On-failure: similar to Never, except that tasks exiting with a non-zero code are restarted.

Behavior after a component *instance* is finalized (Succeeded or Failed, when restart_policy != ALWAYS):

1) Single component, single instance: the service completes.
2) Single component, multiple instances: other running instances of the same component are not affected by the finalized instance. The service terminates once all instances are finalized.
3) Multiple components: the service terminates once all components are finalized.
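The restart and termination rules in this proposal can be sketched as a few small decision functions. This is a minimal illustration under stated assumptions, not YARN's implementation: `RestartPolicy`, `should_restart`, `instance_finalized`, and `service_should_terminate` are hypothetical names. The sketch also assumes that a component counts as finalized once every one of its instances is finalized.

```python
from enum import Enum
from typing import Dict, List


class RestartPolicy(Enum):
    """Hypothetical model of the three proposed component restart policies."""
    ALWAYS = "ALWAYS"          # existing/default behavior
    NEVER = "NEVER"            # never restart a finished container
    ON_FAILURE = "ON_FAILURE"  # restart only when exit code != 0


def should_restart(policy: RestartPolicy, exit_code: int) -> bool:
    """Return True if the framework should restart a container that just exited."""
    if policy is RestartPolicy.ALWAYS:
        return True
    if policy is RestartPolicy.NEVER:
        return False
    return exit_code != 0  # ON_FAILURE


def instance_finalized(policy: RestartPolicy, exit_code: int) -> bool:
    """An instance is finalized (Succeeded or Failed) once it will not be restarted."""
    return not should_restart(policy, exit_code)


def service_should_terminate(components: Dict[str, List[bool]]) -> bool:
    """Terminate only when every instance of every component is finalized.

    `components` maps a (hypothetical) component name to one flag per
    instance, where True means that instance is finalized. A component is
    assumed to be finalized when all of its instances are.
    """
    return all(all(flags) for flags in components.values())


if __name__ == "__main__":
    # A training-style task that exited cleanly under ON_FAILURE is not restarted.
    print(should_restart(RestartPolicy.ON_FAILURE, 0))  # False
    # One of two worker instances is still running, so the service keeps going.
    print(service_should_terminate({"worker": [True, False]}))  # False
```

Because an instance under ALWAYS is never finalized, a service made only of ALWAYS components never reaches the termination condition, which matches the existing long-running behavior.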