[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

Eric Yang (JIRA) Tue, 15 May 2018 16:44:29 -0700

    [ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476621#comment-16476621 ]


Eric Yang commented on YARN-8080:
---------------------------------

[~suma.shivaprasad] Thank you for the patch, a few nitpicks:

{code}
      if (!failedComponents.isEmpty()) {
        setGracefulStop(FinalApplicationStatus.FAILED);
        getTerminationHandler().terminate(-1);
      } else{
        setGracefulStop(FinalApplicationStatus.SUCCEEDED);
        getTerminationHandler().terminate(0);
      }
{code}

This can be rewritten without the negation, using named exit-code constants 
to improve readability:

{code}
      if (failedComponents.isEmpty()) {
        setGracefulStop(FinalApplicationStatus.SUCCEEDED);
        getTerminationHandler().terminate(EXIT_SUCCESS);
      } else {
        setGracefulStop(FinalApplicationStatus.FAILED);
        getTerminationHandler().terminate(EXIT_FALSE);
      }
{code}

This change is unnecessary:
{code}
@@ -415,7 +451,7 @@ private static ComponentState checkIfStable(Component component) {
       component.componentSpec.setState(
           org.apache.hadoop.yarn.service.api.records.ComponentState.FLEXING);
       return FLEXING;
-    } else {
+    } else{
       //  component.numContainersThatNeedUpgrade.get() > 0
       component.componentSpec.setState(org.apache.hadoop.yarn.service.api.
           records.ComponentState.NEEDS_UPGRADE);
{code}

In stabilizeComponents, containerNum is now a global count across all 
components instead of a component-specific count.  What is the reason for this 
change?

It would be great to clean up the checkstyle errors to make the patch easier to 
read.

With restart_policy=ON_FAILURE, each component instance failed 3 times, and the 
application went into the FINISHED state instead of the FAILED state.  Is this 
expected?

> YARN native service should support component restart policy
> -----------------------------------------------------------
>
>                 Key: YARN-8080
>                 URL: https://issues.apache.org/jira/browse/YARN-8080
>             Project: Hadoop YARN
>          Issue Type: Task
>            Reporter: Wangda Tan
>            Assignee: Suma Shivaprasad
>            Priority: Critical
>         Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, 
> YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch, 
> YARN-8080.014.patch
>
>
> The existing native service assumes the service is long running and never 
> finishes. Containers will be restarted even if exit code == 0. 
> To support broader use cases, we need to allow users to specify a restart 
> policy per component. Propose to have the following policies:
> 1) Always: containers are always restarted by the framework regardless of 
> container exit status. This is the existing/default behavior.
> 2) Never: Do not restart containers in any case after the container finishes. 
> This supports job-like workloads (for example, a TensorFlow training job). If 
> a task exits with code == 0, we should not restart the task. This can be used 
> by services which are not restartable/recoverable.
> 3) On-failure: Similar to above, only restart a task with exit code != 0. 
>
> Behaviors after a component *instance* finalizes (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For single component, single instance: complete service.
> 2) For single component, multiple instances: other running instances of the 
> same component won't be affected by the finalized component instance. The 
> service will be terminated once all instances have finalized. 
> 3) For multiple components: The service will be terminated once all 
> components have finalized.
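
The policies and termination rules quoted above can be sketched as follows. 
This is only an illustration under stated assumptions, not the code in the 
patch: the class RestartPolicySketch, the local RestartPolicy enum, and the 
methods shouldRestart and shouldTerminateService are hypothetical names, and 
"finalized" is modeled as a plain boolean per component instance.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the YARN code base.
final class RestartPolicySketch {

  /** The three policies proposed in the issue description. */
  enum RestartPolicy { ALWAYS, NEVER, ON_FAILURE }

  /** Decide whether a finished container should be relaunched. */
  static boolean shouldRestart(RestartPolicy policy, int exitCode) {
    switch (policy) {
      case ALWAYS:
        return true;            // restart regardless of exit status
      case NEVER:
        return false;           // job-like workloads run once
      case ON_FAILURE:
        return exitCode != 0;   // restart only on a non-zero exit code
      default:
        throw new IllegalArgumentException("Unknown policy: " + policy);
    }
  }

  /**
   * The service terminates once every instance of every component has
   * finalized. Each inner list holds one "finalized" flag per instance.
   */
  static boolean shouldTerminateService(List<List<Boolean>> components) {
    for (List<Boolean> instances : components) {
      for (boolean finalized : instances) {
        if (!finalized) {
          return false;         // a running instance keeps the service alive
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(shouldRestart(RestartPolicy.ON_FAILURE, 0));
    System.out.println(shouldRestart(RestartPolicy.ON_FAILURE, 1));
    System.out.println(shouldTerminateService(Arrays.asList(
        Arrays.asList(true, true), Arrays.asList(true, false))));
  }
}
```

The sketch deliberately leaves out retry limits and the final application 
status (SUCCEEDED versus FAILED), which are the points under discussion in the 
review comment.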



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
