[ 
https://issues.apache.org/jira/browse/YARN-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466462#comment-16466462
 ] 

Wangda Tan commented on YARN-8255:
----------------------------------

Thanks [~suma.shivaprasad] for filing the JIRA, and [~eyang] / 
[~billie.rinaldi] for the suggestions.

I think service flexing is different from restart policy. As mentioned by 
[~eyang], restart policy = on_failure / always means some part of the job can 
be *recomputed*. *Recomputable* is different from *expandable*. Take 
map-reduce as an example: the number of mappers and reducers is determined by 
the InputFormat, which is fixed before the job is launched. Allocating more 
mappers or reducers than pre-calculated while the job is running doesn't 
help. Many computation frameworks follow this pattern, such as TensorFlow 
and OpenMPI, where adding tasks while the job is running isn't helpful.

Considering this, I would prefer what Suma suggested: allow the user to 
specify allow_flexing. Sometimes adding a new instance to a component could 
lead to task or even master failure because it is unexpected. I tend to agree 
with making allow_flexing=false by default, but I'm also fine with the 
opposite.
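
To make the proposal concrete, here is a rough sketch of how a flex request 
could be checked against such a flag. It is not the YARN implementation: the 
Component class and the flexing_allowed / flex helpers are hypothetical names 
made up for illustration. The fallback used when allow_flexing is unset 
follows the rule in the issue description (flexing on only for 
restart_policy=ALWAYS), which is just one of the defaults under discussion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """Hypothetical stand-in for a service component spec (not a YARN class)."""
    name: str
    number_of_containers: int
    restart_policy: str = "ALWAYS"        # ALWAYS / ON_FAILURE / NEVER
    allow_flexing: Optional[bool] = None  # None means "not set in the spec"


def flexing_allowed(component: Component) -> bool:
    """Resolve the effective allow_flexing value for a component."""
    # An explicit setting in the spec always wins.
    if component.allow_flexing is not None:
        return component.allow_flexing
    # Assumed default, taken from the issue description: only long-running
    # components (restart_policy=ALWAYS) are flexible unless stated otherwise.
    return component.restart_policy == "ALWAYS"


def flex(component: Component, target: int) -> None:
    """Change the instance count, or refuse if flexing is disabled."""
    if target < 0:
        raise ValueError("target instance count must be non-negative")
    if not flexing_allowed(component):
        raise PermissionError(
            f"flexing is disabled for component '{component.name}'"
        )
    component.number_of_containers = target


if __name__ == "__main__":
    web = Component("web", number_of_containers=2)
    flex(web, 5)  # accepted: restart_policy defaults to ALWAYS
    print(web.number_of_containers)

    worker = Component("worker", number_of_containers=4,
                       restart_policy="ON_FAILURE")
    try:
        flex(worker, 8)  # refused: a batch-style component keeps its size
    except PermissionError as err:
        print(err)
```

Making allow_flexing=false the unconditional default, as preferred above, 
would only change the last line of flexing_allowed to return False.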

> Allow option to disable flex for a service component 
> -----------------------------------------------------
>
>                 Key: YARN-8255
>                 URL: https://issues.apache.org/jira/browse/YARN-8255
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>            Priority: Major
>
> YARN-8080 implements restart capabilities for service component instances. 
> YARN service components should add an option to disallow flexing to support 
> workloads which are essentially batch/iterative jobs which terminate with 
> restart_policy=NEVER/ON_FAILURE. This could be disabled by default for 
> components where restart_policy=NEVER/ON_FAILURE and enabled by default when 
> restart_policy=ALWAYS (which is the default restart_policy) unless explicitly 
> set at the service spec.
> The option could be exposed as part of the component spec as "allow_flexing". 
> cc [~billie.rinaldi] [~gsaha] [~eyang] [~csingh] [~wangda]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

