[ https://issues.apache.org/jira/browse/YARN-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466462#comment-16466462 ]
Wangda Tan commented on YARN-8255:
----------------------------------

Thanks [~suma.shivaprasad] for filing the JIRA, and thanks [~eyang] / [~billie.rinaldi] for the suggestions.

I think service flexing is different from restart policy. As mentioned by [~eyang], restart policy = on_failure / always means some part of the job can be *recomputed*. *Recomputable* is different from *expandable*. An example is map-reduce: the number of mappers and reducers is determined by the InputFormat, which is fixed before the job is launched. Allocating more mappers or reducers than pre-calculated while the job is running isn't helpful. Many computation frameworks follow this pattern, such as Tensorflow, OpenMPI, etc.: adding tasks while the job is running isn't helpful.

Considering this, I would prefer what Suma suggested: allow the user to specify allow_flexing. Sometimes adding a new instance to a component can lead to task or even master failure, because the new instance is unexpected. I tend to agree with making allow_flexing=false by default, but I'm also fine with the opposite.

> Allow option to disable flex for a service component
> ----------------------------------------------------
>
>                 Key: YARN-8255
>                 URL: https://issues.apache.org/jira/browse/YARN-8255
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>            Priority: Major
>
> YARN-8080 implements restart capabilities for service component instances.
> YARN service components should add an option to disallow flexing to support
> workloads which are essentially batch/iterative jobs which terminate with
> restart_policy=NEVER/ON_FAILURE. This could be disabled by default for
> components where restart_policy=NEVER/ON_FAILURE and enabled by default when
> restart_policy=ALWAYS (which is the default restart_policy) unless explicitly
> set at the service spec.
> The option could be exposed as part of the component spec as "allow_flexing".
> cc [~billie.rinaldi] [~gsaha] [~eyang] [~csingh] [~wangda]
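
For illustration only, the defaulting rule proposed in the issue description could be sketched as below. This is not code from YARN: the helper name `resolve_allow_flexing` and its signature are hypothetical, and only the `allow_flexing` option and the restart_policy values (ALWAYS, ON_FAILURE, NEVER) come from the description above.

```python
from typing import Optional

# Restart policies named in the issue description; ALWAYS is the stated default.
RESTART_POLICIES = {"ALWAYS", "ON_FAILURE", "NEVER"}


def resolve_allow_flexing(restart_policy: str = "ALWAYS",
                          allow_flexing: Optional[bool] = None) -> bool:
    """Hypothetical helper: effective allow_flexing value for one component."""
    policy = restart_policy.upper()
    if policy not in RESTART_POLICIES:
        raise ValueError(f"unknown restart_policy: {restart_policy!r}")
    # An explicit setting in the service spec always wins.
    if allow_flexing is not None:
        return allow_flexing
    # Otherwise only components that restart ALWAYS are flexible by default;
    # NEVER / ON_FAILURE components (batch or iterative jobs) are not.
    return policy == "ALWAYS"
```

Under this sketch a component with restart_policy=NEVER rejects flexing unless the spec sets allow_flexing to true. The alternative raised in the comment (a single fixed default regardless of restart policy) would replace the last line with a constant.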