Thanks! Is there an existing JIRA I should watch?

~ Jonathan
From: Sandy Ryza <sandy.r...@cloudera.com>
Date: Wednesday, July 15, 2015 at 2:27 PM
To: Jonathan Kelly <jonat...@amazon.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

Hi Jonathan,

This is a problem that has come up for us as well, because we'd like dynamic allocation to be turned on by default in some setups, but not break existing users with these properties. I'm hoping to figure out a way to reconcile these by Spark 1.5.

-Sandy

On Wed, Jul 15, 2015 at 3:18 PM, Kelly, Jonathan <jonat...@amazon.com> wrote:

Would there be any problem in having spark.executor.instances (or --num-executors) be completely ignored (i.e., even for non-zero values) when spark.dynamicAllocation.enabled is true, rather than throwing an exception? I can see how the exception would be helpful if, say, you tried to pass both "-c spark.executor.instances" (or --num-executors) *and* "-c spark.dynamicAllocation.enabled=true" to spark-submit on the command line (as opposed to having one of them in spark-defaults.conf and the other in the spark-submit args), but currently there doesn't seem to be any way to distinguish between arguments that were actually passed to spark-submit and settings that simply came from spark-defaults.conf. If there were a way to distinguish them, I think the ideal behavior would be to throw the validation exception only if spark.executor.instances and spark.dynamicAllocation.enabled=true were both passed via spark-submit args or were both present in spark-defaults.conf; passing spark.dynamicAllocation.enabled=true to spark-submit would take precedence over spark.executor.instances configured in spark-defaults.conf, and vice versa.
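To make the precedence rule I'm proposing concrete, here is a minimal sketch in Python (Spark's actual implementation is in Scala; the function name and structure here are purely illustrative, not Spark's API). The idea: only raise when both conflicting settings come from the same source, and otherwise let the explicitly passed setting win:

```python
# Hypothetical sketch of the precedence rule described above.
# resolve_executor_settings is an illustrative name, not part of Spark.
DYN_ALLOC = "spark.dynamicAllocation.enabled"
NUM_EXECUTORS = "spark.executor.instances"

def resolve_executor_settings(cmd_line, defaults):
    """cmd_line: properties passed explicitly to spark-submit;
    defaults: properties read from spark-defaults.conf."""
    merged = {**defaults, **cmd_line}  # command-line values win on collisions

    dyn_on_cmd = cmd_line.get(DYN_ALLOC) == "true"
    num_on_cmd = NUM_EXECUTORS in cmd_line
    dyn_in_def = defaults.get(DYN_ALLOC) == "true"
    num_in_def = NUM_EXECUTORS in defaults

    # Fail only when both settings come from the *same* source.
    if (dyn_on_cmd and num_on_cmd) or (
        dyn_in_def and num_in_def and not dyn_on_cmd and not num_on_cmd
    ):
        raise ValueError(
            "Explicitly setting the number of executors is not compatible "
            "with " + DYN_ALLOC
        )

    # Otherwise the explicitly passed setting takes precedence over the
    # conflicting one from spark-defaults.conf.
    if dyn_on_cmd:
        merged.pop(NUM_EXECUTORS, None)
    elif num_on_cmd:
        merged.pop(DYN_ALLOC, None)
    return merged
```

So an explicit -c spark.dynamicAllocation.enabled=true would silently drop spark.executor.instances inherited from spark-defaults.conf, and an explicit --num-executors would drop an inherited dynamicAllocation setting, while passing both on the command line would still fail fast.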
Jonathan Kelly
Elastic MapReduce - SDE
Blackfoot (SEA33) 06.850.F0

From: Jonathan Kelly <jonat...@amazon.com>
Date: Tuesday, July 14, 2015 at 4:23 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

I've set up my cluster with a pre-calculated value for spark.executor.instances in spark-defaults.conf so that a job maximizes the utilization of the cluster resources by default. However, if I want to run a job with dynamicAllocation (by passing -c spark.dynamicAllocation.enabled=true to spark-submit), I get this exception:

Exception in thread "main" java.lang.IllegalArgumentException: Explicitly setting the number of executors is not compatible with spark.dynamicAllocation.enabled!
        at org.apache.spark.deploy.yarn.ClientArguments.parseArgs(ClientArguments.scala:192)
        at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:59)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:54)
        …

The exception makes sense, of course, but ideally I would like Spark to ignore the spark.executor.instances value from spark-defaults.conf when I've enabled dynamicAllocation. The most annoying part is that if spark.executor.instances is present in spark-defaults.conf, I cannot find any way to spark-submit a job with spark.dynamicAllocation.enabled=true without hitting this error. Even if I pass "-c spark.executor.instances=0 -c spark.dynamicAllocation.enabled=true", I still get the error, because the validation in ClientArguments.parseArgs() simply checks for the presence of spark.executor.instances rather than whether its value is > 0.
Should the check be changed to allow spark.executor.instances to be set to 0 when spark.dynamicAllocation.enabled is true? That would be an OK compromise, but I'd really prefer to be able to enable dynamicAllocation simply by setting spark.dynamicAllocation.enabled=true rather than also having to set spark.executor.instances to 0.

Thanks,
Jonathan
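For clarity, the relaxed check I'm suggesting would only treat spark.executor.instances as conflicting when its value is actually positive. A minimal sketch in Python (Spark's real check lives in ClientArguments.parseArgs, in Scala; the function name here is hypothetical):

```python
# Illustrative sketch of the relaxed validation discussed above: error only
# when a positive executor count is requested alongside dynamic allocation,
# instead of erroring on the mere *presence* of the key.
# validate_num_executors is a hypothetical name, not Spark's API.
def validate_num_executors(conf):
    dynamic = conf.get("spark.dynamicAllocation.enabled") == "true"
    num_executors = int(conf.get("spark.executor.instances", "0"))
    if dynamic and num_executors > 0:
        raise ValueError(
            "Explicitly setting the number of executors is not compatible "
            "with spark.dynamicAllocation.enabled!"
        )
```

Under this check, "-c spark.executor.instances=0 -c spark.dynamicAllocation.enabled=true" would be accepted, while a positive executor count combined with dynamic allocation would still fail as it does today.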