I've set up my cluster with a pre-calculated value for spark.executor.instances in spark-defaults.conf, so that by default a job maximizes the utilization of the cluster's resources.
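Just to make the setup concrete, it looks roughly like this (the executor count, class name, and jar below are placeholders, not my real values):

    # spark-defaults.conf
    spark.executor.instances    50

    # picks up the default of 50 executors and fills the cluster
    spark-submit --class com.example.MyJob myjob.jar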
However, if I want to run a job with dynamic allocation instead (by passing --conf spark.dynamicAllocation.enabled=true to spark-submit), I get this exception:

    Exception in thread "main" java.lang.IllegalArgumentException: Explicitly setting the number of executors is not compatible with spark.dynamicAllocation.enabled!
            at org.apache.spark.deploy.yarn.ClientArguments.parseArgs(ClientArguments.scala:192)
            at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:59)
            at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:54)
            ...

The exception makes sense, of course, but ideally I'd like Spark to ignore the spark.executor.instances value in spark-defaults.conf when dynamic allocation is enabled.

The most annoying part is that once spark.executor.instances is present in spark-defaults.conf, I can't find any way to spark-submit a job with spark.dynamicAllocation.enabled=true without hitting this error. Even if I pass "--conf spark.executor.instances=0 --conf spark.dynamicAllocation.enabled=true", I still get the error, because the validation in ClientArguments.parseArgs() only checks for the presence of spark.executor.instances, not whether its value is greater than 0.

Should the check be changed to allow spark.executor.instances to be set to 0 when spark.dynamicAllocation.enabled is true? That would be an acceptable compromise, but I'd really prefer to be able to enable dynamic allocation simply by setting spark.dynamicAllocation.enabled=true, without also having to set spark.executor.instances to 0.

Thanks,
Jonathan
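P.S. To illustrate the compromise I mention above, here's a rough sketch of a value-based check (this is a paraphrase of the idea, not the actual ClientArguments code, and the names are made up):

    import org.apache.spark.SparkConf

    object ExecutorInstancesCheck {
      // Sketch only: reject an explicit executor count greater than 0 when
      // dynamic allocation is on, instead of rejecting the mere presence
      // of spark.executor.instances.
      def validate(conf: SparkConf): Unit = {
        val dynamicAllocationEnabled =
          conf.getBoolean("spark.dynamicAllocation.enabled", false)
        if (dynamicAllocationEnabled &&
            conf.getInt("spark.executor.instances", 0) > 0) {
          throw new IllegalArgumentException(
            "Explicitly setting the number of executors is not compatible with " +
              "spark.dynamicAllocation.enabled!")
        }
      }
    }

With something like that, "--conf spark.executor.instances=0 --conf spark.dynamicAllocation.enabled=true" would go through, and leaving spark.executor.instances out entirely would keep working as it does today.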