Marcelo Vanzin created SPARK-25920:
--------------------------------------

             Summary: Avoid custom processing of CLI options for cluster submission
                 Key: SPARK-25920
                 URL: https://issues.apache.org/jira/browse/SPARK-25920
             Project: Spark
          Issue Type: Improvement
          Components: Spark Submit
    Affects Versions: 3.0.0
            Reporter: Marcelo Vanzin

In {{SparkSubmit}}, when an app is being submitted in cluster mode, there is currently a lot of code specific to each resource manager (RM) that takes the {{SparkSubmit}} internals, packages them up in an RM-specific set of "command line options", and parses them back into memory when the RM-specific class is invoked.

E.g. for YARN:

{code}
// In yarn-cluster mode, use yarn.Client as a wrapper around the user class
if (isYarnCluster) {
  childMainClass = YARN_CLUSTER_SUBMIT_CLASS
  if (args.isPython) {
    childArgs += ("--primary-py-file", args.primaryResource)
    childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
    [blah blah blah]
{code}

For Mesos:

{code}
if (isMesosCluster) {
  assert(args.useRest, "Mesos cluster mode is only supported through the REST submission API")
  childMainClass = REST_CLUSTER_SUBMIT_CLASS
  if (args.isPython) {
    // Second argument is main class
    childArgs += (args.primaryResource, "")
    if (args.pyFiles != null) {
      sparkConf.set("spark.submit.pyFiles", args.pyFiles)
    }
    [blah blah blah]
{code}

For k8s:

{code}
if (isKubernetesCluster) {
  childMainClass = KUBERNETES_CLUSTER_SUBMIT_CLASS
  if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
    if (args.isPython) {
      childArgs ++= Array("--primary-py-file", args.primaryResource)
      childArgs ++= Array("--main-class", "org.apache.spark.deploy.PythonRunner")
      [blah blah blah]
{code}

These parts of the code are all very similar, and there is no good reason why each RM needs its own processing here. We should try to simplify all of this and pass pre-parsed command line options to the cluster submission classes.
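
As a rough illustration of the proposal (handing pre-parsed options to the cluster submission classes instead of re-encoding them as RM-specific arguments), something along these lines might work. This is only a sketch: {{ParsedSubmission}}, {{ClusterSubmitter}}, {{PythonSupport}} and {{RecordingSubmitter}} are hypothetical names that do not exist in Spark, and the only identifier taken from the existing code is the {{org.apache.spark.deploy.PythonRunner}} class name.

```scala
// Hypothetical sketch: none of these types exist in Spark today.

// One pre-parsed description of the application, built once by the common
// submission code and shared by every resource manager.
final case class ParsedSubmission(
    primaryResource: String,
    mainClass: Option[String],
    isPython: Boolean,
    pyFiles: Seq[String],
    appArgs: Seq[String])

// Each resource manager would implement this instead of receiving an
// RM-specific argument list and parsing it back into memory.
trait ClusterSubmitter {
  def submit(app: ParsedSubmission): String
}

object PythonSupport {
  val PythonRunner = "org.apache.spark.deploy.PythonRunner"

  // The Python special case is resolved in one place rather than once per RM.
  def effectiveMainClass(app: ParsedSubmission): Option[String] =
    if (app.isPython) Some(PythonRunner) else app.mainClass
}

// Stand-in for an RM-specific client; it only reports what it would submit.
final class RecordingSubmitter(name: String) extends ClusterSubmitter {
  override def submit(app: ParsedSubmission): String = {
    val main = PythonSupport.effectiveMainClass(app).getOrElse("<none>")
    s"$name: main=$main resource=${app.primaryResource} " +
      s"args=${app.appArgs.mkString(",")}"
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val app = ParsedSubmission(
      primaryResource = "app.py",
      mainClass = None,
      isPython = true,
      pyFiles = Seq("deps.zip"),
      appArgs = Seq("--input", "data"))

    // The same object is handed to every submitter; nothing is re-parsed.
    Seq(new RecordingSubmitter("yarn"), new RecordingSubmitter("k8s"))
      .foreach(s => println(s.submit(app)))
  }
}
```

With a shape like this, the per-RM branches in {{SparkSubmit}} would shrink to choosing which submitter to call. Whether the real change should use a case class, {{SparkConf}} entries, or something else is left open here.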