Hi developers, I've run into a problem with Spark, and before opening an issue, I'd like to hear your thoughts.
Currently, if you want to submit a Spark job, you have to write the code, package it into a jar, and then submit it with spark-submit or org.apache.spark.launcher.SparkLauncher. Sometimes, however, the RDD operation chain is generated dynamically at runtime (in code, from SQL, or even from a GUI), so building a separate jar is either inconvenient or impossible. So I tried something like this:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("Demo").setMaster("yarn-client")
    val sc = new SparkContext(conf)
    // A simple word count
    sc.textFile("README.md").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).foreach(println)

When this code runs, a Spark job is submitted. However, several problems remain:

1. It does not support all deploy modes, such as yarn-cluster.
2. Because of the "only one SparkContext per JVM" limit, I cannot run it twice.
3. It runs in the same process as my code; no child process is created.

What I wish for is that Spark itself could handle these problems, and my request can be summed up as "add a submit() method to SparkContext / StreamingContext / SQLContext". I would like to be able to add one line after the code above, like this:

    sc.submit()

and have Spark take care of all the background submission work for me.

I already opened an issue for this request, but I could not make myself clear back then, so I am writing this email to discuss it with you. Please reply if you need further details; if you understand my request and think it is worth doing, I will open an issue for it.

Thanks a lot.

Yuhang Chen.
yuhang.c...@foxmail.com
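
P.S. For comparison, here is a rough sketch of the existing jar-based workflow using SparkLauncher, which launches the job in a child process. The jar path, main class, and memory setting below are just placeholders for illustration:

    import org.apache.spark.launcher.SparkLauncher

    // Launch a pre-packaged application jar in a separate child process.
    // "/path/to/wordcount-assembly.jar" and "com.example.WordCount" are placeholders.
    val process = new SparkLauncher()
      .setAppResource("/path/to/wordcount-assembly.jar")
      .setMainClass("com.example.WordCount")
      .setMaster("yarn-cluster")
      .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
      .launch()
    process.waitFor()

This works fine when the job can be packaged ahead of time, but it is exactly the step that becomes awkward when the operation chain is only known at runtime, which is why I am asking for submit().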