Hi developers, I've encountered a problem with Spark, and before opening an
issue, I'd like to hear your thoughts.


Currently, if you want to submit a Spark job, you'll need to write the code, 
make a jar, and then submit it with spark-submit or 
org.apache.spark.launcher.SparkLauncher. 
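
For reference, the SparkLauncher route looks roughly like this today; the jar
path and main class below are just placeholders:

import org.apache.spark.launcher.SparkLauncher

// Launches a pre-built jar through a spark-submit child process.
// "/path/to/my-job.jar" and "com.example.MyJob" are placeholders.
val process = new SparkLauncher()
  .setAppResource("/path/to/my-job.jar")
  .setMainClass("com.example.MyJob")
  .setMaster("yarn-client")
  .launch()
process.waitFor() // wait for the child process to finish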


But sometimes the RDD operation chain is generated dynamically in code, from
SQL, or even from a GUI, so it seems either inconvenient or impossible to
build a separate jar. So I tried something like this:

val conf = new SparkConf().setAppName("Demo").setMaster("yarn-client")
val sc = new SparkContext(conf)
// A simple word count
sc.textFile("README.md").flatMap(_.split(" ")).map((_, 1))
  .reduceByKey(_ + _).foreach(println)

When this code is executed, a Spark job is submitted. However, there are some
remaining problems:
1. It doesn't support all deploy modes, such as yarn-cluster.
2. Because of the "only one SparkContext per JVM" limit, I cannot run it
twice (see the sketch after this list).
3. It runs in the same process as my code; no child process is created.
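
For instance, just to illustrate problem 2: with the sc from the word count
above still active, building a second job in the same JVM fails today:

// Continuing the snippet above: creating a second SparkContext while sc is
// still active throws an exception, because only one SparkContext may be
// running per JVM.
val conf2 = new SparkConf().setAppName("Demo2").setMaster("yarn-client")
val sc2 = new SparkContext(conf2)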



Thus, what I wish for is that these problems could be handled by Spark itself,
and my request can be simply described as "adding a submit() method to
SparkContext / StreamingContext / SQLContext". I hope that if I add a line
after the code above like this:

sc.submit()

then Spark can handle all of the background submission work for me.
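
To spell the idea out end to end, the usage I have in mind would look roughly
like the sketch below; submit() is purely hypothetical and does not exist in
Spark today:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical usage sketch: submit() is not an existing Spark API.
// The idea is that submit() would package the operation chain built so far
// and launch it as a separate child application, so cluster deploy modes
// work (problem 1), several jobs can be submitted from one JVM (problem 2),
// and nothing heavy runs inside my own process (problem 3).
val conf = new SparkConf().setAppName("Demo").setMaster("yarn-cluster")
val sc = new SparkContext(conf)
sc.textFile("README.md").flatMap(_.split(" ")).map((_, 1))
  .reduceByKey(_ + _).foreach(println)
sc.submit()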

I already opened an issue for this request before, but I couldn't make myself
clear back then, so I'm writing this email to discuss it with you. Please
reply if you need further details, and I'll open an issue for this if you
understand my request and believe it's something worth doing.


Thanks a lot.


Yuhang Chen.

yuhang.c...@foxmail.com
