Answer for Gourav Sengupta:

I want to use the same Spark application because I want my jobs to run under a FIFO scheduler. My problem is that I have many jobs (none of them very big), and if I run a separate application for every job, the cluster splits its resources between them as a FAIR scheduler would (at least that's what I observe; maybe I'm wrong), which can create a bottleneck. The start-up time isn't a problem for me, because this isn't a real-time application.
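To show what I mean, here is a minimal sketch (the job body is only a placeholder standing in for my real, small jobs): within a single application, spark.scheduler.mode defaults to FIFO, so jobs submitted concurrently to one SparkContext are served in submission order instead of splitting the cluster between applications.

import org.apache.spark.{SparkConf, SparkContext}

object FifoDriver {
  def main(args: Array[String]): Unit = {
    // One long-running application; spark.scheduler.mode defaults to
    // FIFO, so the first submitted job gets the resources first.
    val sc = new SparkContext(new SparkConf().setAppName("shared-context-fifo"))

    // Placeholder standing in for my real (small) jobs.
    def runJob(id: Int): Unit = {
      val sum = sc.parallelize(1 to 1000000, 5).map(_.toLong).sum()
      println(s"job $id finished: sum=$sum")
    }

    // Even when jobs are submitted concurrently from several threads,
    // the FIFO scheduler inside this one application orders them.
    val threads = (1 to 3).map { id =>
      val t = new Thread(new Runnable { def run(): Unit = runJob(id) })
      t.start()
      t
    }
    threads.foreach(_.join())

    sc.stop()
  }
}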
I need a business solution; that's the reason why I can't use code from GitHub. Thanks!

2017-02-07 19:55 GMT+02:00 Gourav Sengupta <gourav.sengu...@gmail.com>:

> Hi,
>
> May I ask the reason for using the same Spark application? Is it because
> of the time it takes to start a Spark context?
>
> On another note, you may want to look at the number of contributors in a
> GitHub repo before choosing a solution.
>
> Regards,
> Gourav
>
> On Tue, Feb 7, 2017 at 5:26 PM, vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
>> Spark jobserver and Livy server are the best options for a pure
>> technical API. If you want to publish a business API you will probably
>> have to build your own app, like the one I wrote a year ago:
>> https://github.com/elppc/akka-spark-experiments
>> It combines Akka actors and a shared Spark context to serve concurrent
>> subsecond jobs.
>>
>> 2017-02-07 15:28 GMT+01:00 ayan guha <guha.a...@gmail.com>:
>>
>>> I think you are looking for Livy or Spark jobserver.
>>>
>>> On Wed, 8 Feb 2017 at 12:37 am, Cosmin Posteuca <
>>> cosmin.poste...@gmail.com> wrote:
>>>
>>>> I want to run different jobs on demand with the same Spark context,
>>>> but I don't know exactly how I can do this.
>>>>
>>>> I tried to get the current context, but it seems to create a new
>>>> Spark context (with new executors).
>>>>
>>>> I call spark-submit to add new jobs.
>>>>
>>>> I run the code on Amazon EMR (3 instances, 4 cores & 16 GB RAM per
>>>> instance), with YARN as the resource manager.
>>>>
>>>> My code:
>>>>
>>>> val sparkContext = SparkContext.getOrCreate()
>>>> val content = 1 to 40000
>>>> val result = sparkContext.parallelize(content, 5)
>>>> result.map(value => value.toString).foreach(loop)
>>>>
>>>> def loop(x: String): Unit = {
>>>>   for (a <- 1 to 30000000) {
>>>>   }
>>>> }
>>>>
>>>> spark-submit:
>>>>
>>>> spark-submit --executor-cores 1 \
>>>>   --executor-memory 1g \
>>>>   --driver-memory 1g \
>>>>   --master yarn \
>>>>   --deploy-mode cluster \
>>>>   --conf spark.dynamicAllocation.enabled=true \
>>>>   --conf spark.shuffle.service.enabled=true \
>>>>   --conf spark.dynamicAllocation.minExecutors=1 \
>>>>   --conf spark.dynamicAllocation.maxExecutors=3 \
>>>>   --conf spark.dynamicAllocation.initialExecutors=3 \
>>>>   --conf spark.executor.instances=3 \
>>>>
>>>> If I run spark-submit twice, it creates 6 executors, but I want to
>>>> run all these jobs in the same Spark application.
>>>>
>>>> How can I achieve adding jobs to an existing Spark application?
>>>>
>>>> I don't understand why SparkContext.getOrCreate() doesn't get the
>>>> existing Spark context.
>>>>
>>>> Thanks,
>>>>
>>>> Cosmin P.
>>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
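P.S. On why SparkContext.getOrCreate() doesn't find the existing context: each spark-submit starts a new driver JVM, and getOrCreate() only reuses a context that already exists inside the same JVM, so a second submit can never see the first one's context. The sketch below is only an illustration of the alternative, a single long-running driver that accepts requests and runs them on one shared context (the port and the one-line "run <n>" protocol are placeholders I made up; Livy and Spark jobserver do this properly):

import java.net.ServerSocket
import scala.io.Source
import org.apache.spark.{SparkConf, SparkContext}

object SharedContextServer {
  def main(args: Array[String]): Unit = {
    // The context is created once, in one driver JVM, and reused below.
    val sc = new SparkContext(new SparkConf().setAppName("shared-context-server"))

    // Made-up protocol: each connection sends one line, "run <n>".
    val server = new ServerSocket(9999)

    while (true) {
      val client = server.accept()
      val request = Source.fromInputStream(client.getInputStream).getLines().next()
      val n = request.stripPrefix("run ").trim.toInt

      // Every request reuses the same SparkContext, so no new
      // executors are allocated per job.
      val sum = sc.parallelize(1 to n, 5).map(_.toLong).sum()
      client.getOutputStream.write(s"sum=$sum\n".getBytes("UTF-8"))
      client.close()
    }
  }
}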