Hi,

Michael's answer will solve the problem if you are using only a SQL-based
solution.

Otherwise, please refer to the details described here:
https://spark.apache.org/docs/latest/job-scheduling.html. With the release of
EMR 5.3.0, Spark 2.1.0 is now available in AWS.
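
In particular, that page describes how to run all of your jobs inside a
single application and let them queue FIFO, or share the executors fairly
across pools, on one context. A rough sketch of the relevant settings (the
app name, pool name and allocation-file path below are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// FAIR mode lets concurrently submitted jobs share the executors of ONE
// application; the default FIFO mode simply queues them one after another
val conf = new SparkConf()
  .setAppName("shared-context")
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
val sc = new SparkContext(conf)

// threads submitting jobs can pick a pool defined in fairscheduler.xml
sc.setLocalProperty("spark.scheduler.pool", "production")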

(Note that there is an issue with using Zeppelin on that release; I have
raised it with AWS and they are looking into it now.)

Regards,
Gourav Sengupta

On Tue, Feb 7, 2017 at 10:37 PM, Michael Segel <msegel_had...@hotmail.com>
wrote:

> Why couldn’t you use the Spark Thrift Server?
>
>
> On Feb 7, 2017, at 1:28 PM, Cosmin Posteuca <cosmin.poste...@gmail.com>
> wrote:
>
> Answer for Gourav Sengupta:
>
> I want to use the same Spark application because I want the jobs to run
> under a FIFO scheduler. My problem is that I have many (not very big) jobs,
> and if I run a separate application for every job the cluster splits its
> resources between them like a FAIR scheduler (that's what I observe, maybe
> I'm wrong), which can create a bottleneck. Start-up time isn't a problem
> for me, because it isn't a real-time application.
>
> I need a business solution; that's the reason why I can't use code from
> GitHub.
>
> Thanks!
>
> 2017-02-07 19:55 GMT+02:00 Gourav Sengupta <gourav.sengu...@gmail.com>:
>
>> Hi,
>>
>> May I ask the reason for using the same Spark application? Is it because
>> of the time it takes to start a Spark context?
>>
>> On another note, you may want to look at the number of contributors in a
>> GitHub repo before choosing a solution.
>>
>>
>> Regards,
>> Gourav
>>
>> On Tue, Feb 7, 2017 at 5:26 PM, vincent gromakowski <
>> vincent.gromakow...@gmail.com> wrote:
>>
>>> Spark JobServer or Livy are the best options for a purely technical API.
>>> If you want to publish a business API you will probably have to build
>>> your own app, like the one I wrote a year ago:
>>> https://github.com/elppc/akka-spark-experiments
>>> It combines Akka actors and a shared Spark context to serve concurrent
>>> sub-second jobs.
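>>>
>>> The idea, very roughly (a minimal sketch using plain Scala Futures
>>> instead of the Akka actors in the repo; the names are placeholders), is
>>> to keep one SparkContext alive and turn every incoming request into a
>>> job on it:
>>>
>>> import scala.concurrent.ExecutionContext.Implicits.global
>>> import scala.concurrent.duration.Duration
>>> import scala.concurrent.{Await, Future}
>>> import org.apache.spark.{SparkConf, SparkContext}
>>>
>>> // one long-lived context shared by every request
>>> val sc = new SparkContext(new SparkConf().setAppName("job-server-sketch"))
>>>
>>> // each request becomes a Spark job on the shared context; concurrent
>>> // requests are scheduled by Spark itself (FIFO by default, FAIR if configured)
>>> def handleRequest(n: Int): Future[Long] = Future {
>>>   sc.parallelize(1 to n, 5).map(_.toString).count()
>>> }
>>>
>>> val result = Await.result(handleRequest(40000), Duration.Inf)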
>>>
>>>
>>> 2017-02-07 15:28 GMT+01:00 ayan guha <guha.a...@gmail.com>:
>>>
>>>> I think you are looking for Livy or Spark JobServer.
>>>>
>>>> On Wed, 8 Feb 2017 at 12:37 am, Cosmin Posteuca <
>>>> cosmin.poste...@gmail.com> wrote:
>>>>
>>>>> I want to run different jobs on demand with the same Spark context,
>>>>> but I don't know exactly how I can do this.
>>>>>
>>>>> I try to get the current context, but it seems to create a new Spark
>>>>> context (with new executors).
>>>>>
>>>>> I call spark-submit to add new jobs.
>>>>>
>>>>> I run the code on Amazon EMR (3 instances, 4 cores & 16 GB RAM per
>>>>> instance), with YARN as the resource manager.
>>>>>
>>>>> My code:
>>>>>
>>>>> val sparkContext = SparkContext.getOrCreate()     // expected to reuse an existing context
>>>>> val content = 1 to 40000
>>>>> val result = sparkContext.parallelize(content, 5) // 40000 elements in 5 partitions
>>>>> result.map(value => value.toString).foreach(loop) // run the dummy loop on the executors
>>>>>
>>>>> // dummy CPU-bound work standing in for a real job
>>>>> def loop(x: String): Unit = {
>>>>>   for (a <- 1 to 30000000) {
>>>>>   }
>>>>> }
>>>>>
>>>>> spark-submit:
>>>>>
>>>>> spark-submit --executor-cores 1 \
>>>>>              --executor-memory 1g \
>>>>>              --driver-memory 1g \
>>>>>              --master yarn \
>>>>>              --deploy-mode cluster \
>>>>>              --conf spark.dynamicAllocation.enabled=true \
>>>>>              --conf spark.shuffle.service.enabled=true \
>>>>>              --conf spark.dynamicAllocation.minExecutors=1 \
>>>>>              --conf spark.dynamicAllocation.maxExecutors=3 \
>>>>>              --conf spark.dynamicAllocation.initialExecutors=3 \
>>>>>              --conf spark.executor.instances=3 \
>>>>>
>>>>> If I run spark-submit twice, it creates 6 executors, but I want to run
>>>>> all these jobs in the same Spark application.
>>>>>
>>>>> How can I achieve adding jobs to an existing Spark application?
>>>>>
>>>>> I don't understand why SparkContext.getOrCreate() doesn't get the
>>>>> existing Spark context.
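>>>>>
>>>>> What I would like is something along these lines (just a sketch of the
>>>>> idea: one long-running driver whose context is reused so that jobs
>>>>> queue FIFO on it, instead of every spark-submit starting a new
>>>>> application; the job body is only a placeholder):
>>>>>
>>>>> // as far as I understand, getOrCreate() can only return a context
>>>>> // that already exists inside this same driver JVM
>>>>> val sc = SparkContext.getOrCreate()
>>>>>
>>>>> def runJob(id: Int): Long = {
>>>>>   // each call is one Spark job on the shared context
>>>>>   sc.parallelize(1 to 40000, 5).map(v => s"$id-$v").count()
>>>>> }
>>>>>
>>>>> runJob(1)   // executed one after another, FIFO, on the same application
>>>>> runJob(2)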
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Cosmin P.
>>>>>
>>>> --
>>>> Best Regards,
>>>> Ayan Guha
>>>>
>>>
>>>
>>
>
>
