Yes, you can share RDDs with Tachyon while keeping the data in memory. Spark jobs can write to a Tachyon path (tachyon://host:port/path/), and other jobs can read from the same path.
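For example, a minimal sketch of that pattern (the host name, port, paths, and object names below are placeholders, and it assumes the Tachyon client jar is on Spark's classpath so the tachyon:// scheme resolves through the Hadoop filesystem layer):

    import org.apache.spark.{SparkConf, SparkContext}

    // First application: writes an RDD to Tachyon's in-memory storage.
    object WriterJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("writer"))
        val rdd = sc.parallelize(1 to 1000000)
        // Any other job can now read this data by path, without
        // sharing a SparkContext with this application.
        rdd.saveAsTextFile("tachyon://tachyon-master:19998/shared/ints")
        sc.stop()
      }
    }

    // Second application, possibly submitted much later: reads it back.
    object ReaderJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("reader"))
        val shared = sc.textFile("tachyon://tachyon-master:19998/shared/ints")
        println(shared.count())
        sc.stop()
      }
    }

Because the data stays in Tachyon's memory tier, the second job avoids a round trip to disk even though the two applications share nothing at the JVM level.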
Here is a presentation that includes that use case:
http://www.slideshare.net/TachyonNexus/tachyon-presentation-at-ampcamp-6-november-2015

Thanks,
Gene

On Sun, Jan 17, 2016 at 1:56 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> Yes, that is one of the basic reasons to use a
> jobserver/shared-SparkContext. Otherwise, in order to share the data in an
> RDD you have to use an external storage system, such as a distributed
> filesystem or Tachyon.
>
> On Sun, Jan 17, 2016 at 1:52 PM, Jia <jacqueline...@gmail.com> wrote:
>
>> Thanks, Mark. Then, I guess JobServer can fundamentally solve my problem,
>> so that jobs can be submitted at different times and still share RDDs.
>>
>> Best Regards,
>> Jia
>>
>> On Jan 17, 2016, at 3:44 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>
>> There is a 1-to-1 relationship between Spark Applications and
>> SparkContexts -- fundamentally, a Spark Application is a program that
>> creates and uses a SparkContext, and that SparkContext is destroyed when
>> the Application ends. A jobserver generally, and the Spark JobServer
>> specifically, is an Application that keeps a SparkContext open for a long
>> time and allows many Jobs to be submitted and run using that shared
>> SparkContext.
>>
>> More than one Application/SparkContext unavoidably implies more than one
>> JVM process per Worker -- Applications/SparkContexts cannot share JVM
>> processes.
>>
>> On Sun, Jan 17, 2016 at 1:15 PM, Jia <jacqueline...@gmail.com> wrote:
>>
>>> Hi, Mark, sorry for the confusion.
>>>
>>> Let me clarify: when an application is submitted, the master tells
>>> each Spark worker to spawn an executor JVM process. All the task sets of
>>> the application are executed by that executor, and after the application
>>> runs to completion, the executor process is killed.
>>> But I hope that all submitted applications can run in the same executor.
>>> Can JobServer do that? If so, it’s really good news!
>>>
>>> Best Regards,
>>> Jia
>>>
>>> On Jan 17, 2016, at 3:09 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>
>>> You've still got me confused. The SparkContext exists at the Driver,
>>> not on an Executor.
>>>
>>> Many Jobs can be run by a SparkContext -- it is a common pattern to use
>>> something like the Spark JobServer, where all Jobs are run through a
>>> shared SparkContext.
>>>
>>> On Sun, Jan 17, 2016 at 12:57 PM, Jia Zou <jacqueline...@gmail.com> wrote:
>>>
>>>> Hi, Mark, sorry, I meant SparkContext.
>>>> I want to change Spark so that all submitted jobs (SparkContexts) run
>>>> in one executor JVM.
>>>>
>>>> Best Regards,
>>>> Jia
>>>>
>>>> On Sun, Jan 17, 2016 at 2:21 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>
>>>>> -dev
>>>>>
>>>>> What do you mean by JobContext? That is a Hadoop MapReduce concept,
>>>>> not Spark.
>>>>>
>>>>> On Sun, Jan 17, 2016 at 7:29 AM, Jia Zou <jacqueline...@gmail.com> wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> Is there a way to reuse an executor JVM across different JobContexts?
>>>>>> Thanks.
>>>>>>
>>>>>> Best Regards,
>>>>>> Jia
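For reference, here is a minimal sketch of the shared-SparkContext pattern Mark describes, written against the spark-jobserver API of that era (the SparkJob trait with validate/runJob, plus NamedRddSupport for caching RDDs by name). The object names and the "shared-ints" key are illustrative, and the exact signatures should be checked against the spark-jobserver version in use:

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

    // Job 1: builds an RDD and caches it under a name inside the
    // long-lived SparkContext managed by the jobserver.
    object CacheJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation =
        SparkJobValid

      override def runJob(sc: SparkContext, config: Config): Any = {
        val rdd = sc.parallelize(1 to 1000000)
        namedRdds.update("shared-ints", rdd) // survives across job submissions
        "cached"
      }
    }

    // Job 2: submitted later to the same context; looks the RDD up by
    // name instead of recomputing or re-reading it.
    object UseJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation =
        SparkJobValid

      override def runJob(sc: SparkContext, config: Config): Any =
        namedRdds.get[Int]("shared-ints").map(_.count()).getOrElse(-1L)
    }

Because both jobs run inside the one Application (and hence the same executor JVMs), the RDD is shared directly in memory, with no external storage system involved.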