Yes, you can share RDDs with Tachyon while keeping the data in memory. Spark jobs can write to a Tachyon path (tachyon://host:port/path/), and other jobs can read from the same path.
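For example, a minimal sketch of that pattern (the host name, port, paths, and object names below are placeholders, and it assumes the Tachyon client jar is on Spark's classpath so the tachyon:// scheme resolves through the Hadoop filesystem layer):

    import org.apache.spark.{SparkConf, SparkContext}

    // First application: writes an RDD to Tachyon's in-memory storage.
    object WriterJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("writer"))
        val rdd = sc.parallelize(1 to 1000000)
        // Any other job can now read this data by path, without
        // sharing a SparkContext with this application.
        rdd.saveAsTextFile("tachyon://tachyon-master:19998/shared/ints")
        sc.stop()
      }
    }

    // Second application, possibly submitted much later: reads it back.
    object ReaderJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("reader"))
        val shared = sc.textFile("tachyon://tachyon-master:19998/shared/ints")
        println(shared.count())
        sc.stop()
      }
    }

Because the data stays in Tachyon's memory tier, the second job avoids a round trip to disk even though the two applications share nothing at the JVM level.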
Here is a presentation that includes that use case:
http://www.slideshare.net/TachyonNexus/tachyon-presentation-at-ampcamp-6-november-2015

Thanks,
Gene

On Sun, Jan 17, 2016 at 1:56 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> Yes, that is one of the basic reasons to use a
> jobserver/shared-SparkContext. Otherwise, in order to share the data in an
> RDD you have to use an external storage system, such as a distributed
> filesystem or Tachyon.
>
> On Sun, Jan 17, 2016 at 1:52 PM, Jia <jacqueline...@gmail.com> wrote:
>
>> Thanks, Mark. Then, I guess JobServer can fundamentally solve my problem,
>> so that jobs can be submitted at different times and still share RDDs.
>>
>> Best Regards,
>> Jia
>>
>> On Jan 17, 2016, at 3:44 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>
>> There is a 1-to-1 relationship between Spark Applications and
>> SparkContexts -- fundamentally, a Spark Application is a program that
>> creates and uses a SparkContext, and that SparkContext is destroyed when
>> the Application ends. A jobserver generally, and the Spark JobServer
>> specifically, is an Application that keeps a SparkContext open for a long
>> time and allows many Jobs to be submitted and run using that shared
>> SparkContext.
>>
>> More than one Application/SparkContext unavoidably implies more than one
>> JVM process per Worker -- Applications/SparkContexts cannot share JVM
>> processes.
>>
>> On Sun, Jan 17, 2016 at 1:15 PM, Jia <jacqueline...@gmail.com> wrote:
>>
>>> Hi, Mark, sorry for the confusion.
>>>
>>> Let me clarify: when an application is submitted, the master tells
>>> each Spark worker to spawn an executor JVM process. All the task sets of
>>> the application are executed by that executor, and after the application
>>> runs to completion, the executor process is killed.
>>> But I hope that all submitted applications can run in the same executor.
>>> Can JobServer do that? If so, it’s really good news!
>>>
>>> Best Regards,
>>> Jia
>>>
>>> On Jan 17, 2016, at 3:09 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>
>>> You've still got me confused. The SparkContext exists at the Driver,
>>> not on an Executor.
>>>
>>> Many Jobs can be run by a SparkContext -- it is a common pattern to use
>>> something like the Spark JobServer, where all Jobs are run through a
>>> shared SparkContext.
>>>
>>> On Sun, Jan 17, 2016 at 12:57 PM, Jia Zou <jacqueline...@gmail.com> wrote:
>>>
>>>> Hi, Mark, sorry, I meant SparkContext.
>>>> I want to change Spark so that all submitted jobs (SparkContexts) run
>>>> in one executor JVM.
>>>>
>>>> Best Regards,
>>>> Jia
>>>>
>>>> On Sun, Jan 17, 2016 at 2:21 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>
>>>>> -dev
>>>>>
>>>>> What do you mean by JobContext? That is a Hadoop MapReduce concept,
>>>>> not Spark.
>>>>>
>>>>> On Sun, Jan 17, 2016 at 7:29 AM, Jia Zou <jacqueline...@gmail.com> wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> Is there a way to reuse an executor JVM across different JobContexts?
>>>>>> Thanks.
>>>>>>
>>>>>> Best Regards,
>>>>>> Jia
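For reference, here is a minimal sketch of the shared-SparkContext pattern Mark describes, written against the spark-jobserver API of that era (the SparkJob trait with validate/runJob, plus NamedRddSupport for caching RDDs by name). The object names and the "shared-ints" key are illustrative, and the exact signatures should be checked against the spark-jobserver version in use:

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

    // Job 1: builds an RDD and caches it under a name inside the
    // long-lived SparkContext managed by the jobserver.
    object CacheJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation =
        SparkJobValid

      override def runJob(sc: SparkContext, config: Config): Any = {
        val rdd = sc.parallelize(1 to 1000000)
        namedRdds.update("shared-ints", rdd) // survives across job submissions
        "cached"
      }
    }

    // Job 2: submitted later to the same context; looks the RDD up by
    // name instead of recomputing or re-reading it.
    object UseJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation =
        SparkJobValid

      override def runJob(sc: SparkContext, config: Config): Any =
        namedRdds.get[Int]("shared-ints").map(_.count()).getOrElse(-1L)
    }

Because both jobs run inside the one Application (and hence the same executor JVMs), the RDD is shared directly in memory, with no external storage system involved.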