No, the data is not stored between two jobs, but it is stored for the lifetime of a job, and a job can run multiple actions. For sharing an RDD between jobs, you can have a look at Spark Job Server (spark-jobserver <https://github.com/ooyala/spark-jobserver>) or some in-memory storage layers: Tachyon (http://tachyon-project.org/) or Ignite (https://ignite.incubator.apache.org/).
2015-08-18 9:37 GMT+02:00 Hemant Bhanawat <hemant9...@gmail.com>:

> It is still in memory for future rdd transformations and actions.
>
> This is interesting. You mean Spark holds the data in memory between two
> job executions. How does the second job get the handle of the data in
> memory? I am interested in knowing more about it. Can you forward me a
> spark article or JIRA that talks about it?
>
> On Tue, Aug 18, 2015 at 12:49 PM, Sabarish Sasidharan <
> sabarish.sasidha...@manthan.com> wrote:
>
>> It is still in memory for future rdd transformations and actions. What
>> you get in driver is a copy of the data.
>>
>> Regards
>> Sab
>>
>> On Tue, Aug 18, 2015 at 12:02 PM, praveen S <mylogi...@gmail.com> wrote:
>>
>>> When I do an rdd.collect().. The data moves back to driver Or is still
>>> held in memory across the executors?
>>
>> --
>> Architect - Big Data
>> Ph: +91 99805 99458
>> Manthan Systems | *Company of the year - Analytics (2014 Frost and
>> Sullivan India ICT)*