Thanks Vadim & Jörn... I will look into those.

jg

> On Jun 20, 2017, at 2:12 PM, Vadim Semenov <[email protected]> wrote:
>
> You can launch one permanent spark context and then execute your jobs within the context. And since they'll be running in the same context, they can share data easily.
>
> These two projects provide the functionality that you need:
> https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
> https://github.com/cloudera/livy#post-sessions
>
> On Tue, Jun 20, 2017 at 1:46 PM, Jean Georges Perrin <[email protected]> wrote:
> Hey,
>
> Here is my need: program A does something on a set of data and produces results, program B does that on another set, and finally, program C combines the data of A and B. Of course, the easy way is to dump all on disk after A and B are done, but I wanted to avoid this.
>
> I was thinking of creating a temp view, but I do not really like the temp aspect of it ;). Any idea (they are all worth sharing)
>
> jg
