Hi,

Alluxio lets you share or cache data in memory between different Spark contexts by storing RDDs or DataFrames as files in the Alluxio system. Those files can then be accessed by any Spark job just like files in any other distributed storage system.
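As a rough sketch of what that looks like in practice (the Alluxio master address, port, and path below are placeholders, not values from this thread), one Spark application writes a DataFrame into Alluxio and a second, independent application reads it back:

```scala
import org.apache.spark.sql.SparkSession

// Application 1: persist a DataFrame into Alluxio as Parquet.
// "alluxio-master:19998" and the /shared path are hypothetical.
val spark = SparkSession.builder().appName("writer").getOrCreate()
val df = spark.range(0, 1000).toDF("id")
df.write.parquet("alluxio://alluxio-master:19998/shared/df.parquet")

// Application 2 (a completely separate Spark context):
// read the same data back like any other distributed file.
val spark2 = SparkSession.builder().appName("reader").getOrCreate()
val df2 = spark2.read.parquet("alluxio://alluxio-master:19998/shared/df.parquet")
df2.count()  // the data is served from Alluxio, not recomputed
```

For plain RDDs the equivalent pattern is `rdd.saveAsTextFile(...)` / `sc.textFile(...)` against an `alluxio://` path; the linked blogs walk through both variants in detail.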
These two blogs do a good job of summarizing the end-to-end workflow of using Alluxio to share RDDs or DataFrames between Spark jobs:

RDDs: <https://alluxio.com/blog/effective-spark-rdds-with-alluxio>
DataFrames: <https://alluxio.com/blog/effective-spark-dataframes-with-alluxio>

Hope this helps,
Calvin

On Tue, Dec 13, 2016 at 3:42 AM, Chetan Khatri <ckhatriman...@gmail.com> wrote:
> Hello Guys,
>
> What would be the approach to accomplish a shared Spark context across
> multiple jobs, both without Alluxio and with Alluxio, and what would be
> best practice to achieve parallelism and concurrency for Spark jobs?
>
> Thanks.
>
> --
> Yours Aye,
> Chetan Khatri.
> M. +91 76666 80574
> Data Science Researcher
> INDIA