Take a look at this article
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-caching.html

From: Tzahi File <tzahi.f...@ironsrc.com>
Sent: Wednesday, August 28, 2019 5:18 AM
To: user <user@spark.apache.org>
Subject: Caching tables in spark

Hi,

I am looking for some advice on the following question. I have two different processes that read from the same raw data table (around 1.5 TB). Is there a way to read this data once, cache it somehow, and use it in both processes?

Thanks
--
Tzahi File
Data Engineer, ironSource
tzahi.f...@ironsrc.com