Take a look at this article 

 

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-caching.html

 

From: Tzahi File <tzahi.f...@ironsrc.com> 
Sent: Wednesday, August 28, 2019 5:18 AM
To: user <user@spark.apache.org>
Subject: Caching tables in spark

 

Hi, 

 

I'm looking for your advice on a question. 

I have two different processes that read from the same raw data table (around 
1.5 TB). 

Is there a way to read this data once, cache it somehow, and use it in both 
processes? 


Thanks

-- 


Tzahi File
Data Engineer

email tzahi.f...@ironsrc.com
mobile +972-546864835
fax +972-77-5448273

ironSource HQ - 121 Derech Menachem Begin st. Tel Aviv
ironsrc.com


