Re: Difference between Checkpointing and Persist

Jack Kolokasis Thu, 18 Apr 2019 10:58:42 -0700

Hi,

In my view, a good approach is to first persist your data with StorageLevel.MEMORY_AND_DISK and then perform the join. This should speed up your computation, because the data will be kept in memory and, when it does not fit, on your local intermediate storage device.


--Iacovos

On 4/18/19 8:49 PM, Subash Prabakar wrote:

Hi All,

I have a question about checkpointing and persist/saving.

Say we have one RDD - containing huge data,
1. We checkpoint and perform join
2. We persist as StorageLevel.MEMORY_AND_DISK and perform join
3. We save that intermediate RDD and perform join (using the same RDD; saving is just to persist the intermediate result before joining)
Which of the above is faster, and what's the difference?


Thanks,
Subash


