You can use `.checkpoint()`:
val sc: SparkContext
val result1 =
result1.count() // Will save `myrdd` to HDFS and do map(op1…
val result2 =
result2.count() // Will load `myrdd` from HDFS and do map(op2…
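
The snippet above lost a few pieces in transit, so here is a minimal, self-contained sketch of the same idea. The object name, the helper name `countsWithCheckpoint`, the local master, the `/tmp/spark-checkpoints` path, and the stand-in transformations are all assumptions for illustration and are not from the original message; on a cluster the checkpoint directory would normally be an HDFS path.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {

  // Hypothetical helper: runs two independent actions on top of one checkpointed RDD.
  def countsWithCheckpoint(sc: SparkContext, checkpointDir: String): (Long, Long) = {
    // The checkpoint directory must be set before checkpoint() is called.
    sc.setCheckpointDir(checkpointDir)

    // Stand-in for "a whole bunch of transformations on an RDD".
    val myrdd = sc.parallelize(1 to 1000).map(_ * 2)

    // Marks the RDD for checkpointing; must be called before any job runs on it.
    myrdd.checkpoint()

    val result1 = myrdd.map(_ + 1)
    val n1 = result1.count() // first action: myrdd is computed and written to the checkpoint dir

    val result2 = myrdd.map(_ * 3)
    val n2 = result2.count() // later action: myrdd is read back from the checkpoint files

    (n1, n2)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(countsWithCheckpoint(sc, "/tmp/spark-checkpoints")) // assumed local path
    } finally {
      sc.stop()
    }
  }
}
```

One caveat: the Spark API docs recommend persisting an RDD before checkpointing it, because otherwise writing the checkpoint recomputes the RDD once more after the first action finishes. Without cache/persist, the transformations therefore run twice in total (once for the first action, once for the checkpoint write), and only the later actions reuse the saved data. If that extra pass is too expensive, writing the RDD out explicitly with `saveAsObjectFile` and reading it back with `sc.objectFile` computes it only once, and the reloaded RDD can feed any number of operations.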

On Tue, Aug 1, 2017 at 2:05 PM, jeff saremi wrote:

> Calling cache/persist fails all our jobs (I have posted 2 threads on
> this).
> And we're giving up hope of finding a solution.
> So I'd like to find a workaround for that:
> If I save an RDD to hdfs and read it back, can I use it in more than one
> operation?
> Example: (using cache)
> // do a whole bunch of transformations on an RDD
> myrdd.cache()
> val result1 =
> val result2 =
> // in the above I am assuming that a call to cache will prevent all
> // previous transformations from being calculated twice
> I'd like to somehow get result1 and result2 without duplicating work. How
> can I do that?
> thanks
> Jeff
