Persist doesn't cut lineage. You might run into a stack overflow
with a long lineage. See
https://spark-project.atlassian.net/browse/SPARK-1006 for an example.
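
To make the difference concrete, here is a toy model in plain Python. It is only a sketch of the idea, not Spark's implementation: the Node class, lineage_depth, and iterate are hypothetical names made up for illustration. In this model persist() keeps the computed value but leaves the parent link in place, while checkpoint() keeps the value and drops the parent link.

```python
# Toy model of RDD lineage. Hypothetical classes, not Spark's API.

class Node:
    """One dataset in a lineage chain: a value derived from a parent."""

    def __init__(self, parent, fn):
        self.parent = parent      # upstream dataset, or None for a source
        self.fn = fn              # transformation applied to the parent's value
        self.cached = None        # filled in by persist() or checkpoint()
        self.has_value = False

    def compute(self):
        # A stored value short-circuits recomputation.
        if self.has_value:
            return self.cached
        upstream = self.parent.compute() if self.parent is not None else None
        return self.fn(upstream)

    def persist(self):
        # Keeps the computed value, but the parent link stays in place.
        self.cached, self.has_value = self.compute(), True
        return self

    def checkpoint(self):
        # Keeps the computed value and drops the parent link.
        self.cached, self.has_value = self.compute(), True
        self.parent = None
        return self


def lineage_depth(node):
    """Walk the parent links recursively, as a graph traversal would."""
    return 0 if node.parent is None else 1 + lineage_depth(node.parent)


def iterate(steps, checkpoint_every=None):
    """Run an iterative job that persists every step's result."""
    node = Node(None, lambda _: 0).persist()
    for i in range(1, steps + 1):
        node = Node(node, lambda x: x + 1).persist()
        if checkpoint_every and i % checkpoint_every == 0:
            node.checkpoint()
    return node
```

With persist alone, iterate(50) gives a chain whose lineage_depth is 50, and the chain keeps growing with the number of iterations. Under Python's default recursion limit, lineage_depth(iterate(5000)) raises RecursionError, which is the toy analogue of the stack overflow. With iterate(5000, checkpoint_every=10) the depth never exceeds 9, because every tenth step cuts the chain.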

On Mon, Apr 21, 2014 at 12:11 PM, Diana Carroll <dcarr...@cloudera.com> wrote:
> When might that be necessary or useful?  Presumably I can persist and
> replicate my RDD to avoid re-computation, if that's my goal.  What advantage
> does checkpointing provide over disk persistence with replication?
>
>
> On Mon, Apr 21, 2014 at 2:42 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> Checkpoint clears dependencies. You might need checkpoint to cut a
>> long lineage in iterative algorithms. -Xiangrui
>>
>> On Mon, Apr 21, 2014 at 11:34 AM, Diana Carroll <dcarr...@cloudera.com> wrote:
>> > I'm trying to understand when I would want to checkpoint an RDD rather
>> > than just persist to disk.
>> >
>> > Every reference I can find to checkpoint relates to Spark Streaming.
>> > But the method is defined in the core Spark library, not Streaming.
>> >
>> > Does it exist solely for streaming, or are there circumstances unrelated
>> > to streaming in which I might want to checkpoint...and if so, like what?
>> >
>> > Thanks,
>> > Diana
>
>
