The following passage, copied from the paper, relates to RDD lineage. Is there a unit test that covers this scenario (an RDD partition being lost and then recovered)? Thanks.
If a partition of an RDD is lost, the RDD has enough information about how it was derived from other RDDs to recompute just that partition. Thus, lost data can be recovered, often quite quickly, without requiring costly replication.

bit1...@163.com

From: bit1...@163.com
Date: 2015-07-31 13:11
To: Tathagata Das; yuzhihong
CC: user
Subject: Re: Re: How RDD lineage works

Thanks, TD and Zhihong, for the guidance. I will check it.

bit1...@163.com

From: Tathagata Das
Date: 2015-07-31 12:27
To: Ted Yu
CC: bit1...@163.com; user
Subject: Re: How RDD lineage works

You have to read the original Spark paper to understand how RDD lineage works.
https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf

On Thu, Jul 30, 2015 at 9:25 PM, Ted Yu <yuzhih...@gmail.com> wrote:

Please take a look at:
core/src/test/scala/org/apache/spark/CheckpointSuite.scala

Cheers

On Thu, Jul 30, 2015 at 7:39 PM, bit1...@163.com <bit1...@163.com> wrote:

Hi,

I don't have a good understanding of how RDD lineage works, so I would like to ask whether Spark provides a unit test in the code base that illustrates it. If there is one, what is the class name?

Thanks!

bit1...@163.com
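
For readers of this thread, the recovery idea in the passage quoted from the paper can be sketched without Spark at all. The following is a toy model in plain Python, not Spark's API and not how Spark is implemented: ToyRDD, lose_partition, and compute_count are hypothetical names invented for this illustration. Each ToyRDD remembers its parent and the function used to derive it (its lineage), so a lost partition can be rebuilt on its own.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's class).

    It records its lineage: the parent it was derived from and the
    per-partition function used to derive it.
    """

    def __init__(self, num_partitions, parent=None, fn=None, source=None):
        self.num_partitions = num_partitions
        self.parent = parent        # lineage: the dataset this one came from
        self.fn = fn                # lineage: how one partition is derived
        self.source = source        # root only: a list of input partitions
        self.cache = {}             # partition index -> materialized data
        self.compute_count = 0      # partition computations performed here

    def map(self, f):
        # Lazily describe a derived dataset; nothing is computed yet.
        return ToyRDD(
            self.num_partitions,
            parent=self,
            fn=lambda part: [f(x) for x in part],
        )

    def partition(self, i):
        # Compute partition i from lineage only if it is not materialized.
        if i not in self.cache:
            self.compute_count += 1
            if self.parent is None:
                data = list(self.source[i])
            else:
                data = self.fn(self.parent.partition(i))
            self.cache[i] = data
        return self.cache[i]

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

    def lose_partition(self, i):
        # Simulate losing the materialized copy of one partition.
        self.cache.pop(i, None)


base = ToyRDD(3, source=[[1, 2], [3, 4], [5, 6]])
doubled = base.map(lambda x: x * 2)

before = doubled.collect()    # materializes all three partitions
doubled.lose_partition(1)     # drop one partition of the derived dataset
after = doubled.collect()     # rebuilds only partition 1 from its lineage
```

In this sketch the second collect() recomputes a single partition of doubled and reuses everything else, which mirrors the "recompute just that partition" claim. In real Spark, rdd.toDebugString prints the recorded lineage of an RDD, which may be a useful starting point alongside the CheckpointSuite that Ted pointed to.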