Re: skipping ahead in RDD

2014-02-26 Thread Tathagata Das
If you are doing a computation where the result at time T depends on all the data up to T, then Spark Streaming will automatically ask you to checkpoint the RDDs it generates periodically. Checkpointing means saving the RDD to HDFS (or an HDFS-compatible system). Say the c
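A minimal sketch of why periodic checkpointing matters for a result that depends on all data up to T, assuming a toy running word count. This is not Spark's API: `update_state`, `run_stream`, `recover`, `CHECKPOINT_EVERY` and the JSON file standing in for HDFS are all hypothetical. In Spark Streaming itself the checkpoint directory is set with `StreamingContext.checkpoint(...)`, and stateful operations such as `updateStateByKey` are the ones that require it.

```python
import json
import os
import tempfile

# Toy model (not the Spark API): a running word count whose value at batch T
# depends on every batch up to T, like a stateful Spark Streaming operation.
CHECKPOINT_EVERY = 3  # hypothetical checkpoint interval, in batches


def update_state(state, batch):
    """Fold one batch of words into the running counts (returns a new dict)."""
    new_state = dict(state)
    for word in batch:
        new_state[word] = new_state.get(word, 0) + 1
    return new_state


def run_stream(batches, checkpoint_path):
    """Process batches in order, saving the state to durable storage periodically."""
    state = {}
    for t, batch in enumerate(batches, start=1):
        state = update_state(state, batch)
        if t % CHECKPOINT_EVERY == 0:
            # Stand-in for writing the state RDD to HDFS.
            with open(checkpoint_path, "w") as f:
                json.dump({"batch": t, "state": state}, f)
    return state


def recover(batches, checkpoint_path):
    """Rebuild the latest state from the last checkpoint plus the batches after it."""
    with open(checkpoint_path) as f:
        saved = json.load(f)
    state = saved["state"]
    replayed = batches[saved["batch"]:]  # only batches newer than the checkpoint
    for batch in replayed:
        state = update_state(state, batch)
    return state, len(replayed)


path = os.path.join(tempfile.mkdtemp(), "state.json")
batches = [["a"], ["b"], ["a", "c"], ["a"], ["b"]]
live = run_stream(batches, path)
restored, n_replayed = recover(batches, path)
print(live == restored, n_replayed)  # True 2
```

Recovery here reads the state saved after batch 3 and replays only batches 4 and 5, instead of recomputing from batch 1.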

Re: skipping ahead in RDD

2014-02-26 Thread Mayur Rustagi
You can checkpoint, and it'll truncate the lineage so that it only contains the updates made after the checkpoint. Regards, Mayur Mayur Rustagi Ph: +919632149971 http://www.sigmoidanalytics.com https://twitter.com/mayur_rustagi On Wed, Feb 26, 2014 at 1:23 PM, Adrian Mocanu wrote: > Hi > > S
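A minimal sketch of what "truncating the lineage" means, assuming a toy RDD-like class. `ToyRDD` and its methods are hypothetical and only model the idea; in Spark the corresponding calls are `SparkContext.setCheckpointDir(...)` and `RDD.checkpoint()`. Each `map` adds one step that would have to be replayed after a failure; `checkpoint()` materializes the data and drops the reference to the parent chain.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD: a parent, a transformation, and optional saved data.

    This only models the idea of lineage; it is not Spark's RDD class.
    """

    def __init__(self, parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List[int]], List[int]]] = None,
                 source: Optional[List[int]] = None):
        self.parent = parent
        self.fn = fn
        self.source = source              # set only for the base RDD
        self.checkpointed: Optional[List[int]] = None  # set once checkpoint() runs

    def map(self, f):
        return ToyRDD(parent=self, fn=lambda xs: [f(x) for x in xs])

    def lineage_depth(self) -> int:
        """Number of transformations that must be replayed to rebuild this RDD."""
        if self.checkpointed is not None or self.source is not None:
            return 0
        return 1 + self.parent.lineage_depth()

    def compute(self) -> List[int]:
        if self.checkpointed is not None:
            return list(self.checkpointed)
        if self.source is not None:
            return list(self.source)
        return self.fn(self.parent.compute())

    def checkpoint(self):
        """Materialize the data and drop the reference to the parent lineage."""
        self.checkpointed = self.compute()
        self.parent = None
        self.fn = None


rdd = ToyRDD(source=[1, 2, 3])
for _ in range(23):  # e.g. 23 "hours" of incremental updates
    rdd = rdd.map(lambda x: x + 1)
depth_before = rdd.lineage_depth()
rdd.checkpoint()
latest = rdd.map(lambda x: x * 2)  # one more update after the checkpoint
print(depth_before, latest.lineage_depth(), latest.compute())  # 23 1 [48, 50, 52]
```

After the checkpoint, rebuilding `latest` replays one step from the saved data rather than all 23 earlier ones, which is the sense in which recovery can "skip ahead".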

skipping ahead in RDD

2014-02-26 Thread Adrian Mocanu
Hi, Scenario: say I've been streaming tuples with Spark for 24 hours and one of the nodes fails. The RDD will be recomputed on the other Spark nodes and the streaming continues. I'd like to know how I can skip the first 23 hours and jump ahead in the stream to the last hour. Is this possible?