subject:"skipping ahead in RDD"

Re: skipping ahead in RDD

2014-02-26 Thread Tathagata Das

If you are doing a computation where the result at time T depends on all the previous data up to T, then Spark Streaming will automatically ask you to periodically checkpoint the RDDs it generates. Checkpointing means saving the RDD to HDFS (or an HDFS-compatible system). Say the c

Re: skipping ahead in RDD

2014-02-26 Thread Mayur Rustagi

You can checkpoint, and it'll cut the lineage so that only updates after the checkpoint are recomputed. Regards Mayur Mayur Rustagi Ph: +919632149971 http://www.sigmoidanalytics.com https://twitter.com/mayur_rustagi On Wed, Feb 26, 2014 at 1:23 PM, Adrian Mocanu wrote: > Hi > > S

skipping ahead in RDD

2014-02-26 Thread Adrian Mocanu

Hi Scenario: Say I've been streaming tuples with Spark for 24 hours and one of the nodes fails. The RDD will be recomputed on the other Spark nodes and the streaming continues. I'm interested to know how I can skip the first 23 hours and jump in the stream to the last hour. Is this possible?
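
The replies above both point to checkpointing as the way to avoid replaying the whole history after a failure. The sketch below is a toy model in plain Python, not Spark code and not the Spark API: it only illustrates why a saved checkpoint bounds the amount of recomputation. The names `run`, `recover` and `checkpoint_every` are hypothetical, and the "state" is assumed to be a simple running total with one batch per hour.

```python
def run(num_batches, checkpoint_every=None):
    """Process hourly batches, keeping a running total as the state.

    Returns (batches, total, checkpoint), where checkpoint is either None
    or a tuple (number_of_batches_covered, saved_total).
    """
    batches = []
    total = 0
    checkpoint = None
    for t in range(num_batches):
        batch = [t, t + 1]          # made-up data for hour t
        batches.append(batch)
        total += sum(batch)
        if checkpoint_every and (t + 1) % checkpoint_every == 0:
            # A real system would write this state to reliable storage.
            checkpoint = (t + 1, total)
    return batches, total, checkpoint


def recover(batches, checkpoint=None):
    """Rebuild the running total after a failure.

    Without a checkpoint every batch is replayed; with one, only the
    batches received after it. Returns (total, batches_replayed).
    """
    start, total = checkpoint if checkpoint else (0, 0)
    replayed = batches[start:]
    for batch in replayed:
        total += sum(batch)
    return total, len(replayed)


if __name__ == "__main__":
    batches, total, checkpoint = run(24, checkpoint_every=5)
    print("full replay:", recover(batches))
    print("from checkpoint:", recover(batches, checkpoint))
```

In this toy run of 24 hourly batches with a checkpoint every 5 batches, recovery without a checkpoint replays all 24 batches, while recovery from the last checkpoint (taken after batch 20) replays only 4 and reaches the same total.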


3 matches
