Hi all, can someone help me with the following doubts regarding checkpointing?
My code flow is roughly as follows:

1) Create a direct stream from Kafka
2) Repartition the Kafka stream
3) mapToPair followed by reduceByKey
4) filter
5) reduceByKeyAndWindow (without the inverse function)
6) Write to Cassandra

When I restart my application from the checkpoint, I see the repartition and the other steps being re-executed for the previous windows, which takes longer and delays my aggregations. My understanding was that once data checkpointing is done, Spark should not re-read from Kafka and should instead use the saved RDDs, but I guess I am wrong. Is there a way to avoid the repartition, or any workaround for this?

Spark version is 1.4.0.

Cheers!!
Abhi
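For context, the incremental behaviour that `reduceByKeyAndWindow` enables when an inverse function is supplied can be sketched in plain Python (no Spark here; `sliding_window_sums`, the add/subtract logic, and the sample batches are all made up for illustration):

```python
from collections import deque

def sliding_window_sums(batches, window_len):
    """Incrementally maintain per-key sums over a sliding window.

    Without an inverse function, every slide re-reduces all batches
    still inside the window. With one, each slide only needs:
        new_total = old_total + newest_batch - oldest_batch
    which is the idea behind passing an invReduceFunc.
    """
    window = deque()   # batches currently inside the window
    totals = {}        # running per-key sums
    results = []
    for batch in batches:
        for k, v in batch:            # "reduce": add the newest batch
            totals[k] = totals.get(k, 0) + v
        window.append(batch)
        if len(window) > window_len:
            old = window.popleft()
            for k, v in old:          # "inverse reduce": subtract the batch that fell out
                totals[k] -= v
        results.append(dict(totals))
    return results

batches = [[("a", 1)], [("a", 2), ("b", 3)], [("a", 4)]]
print(sliding_window_sums(batches, window_len=2))
# → [{'a': 1}, {'a': 3, 'b': 3}, {'a': 6, 'b': 3}]
```

The same trade-off applies in Spark: the inverse-function variant of `reduceByKeyAndWindow` avoids touching old batches again on each slide, at the cost of the reduce operation having to be invertible (e.g. sums, but not max).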