The data pipeline (DAG) should not be added to the StreamingContext in a recovery scenario: the pipeline metadata is recovered from the checkpoint folder. That is one thing you will need to fix in your code. Also, I don't think the ssc.checkpoint(folder) call should be made in the recovery case.
The idiom to follow is to set up the DAG inside the creatingFunc, not outside of it. This ensures that when a new context is created (i.e. the checkpoint folder does not exist), the DAG is added to it and then checkpointed. When a recovery happens, this function is not invoked; instead, everything is recreated from the checkpointed data.

Hope this helps,
NB

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/kafka-Spark-Streaming-with-checkPointing-fails-to-restart-tp22864p22878.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
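To make the idiom concrete, here is a minimal sketch of the StreamingContext.getOrCreate pattern this describes. The app name, batch interval, and checkpoint path are illustrative, and the DAG body is elided since it depends on your Kafka setup:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedApp {
  // Hypothetical checkpoint location; use a durable path (e.g. HDFS) in practice.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  def creatingFunc(): StreamingContext = {
    val conf = new SparkConf().setAppName("KafkaStreamingApp")
    val ssc = new StreamingContext(conf, Seconds(10))
    // Only set the checkpoint dir when building a fresh context.
    ssc.checkpoint(checkpointDir)

    // Build the entire DAG here (e.g. KafkaUtils.createDirectStream(...),
    // transformations, and output operations) so it gets checkpointed
    // together with the context.

    ssc
  }

  def main(args: Array[String]): Unit = {
    // Fresh start: checkpoint dir is absent, so creatingFunc() runs and the
    // DAG is built and checkpointed. Recovery: the context and DAG are
    // reconstructed from the checkpoint and creatingFunc() is NOT invoked.
    val ssc = StreamingContext.getOrCreate(checkpointDir, creatingFunc _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The key point is that no stream setup (and no ssc.checkpoint call) happens outside creatingFunc, so the recovery path never tries to re-add the DAG to a restored context.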