The data pipeline (DAG) should not be added to the StreamingContext in a recovery scenario: the pipeline metadata is recovered from the checkpoint folder. That is one thing you will need to fix in your code. Also, I don't think the ssc.checkpoint(folder) call should be made in the recovery case.
The idiom to follow is to set up the DAG inside the creatingFunc, not outside of it. This ensures that when a new context is created (i.e. the checkpoint folder does not exist), the DAG is added to it and then checkpointed. When a recovery happens, this function is not invoked; instead, everything is recreated from the checkpointed data.

Hope this helps,
NB

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/kafka-Spark-Streaming-with-checkPointing-fails-to-restart-tp22864p22878.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
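To make the idiom concrete, here is a minimal sketch of the StreamingContext.getOrCreate pattern this describes. The app name, batch interval, and checkpoint path are illustrative, and the DAG body is elided since it depends on your Kafka setup:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedApp {
  // Hypothetical checkpoint location; use a durable path (e.g. HDFS) in practice.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  def creatingFunc(): StreamingContext = {
    val conf = new SparkConf().setAppName("KafkaStreamingApp")
    val ssc = new StreamingContext(conf, Seconds(10))
    // Only set the checkpoint dir when building a fresh context.
    ssc.checkpoint(checkpointDir)

    // Build the entire DAG here (e.g. KafkaUtils.createDirectStream(...),
    // transformations, and output operations) so it gets checkpointed
    // together with the context.

    ssc
  }

  def main(args: Array[String]): Unit = {
    // Fresh start: checkpoint dir is absent, so creatingFunc() runs and the
    // DAG is built and checkpointed. Recovery: the context and DAG are
    // reconstructed from the checkpoint and creatingFunc() is NOT invoked.
    val ssc = StreamingContext.getOrCreate(checkpointDir, creatingFunc _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The key point is that no stream setup (and no ssc.checkpoint call) happens outside creatingFunc, so the recovery path never tries to re-add the DAG to a restored context.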