I have a Spark Streaming application running in production. I am trying to
find a solution for a particular use case: my application has a downtime
of, say, 5 hours and is then restarted. When I start my streaming
application after those 5 hours, there would be a considerable amount of
data queued up in Kafka, and my cluster would be unable to repartition and
process it all at once.

Is there any workaround so that when my streaming application starts, it
first takes 1-2 hours of data and processes it, then takes the next 1 hour
of data and processes that, and so on? Once it has finished processing the
5 hours of data it missed, normal streaming should resume with the given
slide interval.
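For context, one option I have been looking at is capping how much each
batch pulls from Kafka via Spark's rate-limit settings, so the backlog
drains in bounded chunks instead of one huge first batch. A minimal sketch
(the 1000 records/sec figure is only an illustrative value, not from my
actual job):

```
# spark-defaults.conf (or passed via --conf to spark-submit)

# Hard cap on records read per Kafka partition per second, so the
# first batches after a restart stay bounded. 1000 is illustrative;
# it would need tuning to the cluster's capacity.
spark.streaming.kafka.maxRatePerPartition  1000

# Let Spark auto-tune the ingest rate from observed batch
# processing times (available from Spark 1.5 onward).
spark.streaming.backpressure.enabled       true
```

I am not sure whether this alone is enough, or whether the catch-up still
needs to be driven explicitly, hence the question.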

Please suggest any ideas, and comment on the feasibility of this.


Thanks !!
Abhi
