Personally, I think forcing the stream to fail (e.g., check the offsets in the downstream store and throw an exception if they aren't as expected) is the safest thing to do.
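
To make that concrete, here is a minimal sketch of the fail-fast check, assuming Spark 2.x with the spark-streaming-kafka-0-10 direct stream. loadCommittedOffset and recordFailedRange are hypothetical stand-ins for whatever durable store your sink uses:

// Sketch only: verify each batch's starting offsets against what the
// downstream store last committed, record any gap durably, and fail fast.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, KafkaUtils, OffsetRange}

object OffsetGuard {
  // Hypothetical: last offset your sink committed for this topic-partition.
  // In-memory placeholder here; in practice this queries the downstream store.
  private val committed = scala.collection.mutable.Map.empty[String, Long]
  def loadCommittedOffset(topic: String, partition: Int): Long =
    committed.getOrElse(s"$topic-$partition", 0L)

  // Hypothetical: durably record a failed/skipped range for later replay.
  def recordFailedRange(r: OffsetRange): Unit =
    println(s"FAILED RANGE ${r.topic}-${r.partition} [${r.fromOffset}, ${r.untilOffset})")

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("offset-guard"), Seconds(60))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "offset-guard",
      "enable.auto.commit" -> (false: java.lang.Boolean))

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach { r =>
        val expected = loadCommittedOffset(r.topic, r.partition)
        if (r.fromOffset != expected) {
          recordFailedRange(r)  // leave a durable trail before dying
          throw new IllegalStateException(
            s"${r.topic}-${r.partition}: expected offset $expected, got ${r.fromOffset}")
        }
      }
      // ...process the batch, then persist results and ending offsets
      // atomically, so loadCommittedOffset reflects them on the next batch.
    }
    ssc.start()
    ssc.awaitTermination()
  }
}

The check runs inside foreachRDD on the driver, before any output action, so a mismatch stops the batch before partial results are written downstream.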
If you proceed after a failure, you need a place to reliably record the batches that failed, for later reprocessing.

On Wed, Dec 7, 2016 at 1:46 PM, map reduced <k3t.gi...@gmail.com> wrote:
> Hi,
>
> I am trying to solve this problem: in my streaming flow, every day a few
> jobs fail for a few batches due to some (mostly unavoidable) reason, say
> Kafka cluster maintenance, and then resume successfully.
> I want to reprocess those failed jobs programmatically (assume I have a
> way of getting the start and end Kafka topic offsets for the failed jobs).
> I was considering these options:
> 1) Somehow pause the streaming job when it detects failing jobs - this
> seems not to be possible.
> 2) From the driver, run additional processing every few minutes that uses
> the driver REST API (/api/v1/applications...) to check which jobs have
> failed, and submit batch jobs for those failed jobs.
>
> 1 doesn't seem to be possible, and I don't want to kill the streaming
> context just to pause the job for a few failing batches and resume a few
> minutes later.
> 2 seems like a viable option, but a little complicated, since even the
> batch job can fail for whatever reason, and then I am back to tracking
> that separately, etc.
>
> Has anyone faced this issue, or does anyone have suggestions?
>
> Thanks,
> KP
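
For reference, if you do go the option-2 route despite the above, the polling could look roughly like this sketch. The driver UI port (4040), the host, the app id, and the regex-based JSON scraping are illustrative assumptions; a real implementation would use a proper JSON library and take the id from sc.applicationId:

import scala.io.Source

object FailedJobPoller {
  // Ask the driver's monitoring REST API for failed jobs; the /jobs endpoint
  // supports a ?status=failed filter. Regex scraping is for illustration only.
  def failedJobIds(driverHost: String, appId: String): Seq[Long] = {
    val url = s"http://$driverHost:4040/api/v1/applications/$appId/jobs?status=failed"
    val json = Source.fromURL(url).mkString
    """"jobId"\s*:\s*(\d+)""".r.findAllMatchIn(json).map(_.group(1).toLong).toSeq
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical host and app id, for illustration.
    failedJobIds("localhost", "app-20161207134600-0001").foreach { id =>
      println(s"job $id failed; submit a batch job covering its offset range")
    }
  }
}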