When a Spark Streaming task fails permanently (after exceeding spark.task.maxFailures), the enclosing batch job is considered failed, and the driver continues to the next batch in the pipeline after advancing the checkpoint to the new positions (the next offsets when using Kafka direct streaming).
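For context, the setup under discussion is along these lines; this is a minimal sketch, not my actual job, and the checkpoint path, broker address, topic name, and batch interval are placeholders (Spark 1.x spark-streaming-kafka direct API):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Kafka direct stream with checkpointing enabled, so offsets are
// tracked via the checkpoint rather than ZooKeeper.
val conf = new SparkConf()
  .setAppName("fail-fast-demo")
  .set("spark.task.maxFailures", "4") // the task retry limit mentioned above
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///checkpoints/fail-fast-demo") // placeholder path

val kafkaParams = Map("metadata.broker.list" -> "broker:9092") // placeholder
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events")) // placeholder topic

stream.foreachRDD { rdd =>
  // A critical error thrown in here fails the task; after
  // spark.task.maxFailures retries the whole batch job is marked failed,
  // yet the driver still moves on and the checkpoint still advances.
  rdd.foreach { record => /* process */ }
}

ssc.start()
ssc.awaitTermination()
```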
I'm looking for a fail-fast implementation: when one or more (or all) tasks fail with a critical error, there is no point in progressing to the next batch, and the unprocessed data must not be lost. I want to identify this condition and exit the application before the updated checkpoint is committed. Any ideas? More specifically:

1. Is there a way for the driver program to get notified of a job failure (which tasks failed, plus RDD metadata) before the checkpoint is updated? StreamingContext has an addStreamingListener method, but its onBatchCompleted event carries no indication of batch failure (only completion).

2. How do I stop and exit the application? When running in yarn-client mode, calling streamingContext.stop halts processing but does not exit the process.

Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Implementing-fail-fast-upon-critical-spark-streaming-tasks-errors-tp25606.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
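To illustrate the shape of what I'm after, here is a rough sketch, assuming a Spark version whose BatchInfo exposes per-output-operation failure reasons (1.6+ adds outputOperationInfos with a failureReason field; part of my problem is that my version does not surface this):

```scala
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Sketch: detect a failed batch from a listener and fail fast.
// ASSUMPTION: batchInfo.outputOperationInfos / failureReason exist
// (Spark 1.6+); earlier versions report completion only.
class FailFastListener(ssc: StreamingContext) extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val failures = batch.batchInfo.outputOperationInfos.values
      .flatMap(_.failureReason)
    if (failures.nonEmpty) {
      failures.foreach(reason => Console.err.println(s"Batch failed: $reason"))
      // Stop from a separate thread: calling stop() from inside a
      // listener callback risks deadlocking the listener bus.
      new Thread("fail-fast-stopper") {
        override def run(): Unit = {
          ssc.stop(stopSparkContext = true, stopGracefully = false)
          // In yarn-client mode stop() alone does not terminate the
          // driver JVM, so force an exit with a non-zero status.
          sys.exit(1)
        }
      }.start()
    }
  }
}

// Registration (ssc being the running StreamingContext):
// ssc.addStreamingListener(new FailFastListener(ssc))
```

Even with this, the listener fires only after the batch completes, so it is unclear to me whether the exit can be guaranteed to win the race against the checkpoint update.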