When a Spark Streaming task fails permanently (after exceeding spark.task.maxFailures), the enclosing batch job is considered failed, and the driver continues to the next batch in the pipeline after advancing the checkpoint to the new positions (the next offsets when using Kafka direct streaming).
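For context, the setup under discussion is along these lines; this is a minimal sketch, not my actual job, and the checkpoint path, broker address, topic name, and batch interval are placeholders (Spark 1.x spark-streaming-kafka direct API):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Kafka direct stream with checkpointing enabled, so offsets are
// tracked via the checkpoint rather than ZooKeeper.
val conf = new SparkConf()
  .setAppName("fail-fast-demo")
  .set("spark.task.maxFailures", "4") // the task retry limit mentioned above
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///checkpoints/fail-fast-demo") // placeholder path

val kafkaParams = Map("metadata.broker.list" -> "broker:9092") // placeholder
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events")) // placeholder topic

stream.foreachRDD { rdd =>
  // A critical error thrown in here fails the task; after
  // spark.task.maxFailures retries the whole batch job is marked failed,
  // yet the driver still moves on and the checkpoint still advances.
  rdd.foreach { record => /* process */ }
}

ssc.start()
ssc.awaitTermination()
```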
I'm looking for a fail-fast implementation: when one or more (or all) tasks fail with a critical error, there is no point in progressing to the next batch, and the unprocessed data must not be lost. I want to identify this condition and exit the application before the updated checkpoint is committed. Any ideas? More specifically:

1. Is there a way for the driver program to get notified of a job failure (which tasks failed, plus RDD metadata) before the checkpoint is updated? StreamingContext has an addStreamingListener method, but its onBatchCompleted event carries no indication of batch failure (only completion).

2. How do I stop and exit the application? When running in yarn-client mode, calling streamingContext.stop halts processing but does not exit the process.

Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Implementing-fail-fast-upon-critical-spark-streaming-tasks-errors-tp25606.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
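To illustrate the shape of what I'm after, here is a rough sketch, assuming a Spark version whose BatchInfo exposes per-output-operation failure reasons (1.6+ adds outputOperationInfos with a failureReason field; part of my problem is that my version does not surface this):

```scala
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Sketch: detect a failed batch from a listener and fail fast.
// ASSUMPTION: batchInfo.outputOperationInfos / failureReason exist
// (Spark 1.6+); earlier versions report completion only.
class FailFastListener(ssc: StreamingContext) extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val failures = batch.batchInfo.outputOperationInfos.values
      .flatMap(_.failureReason)
    if (failures.nonEmpty) {
      failures.foreach(reason => Console.err.println(s"Batch failed: $reason"))
      // Stop from a separate thread: calling stop() from inside a
      // listener callback risks deadlocking the listener bus.
      new Thread("fail-fast-stopper") {
        override def run(): Unit = {
          ssc.stop(stopSparkContext = true, stopGracefully = false)
          // In yarn-client mode stop() alone does not terminate the
          // driver JVM, so force an exit with a non-zero status.
          sys.exit(1)
        }
      }.start()
    }
  }
}

// Registration (ssc being the running StreamingContext):
// ssc.addStreamingListener(new FailFastListener(ssc))
```

Even with this, the listener fires only after the batch completes, so it is unclear to me whether the exit can be guaranteed to win the race against the checkpoint update.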