Hi,
in my Spark Streaming application, computations depend on user input in
the form of
* user-defined functions
* computation rules
* etc.
that can throw exceptions in various cases (think: an exception thrown
inside a UDF, division by zero, an invalid key lookup, etc.).
Now I am wondering what a good/reasonable way to deal with those errors
is. I think I want to continue processing (the whole stream processing
pipeline should not die because of one single malformed item in the
stream), i.e., catch the exception, but I still need a way to tell the
user that something went wrong.
So how can I get the information that something went wrong back to the
driver, and what is a reasonable way to do that?
While writing this, something like the following came into my mind:
import scala.util.{Try, Success, Failure}

val errors = sc.accumulator(...) // of type List[Throwable]
dstream.map { item =>
  Try {
    someUdf(item)
  } match {
    case Success(value) =>
      value
    case Failure(err) =>
      errors += err // remember error
      0             // default value
  }
}
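The per-item pattern itself can be tried out locally without Spark; in this
sketch, someUdf is just a hypothetical function that fails on some inputs,
and a plain mutable buffer stands in for the accumulator:

```scala
import scala.util.{Try, Success, Failure}
import scala.collection.mutable

// hypothetical UDF: throws ArithmeticException when x == 0
def someUdf(x: Int): Int = 100 / x

// local stand-in for the accumulator
val errors = mutable.Buffer.empty[Throwable]

// map over the items, substituting a default value for failed ones
// so that processing of the remaining items continues
val result = Seq(1, 2, 0, 4).map { item =>
  Try(someUdf(item)) match {
    case Success(value) =>
      value
    case Failure(err) =>
      errors += err // remember error
      0             // default value
  }
}

println(result)      // List(100, 50, 0, 25)
println(errors.size) // 1 (the division by zero)
```

That is, the one malformed item yields the default value and its exception
is recorded on the side, while all other items are processed normally.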
Does this make sense?
Thanks
Tobias