Hi,

in my Spark Streaming application, computations depend on users' input in
terms of
 * user-defined functions
 * computation rules
 * etc.
that can throw exceptions in various cases (think: an exception inside a
UDF, division by zero, invalid access by key, etc.).

Now I am wondering what a good/reasonable way to deal with those errors
is. I think I want to continue processing (the whole stream processing
pipeline should not die because of one single malformed item in the
stream), i.e., catch the exception, but I still need a way to tell the
user that something went wrong.

So how can I get the information that something went wrong back to the
driver, and what is a reasonable way to do that?
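One pattern that might fit is keeping the failures in the stream itself: map each item to an Either and then collect errors and results separately. Here is a minimal plain-Scala sketch of that idea (a plain Seq stands in for the DStream, and someUdf is a placeholder for the user code that fails on zero):

```scala
import scala.util.{Try, Success, Failure}

object EitherSketch {
  // Placeholder for a user-defined function; throws on zero.
  def someUdf(x: Int): Int = 100 / x

  def main(args: Array[String]): Unit = {
    // In Spark this would be dstream.map(...); a local Seq stands in here.
    val input = Seq(1, 2, 0, 4)
    val mapped: Seq[Either[Throwable, Int]] = input.map { item =>
      Try(someUdf(item)) match {
        case Success(value) => Right(value)
        case Failure(err)   => Left(err)
      }
    }

    // Split into good results and failures for separate handling.
    val results  = mapped.collect { case Right(v) => v }
    val failures = mapped.collect { case Left(e)  => e }

    println(results)        // the successfully computed values
    println(failures.size)  // number of failed items
  }
}
```

The nice thing about this variant is that the errors stay associated with the batch they occurred in, so one could, e.g., log or persist them per batch instead of accumulating them globally.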

While writing this, something like the following came into my mind:
  import scala.util.{Try, Success, Failure}

  // needs an AccumulableParam[List[Throwable], Throwable] in scope
  val errors = sc.accumulator(...) // of type List[Throwable]
  dstream.map { item =>
    Try {
      someUdf(item)
    } match {
      case Success(value) =>
        value
      case Failure(err) =>
        errors += err // remember the error
        0 // default value
    }
  }
Does this make sense?
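In case it helps to see the capture logic on its own: the same Try-based pattern can be checked outside Spark, with a plain mutable buffer standing in for the accumulator (someUdf is again a placeholder that fails on zero):

```scala
import scala.util.{Try, Success, Failure}
import scala.collection.mutable.ListBuffer

object TrySketch {
  // Stand-in for the driver-side accumulator; in Spark this would be
  // an accumulator backed by a List[Throwable] AccumulableParam.
  val errors = ListBuffer.empty[Throwable]

  // Stand-in for a user-defined function; throws on zero.
  def someUdf(x: Int): Int = 100 / x

  def process(input: Seq[Int]): Seq[Int] = input.map { item =>
    Try(someUdf(item)) match {
      case Success(value) =>
        value
      case Failure(err) =>
        errors += err // remember the error
        0             // default value
    }
  }

  def main(args: Array[String]): Unit = {
    println(process(Seq(1, 2, 0, 4))) // one failure replaced by default 0
    println(errors.size)              // 1
  }
}
```

Note that with a real Spark accumulator the `+=` happens on the executors and only the driver can read the merged value, so the errors would typically be inspected in something like foreachRDD after each batch.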

Thanks
Tobias
