Hi all,

I'm interested in instrumenting an Apache Flink application so that we can
monitor exceptions. What are the best practices here? Is there a good way
to observe all the exceptions inside of a Flink application, including
those thrown by Flink internals?

We are currently thinking of using Bugsnag, which documents steps for
integrating with Java applications:
https://docs.bugsnag.com/platforms/java/other/
This works fine for uncaught exceptions in the job manager / pipeline
driver context, but doesn't catch anything outside of that.

We're also interested in reporting on exceptions that occur in the job
execution context, e.g. in task managers.

Any tips/suggestions? I'd love to learn more about exception tracking and
handling in Flink :)
