Stephan Ewen created FLINK-5232:
-----------------------------------

             Summary: Add a Thread default uncaught exception handler on the 
JobManager
                 Key: FLINK-5232
                 URL: https://issues.apache.org/jira/browse/FLINK-5232
             Project: Flink
          Issue Type: Sub-task
          Components: JobManager
            Reporter: Stephan Ewen
             Fix For: 1.1.3


When some JobManager threads die because of uncaught exceptions, we should 
bring down the JobManager. If a thread dies from an uncaught exception, there 
is a high chance that the JobManager becomes dysfunctional.

The only safe thing is to rely on the JobManager being restarted by YARN / 
Mesos / Kubernetes / etc.

I suggest adding this code to the JobManager launch:

{code}
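Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Pass the throwable as the last argument so the stack trace is logged too.
            LOG.error("Thread {} died due to an uncaught exception. Killing process.",
                    t.getName(), e);
        } finally {
            // halt() skips shutdown hooks, so the process exits even if logging fails.
            Runtime.getRuntime().halt(-1);
        }
    }
});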
{code}
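
For trying the idea outside the JobManager, here is a minimal standalone sketch. The class name FatalExitHandler, the injectable "terminator" action, and the installForProcess() helper are hypothetical names chosen for illustration and are not part of Flink. It logs to System.err instead of LOG so that it has no dependencies. Making the exit action injectable is only there so the handler can be exercised without actually halting the JVM.

```java
/**
 * Sketch of a fail-fast uncaught exception handler.
 * All names here are illustrative assumptions, not Flink classes.
 */
final class FatalExitHandler implements Thread.UncaughtExceptionHandler {

    // Action that terminates the process; injectable so it can be replaced in tests.
    private final Runnable terminator;

    FatalExitHandler(Runnable terminator) {
        this.terminator = terminator;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            System.err.println("Thread '" + t.getName()
                    + "' died due to an uncaught exception. Killing process.");
            e.printStackTrace();
        } finally {
            // Runs even if the logging above throws.
            terminator.run();
        }
    }

    /** Installs the handler as the JVM-wide default, halting the process on failure. */
    static void installForProcess() {
        Thread.setDefaultUncaughtExceptionHandler(
                new FatalExitHandler(() -> Runtime.getRuntime().halt(-1)));
    }
}
```

Calling FatalExitHandler.installForProcess() early in the process entry point would cover every thread that has no handler of its own. Note that a handler set on an individual thread or its ThreadGroup takes precedence over the default one.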



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
