Aitozi created FLINK-23871:
------------------------------

             Summary: Dispatcher should handle finishing job exception when 
recover
                 Key: FLINK-23871
                 URL: https://issues.apache.org/jira/browse/FLINK-23871
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.13.2
            Reporter: Aitozi


An exception thrown while running a recovered job triggers a fatal error, 
a behavior introduced in https://issues.apache.org/jira/browse/FLINK-9097. 
However, a job may have reached a finished status and then crashed in the 
clean-up phase or some other post-completion phase. On recovery, the 
dispatcher may then recover a job whose status is 
RunningJobsRegistry.JobSchedulingStatus.DONE, which may cause the dispatcher 
to fail fatally again. 

I think we should handle the RunningJobsRegistry.JobSchedulingStatus.DONE 
status with a dedicated exception such as JobFinishingException, which 
indicates that the job or master crashed during the job finishing phase, and 
do only the clean-up work when this exception occurs.
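
A minimal, self-contained sketch of the idea follows. It is not Flink code: 
the class RecoverySketch, the methods startRecoveredJob and recoverJob, the 
actions list, and the simplified JobSchedulingStatus enum are all 
hypothetical stand-ins, and JobFinishingException is only the name proposed 
above. It only illustrates treating a DONE job as "clean up only" instead of 
escalating to a fatal error.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoverySketch {

    // Simplified stand-in for RunningJobsRegistry.JobSchedulingStatus (assumption).
    enum JobSchedulingStatus { PENDING, RUNNING, DONE }

    // Hypothetical exception proposed in this issue; it does not exist in Flink 1.13.2.
    static class JobFinishingException extends Exception {
        JobFinishingException(String message) {
            super(message);
        }
    }

    // Records what the sketch decided to do, so the outcome can be inspected.
    static final List<String> actions = new ArrayList<>();

    // Hypothetical helper: refuses to re-run a job that already reached DONE.
    static void startRecoveredJob(String jobId, JobSchedulingStatus status)
            throws JobFinishingException {
        if (status == JobSchedulingStatus.DONE) {
            throw new JobFinishingException(
                    "Job " + jobId + " already finished; only clean-up is pending");
        }
        actions.add("run:" + jobId);
    }

    // Hypothetical recovery entry point: DONE jobs are cleaned up,
    // any other unexpected failure is still treated as fatal.
    static void recoverJob(String jobId, JobSchedulingStatus status) {
        try {
            startRecoveredJob(jobId, status);
        } catch (JobFinishingException e) {
            actions.add("cleanup:" + jobId);
        } catch (RuntimeException e) {
            actions.add("fatal:" + jobId);
        }
    }

    public static void main(String[] args) {
        recoverJob("job-a", JobSchedulingStatus.RUNNING);
        recoverJob("job-b", JobSchedulingStatus.DONE);
        System.out.println(actions);
    }
}
```

In this sketch the dedicated exception type keeps the fatal-error path intact 
for every other failure, so the behavior from FLINK-9097 would only be 
relaxed for jobs that had already finished.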



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
