Hi Dan,

I think you might be using an older version of Flink. This problem was resolved by FLINK-16753 [1] after Flink-1.10.3.
[1] https://issues.apache.org/jira/browse/FLINK-16753

Best
Yun Tang
________________________________
From: Robert Metzger <[email protected]>
Sent: Monday, April 26, 2021 14:46
To: Dan Hill <[email protected]>
Cc: user <[email protected]>
Subject: Re: Checkpoint error - "The job has failed"

Hi Dan,

can you provide me with the JobManager logs to take a look as well? (This will also tell me which Flink version you are using)

On Mon, Apr 26, 2021 at 7:20 AM Dan Hill <[email protected]> wrote:

My Flink job failed to checkpoint with a "The job has failed" error. The logs contained no other recent errors. I keep hitting the error even if I cancel the jobs and restart them. When I restarted my jobmanager and taskmanager, the error went away.

What error am I hitting? It looks like there is bad state that lives outside the scope of a job.

How often do people restart their jobmanagers and taskmanager to deal with errors like this?
