Hi Dan,

If you check the "Fix Versions" field of FLINK-16753 [1], you will see that on the 1.11 line this bug is fixed from 1.11.3 onward, not in 1.11.1.

[1] https://issues.apache.org/jira/browse/FLINK-16753

Best
Yun Tang

________________________________
From: Dan Hill <[email protected]>
Sent: Tuesday, April 27, 2021 7:50
To: Yun Tang <[email protected]>
Cc: Robert Metzger <[email protected]>; user <[email protected]>
Subject: Re: Checkpoint error - "The job has failed"

Hey Yun and Robert,

I'm using Flink v1.11.1.

Robert, I'll send you a separate email with the logs.

On Mon, Apr 26, 2021 at 12:46 AM Yun Tang <[email protected]> wrote:

Hi Dan,

I think you might be using an older version of Flink; this problem has been resolved by FLINK-16753 [1] after Flink-1.10.3.

[1] https://issues.apache.org/jira/browse/FLINK-16753

Best
Yun Tang

________________________________
From: Robert Metzger <[email protected]>
Sent: Monday, April 26, 2021 14:46
To: Dan Hill <[email protected]>
Cc: user <[email protected]>
Subject: Re: Checkpoint error - "The job has failed"

Hi Dan,

Can you provide me with the JobManager logs so I can take a look as well? (This will also tell me which Flink version you are using.)

On Mon, Apr 26, 2021 at 7:20 AM Dan Hill <[email protected]> wrote:

My Flink job failed to checkpoint with a "The job has failed" error. The logs contained no other recent errors. I keep hitting the error even if I cancel the jobs and restart them. When I restarted my jobmanager and taskmanager, the error went away.

What error am I hitting? It looks like there is bad state that lives outside the scope of a job.

How often do people restart their jobmanagers and taskmanagers to deal with errors like this?
