I have Flink set up with two TaskManagers and one JobManager. I've
allocated 25 GB of JVM heap and 15 GB of Flink managed memory. I have two
jobs running. After 3 hours, the following exception was thrown. How can I
configure Flink to prevent this from happening?
2021-10-07 12:38:50
org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold.
    at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:98)
    at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:67)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1934)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1906)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:96)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:1990)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
--
Robert Cullen
240-475-4490