Hey folks,

We experienced a pipeline failure: our job manager restarted and, for some
reason, we were unable to restore from our last successful checkpoint.
Checkpoints had completed regularly every 10 minutes up to this failure,
with 0 failed checkpoints logged. We are using Flink version 1.17.1.


Can anyone shed light on what might have happened?


Here's the error from our logs:


Message: FATAL: Thread 'Checkpoint Timer' produced an uncaught exception. Stopping the process...

extendedStackTrace: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: java.lang.NullPointerException
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$startTriggeringCheckpoint$8(CheckpointCoordinator.java:669) ~[a-pipeline-name.jar:1.0]
    at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) ~[?:?]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) [?:?]
    at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610) [?:?]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:910) [?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.util.concurrent.CompletionException: java.lang.NullPointerException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) ~[?:?]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:932) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907) ~[?:?]
    ... 7 more
Caused by: java.lang.NullPointerException
    at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.abortCurrentTriggering(OperatorCoordinatorHolder.java:399) ~[a-pipeline-name.jar:1.0]
    at java.util.ArrayList.forEach(ArrayList.java:1541) ~[?:?]
    at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1085) ~[?:?]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.onTriggerFailure(CheckpointCoordinator.java:947) ~[a-pipeline-name.jar:1.0]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.onTriggerFailure(CheckpointCoordinator.java:923) ~[a-pipeline-name.jar:1.0]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$startTriggeringCheckpoint$7(CheckpointCoordinator.java:655) ~[a-pipeline-name.jar:1.0]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907) ~[?:?]
    ... 7 more
