Robert Metzger created FLINK-23925:
--------------------------------------

             Summary: HistoryServer: Archiving job with more than one attempt fails
                 Key: FLINK-23925
                 URL: https://issues.apache.org/jira/browse/FLINK-23925
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.13.2
            Reporter: Robert Metzger

Error:
{code}
2021-08-23 16:26:01,953 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000...@akka.tcp://flink@localhost:6123/user/rpc/jobmanager_2 for job ca9f6a073d311d60f457a1c4243e7dc3 from the resource manager.
2021-08-23 16:26:02,137 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Could not archive completed job CarTopSpeedWindowingExample(ca9f6a073d311d60f457a1c4243e7dc3) to the history server.
java.util.concurrent.CompletionException: java.lang.IllegalArgumentException: attempt does not exist
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) [?:1.8.0_252]
	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1643) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: java.lang.IllegalArgumentException: attempt does not exist
	at org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex.getPriorExecutionAttempt(ArchivedExecutionVertex.java:109) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
	at org.apache.flink.runtime.executiongraph.ArchivedExecutionVertex.getPriorExecutionAttempt(ArchivedExecutionVertex.java:31) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
	at org.apache.flink.runtime.rest.handler.job.SubtaskExecutionAttemptDetailsHandler.archiveJsonWithPath(SubtaskExecutionAttemptDetailsHandler.java:140) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
	at org.apache.flink.runtime.webmonitor.history.OnlyExecutionGraphJsonArchivist.archiveJsonWithPath(OnlyExecutionGraphJsonArchivist.java:51) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
	at org.apache.flink.runtime.webmonitor.WebMonitorEndpoint.archiveJsonWithPath(WebMonitorEndpoint.java:1031) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
	at org.apache.flink.runtime.dispatcher.JsonResponseHistoryServerArchivist.lambda$archiveExecutionGraph$0(JsonResponseHistoryServerArchivist.java:61) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
	at org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:49) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) ~[?:1.8.0_252]
	... 3 more
{code}

Steps to reproduce:
- start a Flink reactive mode job manager:
mkdir usrlib
cp ./examples/streaming/TopSpeedWindowing.jar usrlib/
# Submit Job in Reactive Mode
./bin/standalone-job.sh start -Dscheduler-mode=reactive -Dexecution.checkpointing.interval="10s" -j org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
# Start first TaskManager
./bin/taskmanager.sh start

- Add another taskmanager to trigger a restart
- Cancel the job

See the failure in the jobmanager logs.
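
For illustration only, here is a minimal, self-contained sketch of one way the exception in the trace above could arise. It is an assumption, not something this report confirms. It supposes that after a restart the current attempt number is greater than zero while the history of prior attempts is empty, and that the archiving code walks attempt numbers from 0 up to the current one. The class `PriorAttemptSketch`, the type `ArchivedSubtask`, and the methods `archiveStrict` and `archiveLenient` are hypothetical names and are not Flink APIs. Only the exception message is taken from the log.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PriorAttemptSketch {

    /** Hypothetical stand-in for an archived subtask; NOT Flink's actual class. */
    static final class ArchivedSubtask {
        final int currentAttemptNumber;
        final List<String> priorAttempts;

        ArchivedSubtask(int currentAttemptNumber, List<String> priorAttempts) {
            this.currentAttemptNumber = currentAttemptNumber;
            this.priorAttempts = priorAttempts;
        }

        /** Throws with the same message as in the log when the attempt is not recorded. */
        String getPriorAttempt(int attemptNumber) {
            if (attemptNumber >= 0 && attemptNumber < priorAttempts.size()) {
                return priorAttempts.get(attemptNumber);
            }
            throw new IllegalArgumentException("attempt does not exist");
        }
    }

    /** Assumes every attempt below the current attempt number has been recorded. */
    static List<String> archiveStrict(ArchivedSubtask subtask) {
        List<String> archived = new ArrayList<>();
        for (int i = 0; i < subtask.currentAttemptNumber; i++) {
            archived.add(subtask.getPriorAttempt(i));
        }
        archived.add("current#" + subtask.currentAttemptNumber);
        return archived;
    }

    /** Archives only the prior attempts that are actually recorded. */
    static List<String> archiveLenient(ArchivedSubtask subtask) {
        List<String> archived = new ArrayList<>();
        int recorded = Math.min(subtask.currentAttemptNumber, subtask.priorAttempts.size());
        for (int i = 0; i < recorded; i++) {
            archived.add(subtask.getPriorAttempt(i));
        }
        archived.add("current#" + subtask.currentAttemptNumber);
        return archived;
    }

    public static void main(String[] args) {
        // Assumed post-restart state: attempt number 1, but no recorded prior attempts.
        ArchivedSubtask restarted = new ArchivedSubtask(1, Collections.emptyList());
        try {
            archiveStrict(restarted);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
        System.out.println("lenient: " + archiveLenient(restarted));
    }
}
```

Under these assumptions the strict loop fails on the first lookup, while the lenient variant archives only the current attempt. Whether this matches the actual state of the archived execution graph would need to be checked against the Flink sources.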