[ https://issues.apache.org/jira/browse/FLINK-25069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Pohl reassigned FLINK-25069:
-------------------------------------

    Assignee: Matthias Pohl

> YARNHighAvailabilityITCase.testJobRecoversAfterKillingTaskManager failed on AZP
> -------------------------------------------------------------------------------
>
> Key: FLINK-25069
> URL: https://issues.apache.org/jira/browse/FLINK-25069
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Affects Versions: 1.15.0
> Reporter: Till Rohrmann
> Assignee: Matthias Pohl
> Priority: Critical
> Labels: test-stability
> Fix For: 1.15.0
>
>
> The test {{YARNHighAvailabilityITCase.testJobRecoversAfterKillingTaskManager}} fails on AZP with:
> {code}
> 2021-11-25T18:28:27.9848753Z Nov 25 18:28:27 [ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3,676.541 s <<< FAILURE! - in org.apache.flink.yarn.YARNHighAvailabilityITCase
> 2021-11-25T18:28:27.9849967Z Nov 25 18:28:27 [ERROR] org.apache.flink.yarn.YARNHighAvailabilityITCase.testJobRecoversAfterKillingTaskManager Time elapsed: 70.846 s <<< ERROR!
> 2021-11-25T18:28:27.9850929Z Nov 25 18:28:27 java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
> 2021-11-25T18:28:27.9854591Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2021-11-25T18:28:27.9855441Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2021-11-25T18:28:27.9856301Z Nov 25 18:28:27 at org.apache.flink.yarn.YARNHighAvailabilityITCase.submitJob(YARNHighAvailabilityITCase.java:378)
> 2021-11-25T18:28:27.9857202Z Nov 25 18:28:27 at org.apache.flink.yarn.YARNHighAvailabilityITCase.lambda$testJobRecoversAfterKillingTaskManager$1(YARNHighAvailabilityITCase.java:204)
> 2021-11-25T18:28:27.9858300Z Nov 25 18:28:27 at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288)
> 2021-11-25T18:28:27.9859245Z Nov 25 18:28:27 at org.apache.flink.yarn.YARNHighAvailabilityITCase.testJobRecoversAfterKillingTaskManager(YARNHighAvailabilityITCase.java:197)
> 2021-11-25T18:28:27.9860026Z Nov 25 18:28:27 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2021-11-25T18:28:27.9860705Z Nov 25 18:28:27 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2021-11-25T18:28:27.9861466Z Nov 25 18:28:27 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2021-11-25T18:28:27.9862158Z Nov 25 18:28:27 at java.lang.reflect.Method.invoke(Method.java:498)
> 2021-11-25T18:28:27.9863016Z Nov 25 18:28:27 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 2021-11-25T18:28:27.9863959Z Nov 25 18:28:27 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2021-11-25T18:28:27.9864829Z Nov 25 18:28:27 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 2021-11-25T18:28:27.9865604Z Nov 25 18:28:27 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2021-11-25T18:28:27.9866300Z Nov 25 18:28:27 at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> 2021-11-25T18:28:27.9867044Z Nov 25 18:28:27 at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> 2021-11-25T18:28:27.9867692Z Nov 25 18:28:27 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2021-11-25T18:28:27.9868220Z Nov 25 18:28:27 at java.lang.Thread.run(Thread.java:748)
> 2021-11-25T18:28:27.9869072Z Nov 25 18:28:27 Suppressed: java.lang.AssertionError: There is at least one application on the cluster that is not finished.[App application_1637861234319_0001 is in state RUNNING.]
> 2021-11-25T18:28:27.9870263Z Nov 25 18:28:27 at org.junit.Assert.fail(Assert.java:89)
> 2021-11-25T18:28:27.9870862Z Nov 25 18:28:27 at org.apache.flink.yarn.YarnTestBase$CleanupYarnApplication.close(YarnTestBase.java:325)
> 2021-11-25T18:28:27.9871516Z Nov 25 18:28:27 at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:289)
> 2021-11-25T18:28:27.9871986Z Nov 25 18:28:27 ... 13 more
> 2021-11-25T18:28:27.9872665Z Nov 25 18:28:27 Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
> 2021-11-25T18:28:27.9873393Z Nov 25 18:28:27 at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$11(RestClusterClient.java:433)
> 2021-11-25T18:28:27.9874102Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
> 2021-11-25T18:28:27.9874774Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
> 2021-11-25T18:28:27.9875454Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2021-11-25T18:28:27.9876123Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> 2021-11-25T18:28:27.9876837Z Nov 25 18:28:27 at org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:373)
> 2021-11-25T18:28:27.9877539Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2021-11-25T18:28:27.9878393Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2021-11-25T18:28:27.9879043Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2021-11-25T18:28:27.9879768Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
> 2021-11-25T18:28:27.9880461Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:943)
> 2021-11-25T18:28:27.9881229Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
> 2021-11-25T18:28:27.9881883Z Nov 25 18:28:27 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 2021-11-25T18:28:27.9882700Z Nov 25 18:28:27 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2021-11-25T18:28:27.9883223Z Nov 25 18:28:27 ... 1 more
> 2021-11-25T18:28:27.9883780Z Nov 25 18:28:27 Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error., <Exception on server side:
> 2021-11-25T18:28:27.9884529Z Nov 25 18:28:27 org.apache.flink.runtime.client.DuplicateJobSubmissionException: Job has already been submitted.
> 2021-11-25T18:28:27.9885242Z Nov 25 18:28:27 at org.apache.flink.runtime.client.DuplicateJobSubmissionException.of(DuplicateJobSubmissionException.java:29)
> 2021-11-25T18:28:27.9885954Z Nov 25 18:28:27 at org.apache.flink.runtime.dispatcher.Dispatcher.submitJob(Dispatcher.java:320)
> 2021-11-25T18:28:27.9886536Z Nov 25 18:28:27 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2021-11-25T18:28:27.9887090Z Nov 25 18:28:27 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2021-11-25T18:28:27.9887751Z Nov 25 18:28:27 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2021-11-25T18:28:27.9888357Z Nov 25 18:28:27 at java.lang.reflect.Method.invoke(Method.java:498)
> 2021-11-25T18:28:27.9888989Z Nov 25 18:28:27 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
> 2021-11-25T18:28:27.9889817Z Nov 25 18:28:27 at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
> 2021-11-25T18:28:27.9890560Z Nov 25 18:28:27 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
> 2021-11-25T18:28:27.9891256Z Nov 25 18:28:27 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
> 2021-11-25T18:28:27.9891961Z Nov 25 18:28:27 at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
> 2021-11-25T18:28:27.9892834Z Nov 25 18:28:27 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
> 2021-11-25T18:28:27.9893462Z Nov 25 18:28:27 at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
> 2021-11-25T18:28:27.9894044Z Nov 25 18:28:27 at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
> 2021-11-25T18:28:27.9894632Z Nov 25 18:28:27 at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
> 2021-11-25T18:28:27.9895213Z Nov 25 18:28:27 at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
> 2021-11-25T18:28:27.9895795Z Nov 25 18:28:27 at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
> 2021-11-25T18:28:27.9896393Z Nov 25 18:28:27 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2021-11-25T18:28:27.9896996Z Nov 25 18:28:27 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
> 2021-11-25T18:28:27.9897602Z Nov 25 18:28:27 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
> 2021-11-25T18:28:27.9898166Z Nov 25 18:28:27 at akka.actor.Actor.aroundReceive(Actor.scala:537)
> 2021-11-25T18:28:27.9898683Z Nov 25 18:28:27 at akka.actor.Actor.aroundReceive$(Actor.scala:535)
> 2021-11-25T18:28:27.9899307Z Nov 25 18:28:27 at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
> 2021-11-25T18:28:27.9900000Z Nov 25 18:28:27 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
> 2021-11-25T18:28:27.9900547Z Nov 25 18:28:27 at akka.actor.ActorCell.invoke(ActorCell.scala:548)
> 2021-11-25T18:28:27.9901085Z Nov 25 18:28:27 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
> 2021-11-25T18:28:27.9901616Z Nov 25 18:28:27 at akka.dispatch.Mailbox.run(Mailbox.scala:231)
> 2021-11-25T18:28:27.9902200Z Nov 25 18:28:27 at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
> 2021-11-25T18:28:27.9902967Z Nov 25 18:28:27 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> 2021-11-25T18:28:27.9903587Z Nov 25 18:28:27 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> 2021-11-25T18:28:27.9904182Z Nov 25 18:28:27 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> 2021-11-25T18:28:27.9904805Z Nov 25 18:28:27 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> 2021-11-25T18:28:27.9905290Z Nov 25 18:28:27
> 2021-11-25T18:28:27.9905666Z Nov 25 18:28:27 End of exception on server side>]
> 2021-11-25T18:28:27.9906179Z Nov 25 18:28:27 at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:532)
> 2021-11-25T18:28:27.9906842Z Nov 25 18:28:27 at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:512)
> 2021-11-25T18:28:27.9907507Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:966)
> 2021-11-25T18:28:27.9908163Z Nov 25 18:28:27 at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940)
> 2021-11-25T18:28:27.9908681Z Nov 25 18:28:27 ... 4 more
> 2021-11-25T18:28:27.9909001Z Nov 25 18:28:27
> 2021-11-25T18:28:27.9909632Z Nov 25 18:28:27 [ERROR] org.apache.flink.yarn.YARNHighAvailabilityITCase.testKillYarnSessionClusterEntrypoint Time elapsed: 1,800.315 s <<< ERROR!
> 2021-11-25T18:28:27.9910379Z Nov 25 18:28:27 org.junit.runners.model.TestTimedOutException: test timed out after 1800000 milliseconds
> 2021-11-25T18:28:27.9910924Z Nov 25 18:28:27 at java.lang.Thread.sleep(Native Method)
> 2021-11-25T18:28:27.9911487Z Nov 25 18:28:27 at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1240)
> 2021-11-25T18:28:27.9912182Z Nov 25 18:28:27 at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:607)
> 2021-11-25T18:28:27.9913034Z Nov 25 18:28:27 at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:419)
> 2021-11-25T18:28:27.9913782Z Nov 25 18:28:27 at org.apache.flink.yarn.YARNHighAvailabilityITCase.deploySessionCluster(YARNHighAvailabilityITCase.java:364)
> 2021-11-25T18:28:27.9914595Z Nov 25 18:28:27 at org.apache.flink.yarn.YARNHighAvailabilityITCase.lambda$testKillYarnSessionClusterEntrypoint$0(YARNHighAvailabilityITCase.java:174)
> 2021-11-25T18:28:27.9915326Z Nov 25 18:28:27 at org.apache.flink.yarn.YARNHighAvailabilityITCase$$Lambda$503/1259621657.run(Unknown Source)
> 2021-11-25T18:28:27.9915947Z Nov 25 18:28:27 at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288)
> 2021-11-25T18:28:27.9916650Z Nov 25 18:28:27 at org.apache.flink.yarn.YARNHighAvailabilityITCase.testKillYarnSessionClusterEntrypoint(YARNHighAvailabilityITCase.java:162)
> 2021-11-25T18:28:27.9917328Z Nov 25 18:28:27 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2021-11-25T18:28:27.9917905Z Nov 25 18:28:27 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2021-11-25T18:28:27.9918570Z Nov 25 18:28:27 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2021-11-25T18:28:27.9919246Z Nov 25 18:28:27 at java.lang.reflect.Method.invoke(Method.java:498)
> 2021-11-25T18:28:27.9919847Z Nov 25 18:28:27 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 2021-11-25T18:28:27.9920514Z Nov 25 18:28:27 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2021-11-25T18:28:27.9921293Z Nov 25 18:28:27 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 2021-11-25T18:28:27.9921936Z Nov 25 18:28:27 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2021-11-25T18:28:27.9922772Z Nov 25 18:28:27 at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> 2021-11-25T18:28:27.9923503Z Nov 25 18:28:27 at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> 2021-11-25T18:28:27.9924238Z Nov 25 18:28:27 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2021-11-25T18:28:27.9924757Z Nov 25 18:28:27 at java.lang.Thread.run(Thread.java:748)
> 2021-11-25T18:28:27.9925156Z Nov 25 18:28:27
> 2021-11-25T18:28:27.9925694Z Nov 25 18:28:27 [ERROR] org.apache.flink.yarn.YARNHighAvailabilityITCase.testClusterClientRetrieval Time elapsed: 1,800.087 s <<< ERROR!
> 2021-11-25T18:28:27.9926411Z Nov 25 18:28:27 org.junit.runners.model.TestTimedOutException: test timed out after 1800000 milliseconds
> 2021-11-25T18:28:27.9926957Z Nov 25 18:28:27 at java.lang.Thread.sleep(Native Method)
> 2021-11-25T18:28:27.9927499Z Nov 25 18:28:27 at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1240)
> 2021-11-25T18:28:27.9928190Z Nov 25 18:28:27 at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:607)
> 2021-11-25T18:28:27.9928899Z Nov 25 18:28:27 at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:419)
> 2021-11-25T18:28:27.9929731Z Nov 25 18:28:27 at org.apache.flink.yarn.YARNHighAvailabilityITCase.deploySessionCluster(YARNHighAvailabilityITCase.java:364)
> 2021-11-25T18:28:27.9930513Z Nov 25 18:28:27 at org.apache.flink.yarn.YARNHighAvailabilityITCase.lambda$testClusterClientRetrieval$2(YARNHighAvailabilityITCase.java:230)
> 2021-11-25T18:28:27.9931236Z Nov 25 18:28:27 at org.apache.flink.yarn.YARNHighAvailabilityITCase$$Lambda$504/1893740748.run(Unknown Source)
> 2021-11-25T18:28:27.9931852Z Nov 25 18:28:27 at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288)
> 2021-11-25T18:28:27.9932684Z Nov 25 18:28:27 at org.apache.flink.yarn.YARNHighAvailabilityITCase.testClusterClientRetrieval(YARNHighAvailabilityITCase.java:225)
> 2021-11-25T18:28:27.9933406Z Nov 25 18:28:27 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2021-11-25T18:28:27.9933989Z Nov 25 18:28:27 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2021-11-25T18:28:27.9934647Z Nov 25 18:28:27 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2021-11-25T18:28:27.9935251Z Nov 25 18:28:27 at java.lang.reflect.Method.invoke(Method.java:498)
> 2021-11-25T18:28:27.9935839Z Nov 25 18:28:27 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 2021-11-25T18:28:27.9936502Z Nov 25 18:28:27 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2021-11-25T18:28:27.9937158Z Nov 25 18:28:27 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 2021-11-25T18:28:27.9937813Z Nov 25 18:28:27 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2021-11-25T18:28:27.9938497Z Nov 25 18:28:27 at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> 2021-11-25T18:28:27.9939288Z Nov 25 18:28:27 at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> 2021-11-25T18:28:27.9939947Z Nov 25 18:28:27 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2021-11-25T18:28:27.9940452Z Nov 25 18:28:27 at java.lang.Thread.run(Thread.java:748)
> 2021-11-25T18:28:27.9940854Z Nov 25 18:28:27
> 2021-11-25T18:28:28.9205416Z Nov 25 18:28:28 [ERROR] Picked up JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=27085&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=29849

--
This message was sent by Atlassian Jira
(v8.20.1#820001)