We are trying out Flink 1.7.0. We always get this exception when submitting
a job with an external checkpoint via REST. Job parallelism is 1,600, and
state size is probably in the range of 1-5 TB. The job is actually started;
only the REST API returns this failure.

If we submit the job without an external checkpoint, everything works
fine.
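
For anyone trying to reproduce this, here is a minimal sketch of the two
kinds of submission. It assumes the standard `POST /jars/:jarid/run`
endpoint with the `savepointPath` and `parallelism` fields in a JSON body.
The host, jar id, and checkpoint path in the usage comments are
placeholders, and the helper `build_run_request` is hypothetical; this is
not our actual client code.

```python
import json
import urllib.request


def build_run_request(base_url, jar_id, savepoint_path=None, parallelism=None):
    """Build (but do not send) a POST /jars/<jar_id>/run request.

    Hypothetical helper for illustration only. Leaving savepoint_path as
    None corresponds to submitting without an external checkpoint.
    """
    body = {}
    if savepoint_path is not None:
        body["savepointPath"] = savepoint_path
    if parallelism is not None:
        body["parallelism"] = parallelism
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/jars/{jar_id}/run",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values, not real ones:
# with_ckpt = build_run_request("http://jobmanager:8081", "<jar-id>",
#                               savepoint_path="<checkpoint-path>",
#                               parallelism=1600)
# without_ckpt = build_run_request("http://jobmanager:8081", "<jar-id>",
#                                  parallelism=1600)
# urllib.request.urlopen(with_ckpt, timeout=60)  # actually sends the request
```

Only the first form (with `savepointPath` set) corresponds to the case
where we see the timeout.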

Has anyone else seen this problem with 1.7? Appreciate your help!

Thanks,
Steven

org.apache.flink.runtime.rest.handler.RestHandlerException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-641142843]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleRequest$4(JarRunHandler.java:114)
        at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
        at akka.dispatch.OnComplete.internal(Future.scala:258)
        at akka.dispatch.OnComplete.internal(Future.scala:256)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-641142843]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
        at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
        at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
        ... 21 more
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-641142843]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        ... 9 more
