Hi Arnaud,

It seems that the TaskExecutor terminated abnormally. To figure out why it
crashed or shut down, you need to check the logs of
container_e38_1604477334666_0960_01_000004.
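
If log aggregation is enabled on the cluster, the container logs can usually be fetched with the `yarn logs` CLI. Below is a minimal sketch, not a definitive recipe: it assumes the standard YARN container ID layout, from which the application ID can be derived, and it only builds the command line without running it. The helper names `application_id` and `yarn_logs_command` are hypothetical, chosen for illustration. Older Hadoop versions may additionally require `-nodeAddress` alongside `-containerId`.

```python
import re
from typing import List

# Container ID taken from the log excerpt in this thread.
CONTAINER_ID = "container_e38_1604477334666_0960_01_000004"


def application_id(container_id: str) -> str:
    """Derive the YARN application ID embedded in a container ID.

    Assumed layout: container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<container>
    """
    m = re.fullmatch(r"container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+", container_id)
    if m is None:
        raise ValueError(f"unrecognized container ID: {container_id}")
    cluster_ts, app_seq = m.groups()
    return f"application_{cluster_ts}_{app_seq}"


def yarn_logs_command(container_id: str) -> List[str]:
    """Build (but do not run) a `yarn logs` invocation for one container."""
    return [
        "yarn", "logs",
        "-applicationId", application_id(container_id),
        "-containerId", container_id,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed and run by hand on a cluster node.
    print(" ".join(yarn_logs_command(CONTAINER_ID)))
```

The printed command can be copied to a node where the `yarn` client is configured. If the container was killed by the NodeManager rather than by Flink itself, the NodeManager log on the host that ran the container may also be worth checking.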

Best,
Yang

LINZ, Arnaud <al...@bouyguestelecom.fr> wrote on Mon, Nov 16, 2020 at 7:11 PM:

> Hello,
>
> I'm running Flink 1.10 on a YARN cluster. I have a streaming application
> that, under heavy load, fails from time to time. This is the only error
> message in the whole YARN log:
>
> (...)
> 2020-11-15 16:18:42,202 WARN  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Received late message for now expired checkpoint attempt 63 from task 4cbc940112a596db54568b24f9209aac of job 1e1717d19bd8ea296314077e42e1c7e5 at container_e38_1604477334666_0960_01_000004 @ xxx (dataPort=33099).
> 2020-11-15 16:18:55,043 INFO  org.apache.flink.yarn.YarnResourceManager                     - Closing TaskExecutor connection container_e38_1604477334666_0960_01_000004 because: The TaskExecutor is shutting down.
> 2020-11-15 16:18:55,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Map (7/15) (c8e92cacddcd4e41f51a2433d07d2153) switched from RUNNING to FAILED.
> org.apache.flink.util.FlinkException: The TaskExecutor is shutting down.
>         at org.apache.flink.runtime.taskexecutor.TaskExecutor.onStop(TaskExecutor.java:359)
>         at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:218)
>         at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:509)
>         at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:175)
>         at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>         at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>         at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>         at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>         at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>         at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>         at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2020-11-15 16:18:55,092 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionStrategy - Calculating tasks to restart to recover the failed task 2f6467d98899e64a4721f0a7b6a059a8_6.
> 2020-11-15 16:18:55,101 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionStrategy - 230 tasks should be restarted to recover the failed task 2f6467d98899e64a4721f0a7b6a059a8_6.
> (...)
>
> What could be the cause of this failure? Why is there no other error
> message?
>
> I've tried increasing the value of heartbeat.timeout, thinking that the
> failure might be due to a slow-responding mapper, but that did not solve
> the issue.
>
> Best regards,
> Arnaud
