Hi,

1.  This RPC timeout occurs while the JobMaster is deploying a task to the 
TaskExecutor. The RPC thread in the TaskExecutor did not respond to the 
deployment message within 10 seconds. There are many possible causes, such as 
a network problem between the TaskExecutor and the JobMaster, or other 
time-consuming operations in the TaskExecutor. The root cause may be a bit 
complicated to trace. First, check when the TaskExecutor receives this 
message, then check when the TaskExecutor responds to it. You may also need 
to check what the RPC thread is doing during that interval.

2.  As a temporary workaround, you can increase the RPC timeout parameter 
(akka.ask.timeout) above its default value.
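
As a minimal sketch, the setting could look like this in flink-conf.yaml. The 
60 s value is only an illustrative assumption, not a recommendation; choose a 
value that suits your cluster, and restart the job for it to take effect.

```yaml
# flink-conf.yaml
# Timeout for RPC "ask" calls such as task deployment.
# The reported failure hit the 10 s default (10000 ms in the trace).
# 60 s is an assumed example value.
akka.ask.timeout: 60 s
```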

Best,
Zhijiang
------------------------------------------------------------------
From: 徐涛 <happydexu...@gmail.com>
Sent: Thursday, September 13, 2018 14:10
To: user <user@flink.apache.org>
Subject: Flink application down due to RpcTimeout exception

Hi All,
 I'm running Flink 1.6 on YARN. After the program had run for a day, it 
failed on YARN with the error log shown below.
 It seems to be due to a timeout error, but I have the following questions:
 1. At which step did the communication between Flink components fail? Which 
two components are involved?
 2. How can I solve this problem?
 Thanks a lot!!

java.lang.Exception: Cannot deploy task LeftOuterJoin(where: (=(id, article_id)), join: (id, created_time, article_score, PU, article_id, CU, CN)) -> select: (id, created_time, article_score, PU, CU, CN) (2/2) (d403002a7accc5133cf89a386ddc1dfb) - TaskManager (container_1532509321420_463249_01_000002 @ sh-bs-3-i1-hadoop-17-225 (dataPort=10459)) not responding after a rpcTimeout of 10000 ms
        at org.apache.flink.runtime.executiongraph.Execution.lambda$deploy$5(Execution.java:601) ~[flink-runtime_2.11-1.6.0.jar:1.6.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) ~[na:1.8.0_65]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) ~[na:1.8.0_65]
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) ~[na:1.8.0_65]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_65]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_65]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@sh-bs-3-i1-hadoop-17-225:24213/user/taskmanager_0#-1762816591]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604) ~[akka-actor_2.11-2.4.20.jar:na]
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126) ~[akka-actor_2.11-2.4.20.jar:na]
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601) ~[scala-library-2.11.8.jar:na]
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[scala-library-2.11.8.jar:na]
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599) ~[scala-library-2.11.8.jar:na]
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329) ~[akka-actor_2.11-2.4.20.jar:na]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280) ~[akka-actor_2.11-2.4.20.jar:na]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284) ~[akka-actor_2.11-2.4.20.jar:na]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236) ~[akka-actor_2.11-2.4.20.jar:na]
        ... 1 common frames omitted


Best,
Henry
