Hi Greg,

Can you describe the steps to reproduce the problem, or attach the full
JobManager logs? Because JobExecutionResultHandler appears in your log, I
assume that you are starting a job cluster on YARN. Without the complete
logs, I cannot be sure what exactly is happening. For now, you can try
setting the config option web.timeout to a higher value.
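
As a rough sketch, assuming the option is set in flink-conf.yaml and that the
value is given in milliseconds, the change could look like the fragment
below. The 300000 is only an illustration that mirrors the akka.ask.timeout
value from your message, not a recommended setting. As far as I know,
web.timeout defaults to 10000 ms, which would match the 10 seconds in your
error.

```yaml
# flink-conf.yaml (sketch; 300000 is an assumed example value)
# Timeout for asynchronous operations of the REST/web handlers, in milliseconds.
web.timeout: 300000
```

The cluster has to be restarted (or the job cluster resubmitted) for the new
value to be picked up.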

Best,
Gary

On Fri, Aug 31, 2018 at 8:01 PM, Greg Finch <finchgreg...@gmail.com> wrote:

> I'm having a problem with an Akka timeout when starting my cluster.  The
> error is "Ask timed out after 10000 ms.".  I have changed the
> akka.ask.timeout config setting to 300000 ms, but it still times out and
> fails after 10 seconds.  I confirmed that the config is properly set by
> both checking the Job Manager configuration tab (it shows 300000 ms) and
> logging the output of AkkaUtils.getTimeout(configuration), which also
> shows 300000 ms.  It seems something is not honoring that configuration
> value.
>
> I did find a different thread that discussed the fact that the
> LocalStreamEnvironment will not honor this setting, but that is not my
> case.  I am running on a cluster (AWS EMR) using the regular
> StreamExecutionEnvironment.  This is Flink 1.5.2.
>
> Any ideas?
>
> ~~~~~
>
> 2018-08-31 17:37:55 INFO  org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl  - Received new token for : ip-10-213-139-66.ec2.internal:8041
> 2018-08-31 17:37:55 INFO  org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl  - Received new token for : ip-10-213-136-25.ec2.internal:8041
> 2018-08-31 17:38:34 ERROR o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler  - Implementation error: Unhandled exception.
> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-219618710]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>       at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>       at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>       at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>       at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>       at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>       at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>       at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>       at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>       at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>       at java.lang.Thread.run(Thread.java:748)
> 2018-08-31 17:38:41 INFO  org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl  - Waiting for application to be successfully unregistered.
> 2018-08-31 17:38:41 INFO  o.a.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl  - Interrupted while waiting for queue
> java.lang.InterruptedException: null
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
>       at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>       at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:323)
> 2018-08-31 17:38:42 WARN  akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-81 - Association with remote system [akka.tcp://flink@ip-10-213-142-102.ec2.internal:42027] has failed, address is now gated for [50] ms. Reason: [Disassociated]
>
>
>
