My guess is that it is a JobManager internal error that crashes the dispatcher, or a
race condition so that the returned future never completes, possibly related to a
JDK bug. But again, I have never had a log for this case, so I cannot conclude anything.
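For anyone hitting this, a quick way to check whether you are on an early 1.8.x build is to look at the update number in your `java -version` output. A minimal sketch (the `jdk_update` helper is hypothetical, and treating a missing update suffix as "early" is an assumption, not an official cutoff):

```python
import re

def jdk_update(version_string):
    """Extract the update number from a '1.8.0_NN' style version string,
    e.g. '1.8.0_202' -> 202. Returns 0 if no update suffix is present,
    which here we take to indicate an early 1.8.0 build."""
    match = re.search(r"1\.8\.0_(\d+)", version_string)
    return int(match.group(1)) if match else 0

# The reporter's JVM shows '1.8/25.5-b02', i.e. an early 1.8.0 build
# with no update suffix -- exactly the kind of version suspected here.
print(jdk_update("1.8.0_202"))  # late bugfix release
print(jdk_update("1.8.0"))      # early build, no update number
```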

Best,
tison.


tison <wander4...@gmail.com> 于2020年1月22日周三 上午10:49写道:

> It is a known issue, reported multiple times, that occurs on early
> JDK 1.8.x versions; upgrade to a later bugfix release and the issue will vanish.
>
> I have never had a log on the JobManager side when this issue was reported,
> so I'm sorry I am unable to explain more...
>
> Best,
> tison.
>
>
> Yang Wang <danrtsey...@gmail.com> 于2020年1月22日周三 上午10:46写道:
>
>> The "web.timeout" setting is used for all web monitor asynchronous
>> operations, including the
>> "DispatcherGateway.submitJob" call in the "JobSubmitHandler".
>> So when you increase the timeout, does it still fail?
>>
>> Best,
>> Yang
>>
>> satya brat <bratsatya...@gmail.com> 于2020年1月21日周二 下午8:57写道:
>>
>>> How does web.timeout help here? The issue is that the Akka
>>> dispatcher is timing out. The job is submitted to the task managers, but the
>>> response doesn't reach the client.
>>>
>>> On Tue, Jan 21, 2020 at 12:34 PM Yang Wang <danrtsey...@gmail.com>
>>> wrote:
>>>
>>>> Hi satya,
>>>>
>>>> Maybe the job has been submitted to the Dispatcher successfully, but the
>>>> internal job submission takes
>>>> too long (more than 10s), so it fails with a timeout. Could you
>>>> please set `web.timeout: 30000`
>>>> and run again?
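>>>>
>>>> For reference, a minimal sketch of the relevant flink-conf.yaml entry
>>>> (the 30000 ms value is a suggestion to rule out the timeout, not an
>>>> official recommendation):
>>>>
>>>> ```yaml
>>>> # Timeout (in ms) for asynchronous web monitor operations,
>>>> # including job submission handled by the JobSubmitHandler.
>>>> web.timeout: 30000
>>>> ```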
>>>>
>>>>
>>>>
>>>> Best,
>>>> Yang
>>>>
>>>> satya brat <bratsatya...@gmail.com> 于2020年1月20日周一 下午4:34写道:
>>>>
>>>>> I am using a standalone Flink cluster with 1 jobManager and n
>>>>> taskManagers. When I try to submit a job via the command line, the job
>>>>> submission fails with the error message
>>>>> org.apache.flink.client.program.ProgramInvocationException: Could not
>>>>> submit job (JobID: f839aefee74aa4483ce8f8fd2e49b69e).
>>>>>
>>>>> On the jobManager instance, everything works fine until the job switches
>>>>> from DEPLOYING to RUNNING. After that, once the Akka timeout expires, I see
>>>>> the following stacktrace:
>>>>>
>>>>> akka.pattern.AskTimeoutException: Ask timed out on 
>>>>> [Actor[akka://flink/user/dispatcher#-177004106]] after [100000 ms]. 
>>>>> Sender[null] sent message of type 
>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>     at 
>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>>>>>     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>>>>>     at 
>>>>> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>>>>>     at 
>>>>> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>>>     at 
>>>>> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>>>>>     at 
>>>>> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>>>>>     at 
>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>>>>>     at 
>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>>>>>     at 
>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> I went through the Flink code on GitHub, and all the steps required to
>>>>> execute a job seem to run fine. However, when the jobManager has to
>>>>> acknowledge the job submission to the Flink client that triggered it, the
>>>>> jobSubmitHandler times out on the Akka dispatcher, which, according to my
>>>>> understanding, takes care of communicating with the job client.
>>>>>
>>>>> The Flink job consists of 1 source (Kafka), 2 operators, and 1
>>>>> sink (custom sink). The following link shows the jobManager logs:
>>>>> https://pastebin.com/raw/3GaTtNrG
>>>>>
>>>>> Once the dispatcher times out, all other Flink UI calls also time out
>>>>> with the same exception.
>>>>>
>>>>> The following are the Flink client logs from submitting the job via the
>>>>> command line.
>>>>>
>>>>> 2019-09-28 19:34:21,321 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   - 
>>>>> --------------------------------------------------------------------------------
>>>>> 2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -  Starting Command Line Client (Version: 1.8.0, 
>>>>> Rev:<unknown>, Date:<unknown>)
>>>>> 2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -  OS current user: root
>>>>> 2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -  Current Hadoop/Kerberos user: <no hadoop dependency 
>>>>> found>
>>>>> 2019-09-28 19:34:21,322 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle 
>>>>> Corporation - 1.8/25.5-b02
>>>>> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -  Maximum heap size: 2677 MiBytes
>>>>> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -  JAVA_HOME: (not set)
>>>>> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -  No Hadoop Dependency available
>>>>> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -  JVM Options:
>>>>> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -     
>>>>> -Dlog.file=/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/log/flink-root-client-fulfillment-stream-processor-flink-task-manager-2-8047357.log
>>>>> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -     
>>>>> -Dlog4j.configuration=file:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/conf/log4j-cli.properties
>>>>> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -     
>>>>> -Dlogback.configurationFile=file:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/conf/logback.xml
>>>>> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -  Program Arguments:
>>>>> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -     run
>>>>> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -     -d
>>>>> 2019-09-28 19:34:21,323 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -     -c
>>>>> 2019-09-28 19:34:21,324 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -     /home/fse/flink-kafka-relayer-0.2.jar
>>>>> 2019-09-28 19:34:21,324 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   -  Classpath: 
>>>>> /var/lib/fulfillment-stream-processor/flink-executables/flink-executables/lib/log4j-1.2.17.jar:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/lib/slf4j-log4j12-1.7.15.jar:/var/lib/fulfillment-stream-processor/flink-executables/flink-executables/lib/flink-dist_2.11-1.8.0.jar:::
>>>>> 2019-09-28 19:34:21,324 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   - 
>>>>> --------------------------------------------------------------------------------
>>>>> 2019-09-28 19:34:21,328 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: jobmanager.rpc.address, <job-manager-ip>
>>>>> 2019-09-28 19:34:21,328 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: jobmanager.rpc.port, 6123
>>>>> 2019-09-28 19:34:21,328 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: jobmanager.heap.size, 1024m
>>>>> 2019-09-28 19:34:21,329 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: taskmanager.heap.size, 1024m
>>>>> 2019-09-28 19:34:21,329 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: taskmanager.numberOfTaskSlots, 4
>>>>> 2019-09-28 19:34:21,329 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: parallelism.default, 1
>>>>> 2019-09-28 19:34:21,329 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: metrics.reporter.jmx.class, 
>>>>> org.apache.flink.metrics.jmx.JMXReporter
>>>>> 2019-09-28 19:34:21,329 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: metrics.reporter.jmx.port, 8789
>>>>> 2019-09-28 19:34:21,333 WARN  org.apache.flink.client.cli.CliFrontend     
>>>>>                   - Could not load CLI class 
>>>>> org.apache.flink.yarn.cli.FlinkYarnSessionCli.
>>>>> java.lang.NoClassDefFoundError: 
>>>>> org/apache/hadoop/yarn/exceptions/YarnException
>>>>>     at java.lang.Class.forName0(Native Method)
>>>>>     at java.lang.Class.forName(Class.java:259)
>>>>>     at 
>>>>> org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1230)
>>>>>     at 
>>>>> org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFrontend.java:1190)
>>>>>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1115)
>>>>> Caused by: java.lang.ClassNotFoundException: 
>>>>> org.apache.hadoop.yarn.exceptions.YarnException
>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>     ... 5 more
>>>>> 2019-09-28 19:34:21,343 INFO  org.apache.flink.core.fs.FileSystem         
>>>>>                   - Hadoop is not in the classpath/dependencies. The 
>>>>> extended set of supported File Systems via Hadoop is not available.
>>>>> 2019-09-28 19:34:21,545 INFO  
>>>>> org.apache.flink.runtime.security.modules.HadoopModuleFactory  - Cannot 
>>>>> create Hadoop Security Module because Hadoop cannot be found in the 
>>>>> Classpath.
>>>>> 2019-09-28 19:34:21,560 INFO  
>>>>> org.apache.flink.runtime.security.SecurityUtils               - Cannot 
>>>>> install HadoopSecurityContext because Hadoop cannot be found in the 
>>>>> Classpath.
>>>>> 2019-09-28 19:34:21,561 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   - Running 'run' command.
>>>>> 2019-09-28 19:34:21,566 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   - Building program from JAR file
>>>>> 2019-09-28 19:34:21,744 INFO  
>>>>> org.apache.flink.configuration.Configuration                  - Config 
>>>>> uses fallback configuration key 'jobmanager.rpc.address' instead of key 
>>>>> 'rest.address'
>>>>> 2019-09-28 19:34:21,896 INFO  org.apache.flink.runtime.rest.RestClient    
>>>>>                   - Rest client endpoint started.
>>>>> 2019-09-28 19:34:21,898 INFO  org.apache.flink.client.cli.CliFrontend     
>>>>>                   - Starting execution of program
>>>>> 2019-09-28 19:34:21,898 INFO  
>>>>> org.apache.flink.client.program.rest.RestClusterClient        - Starting 
>>>>> program in interactive mode (detached: true)
>>>>> 2019-09-28 19:34:22,594 WARN  
>>>>> org.apache.flink.streaming.api.environment.StreamContextEnvironment  - 
>>>>> Job was executed in detached mode, the results will be available on 
>>>>> completion.
>>>>> 2019-09-28 19:34:22,632 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: jobmanager.rpc.address, <job-manager-ip>
>>>>> 2019-09-28 19:34:22,632 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: jobmanager.rpc.port, 6123
>>>>> 2019-09-28 19:34:22,632 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: jobmanager.heap.size, 1024m
>>>>> 2019-09-28 19:34:22,633 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: taskmanager.heap.size, 1024m
>>>>> 2019-09-28 19:34:22,633 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: taskmanager.numberOfTaskSlots, 4
>>>>> 2019-09-28 19:34:22,633 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: parallelism.default, 1
>>>>> 2019-09-28 19:34:22,633 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: metrics.reporter.jmx.class, 
>>>>> org.apache.flink.metrics.jmx.JMXReporter
>>>>> 2019-09-28 19:34:22,633 INFO  
>>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading 
>>>>> configuration property: metrics.reporter.jmx.port, 8789
>>>>> 2019-09-28 19:34:22,635 INFO  
>>>>> org.apache.flink.client.program.rest.RestClusterClient        - 
>>>>> Submitting job f839aefee74aa4483ce8f8fd2e49b69e (detached: true).
>>>>> 2019-09-28 19:36:04,341 INFO  org.apache.flink.runtime.rest.RestClient    
>>>>>                   - Shutting down rest endpoint.
>>>>> 2019-09-28 19:36:04,343 INFO  org.apache.flink.runtime.rest.RestClient    
>>>>>                   - Rest endpoint shutdown complete.
>>>>> 2019-09-28 19:36:04,343 ERROR org.apache.flink.client.cli.CliFrontend     
>>>>>                   - Error while running the command.
>>>>> org.apache.flink.client.program.ProgramInvocationException: Could not 
>>>>> submit job (JobID: f839aefee74aa4483ce8f8fd2e49b69e)
>>>>>     at 
>>>>> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:250)
>>>>>     at 
>>>>> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:483)
>>>>>     at 
>>>>> org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
>>>>>     at 
>>>>> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:429)
>>>>>     at 
>>>>> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
>>>>>     at 
>>>>> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
>>>>>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
>>>>>     at 
>>>>> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
>>>>>     at 
>>>>> org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
>>>>>     at 
>>>>> org.apache.flink.client.cli.CliFrontend$$Lambda$5/1971851377.call(Unknown 
>>>>> Source)
>>>>>     at 
>>>>> org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>>>>>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
>>>>> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed 
>>>>> to submit JobGraph.
>>>>>     at 
>>>>> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:388)
>>>>>     at 
>>>>> org.apache.flink.client.program.rest.RestClusterClient$$Lambda$17/788892554.apply(Unknown
>>>>>  Source)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture$ExceptionCompletion.run(CompletableFuture.java:1246)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture$ThenApply.run(CompletableFuture.java:723)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture$ThenCopy.run(CompletableFuture.java:1333)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2361)
>>>>>     at 
>>>>> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:207)
>>>>>     at 
>>>>> org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$34/1092254958.accept(Unknown
>>>>>  Source)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture$WhenCompleteCompletion.run(CompletableFuture.java:1298)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture$AsyncCompose.exec(CompletableFuture.java:626)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture$Async.run(CompletableFuture.java:428)
>>>>>     at 
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>     at 
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>> Caused by: org.apache.flink.runtime.rest.util.RestClientException: 
>>>>> [Internal server error., <Exception on server side:
>>>>> akka.pattern.AskTimeoutException: Ask timed out on 
>>>>> [Actor[akka://flink/user/dispatcher#-177004106]] after [100000 ms]. 
>>>>> Sender[null] sent message of type 
>>>>> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>>>>>     at 
>>>>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>>>>>     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>>>>>     at 
>>>>> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>>>>>     at 
>>>>> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>>>     at 
>>>>> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>>>>>     at 
>>>>> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>>>>>     at 
>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>>>>>     at 
>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>>>>>     at 
>>>>> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> End of exception on server side>]
>>>>>     at 
>>>>> org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:389)
>>>>>     at 
>>>>> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:373)
>>>>>     at 
>>>>> org.apache.flink.runtime.rest.RestClient$$Lambda$33/1155836850.apply(Unknown
>>>>>  Source)
>>>>>     at 
>>>>> java.util.concurrent.CompletableFuture$AsyncCompose.exec(CompletableFuture.java:604)
>>>>>     ... 4 more
>>>>>
>>>>> I have turned on debug logs for Flink, Akka, and Kafka but am not able to
>>>>> figure out what is going wrong. I have only a very basic understanding of
>>>>> Akka, which is why I can't diagnose this myself. Can someone help me with
>>>>> it? I am running Flink 1.8.0.
>>>>>
>>>>
