机器参数:三台  32C64G  centos  7.8,cdh集群在这上面先部署
flink版本:1.11.2,在三台集群上搭建的集群

hadoop集群是用cdh搭建的


启动命令:flink run -m yarn-cluster -ynm traffic -s 2 -p 2 -ytm 1024 -py traffic.py

程序使用pyflink开发的,从kafka读取数据,然后用滚动窗口聚合每分钟的数据在写入kafka




这个程序在local模式下是正常运行的,但是用per-job模式提交总是失败

测试官方例子  flink run -m yarn-cluster examples/batch/WordCount.jar   
是可以输出结果的,所以想请教一下这个是yarn的问题还是程序的问题啊?




下面是主要报错信息

Caused by: java.util.concurrent.CompletionException: 
org.apache.flink.runtime.client.JobExecutionException: Could not instantiate 
JobManager.

at 
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
 ~[?:1.8.0_202]

at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) 
~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

... 4 more

Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not 
instantiate JobManager.

at 
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
 ~[?:1.8.0_202]

at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) 
~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

... 4 more

Caused by: java.io.FileNotFoundException: Cannot find checkpoint or savepoint 
file/directory '2' on file system 'file'.

at 
org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpointPointer(AbstractFsCheckpointStorage.java:243)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpoint(AbstractFsCheckpointStorage.java:110)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1394)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:300)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:253)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:229) 
~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:119)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:103)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:284)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:272) 
~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:140)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:388)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
 ~[?:1.8.0_202]

at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) 
~[flink-dist_2.12-1.11.2.jar:1.11.2]

at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
 ~[flink-dist_2.12-1.11.2.jar:1.11.2]

... 4 more

2020-12-23 20:12:46,459 INFO org.apache.flink.runtime.blob.BlobServer [] - 
Stopped BLOB server at 0.0.0.0:16109







全部日志可以打开下面的链接:https://note.youdao.com/ynoteshare1/index.html?id=25f1af945e277057c2251e8f60d90f8a&type=note
加载可能慢一些,请稍等一会就出来了~













Best,

MagicHuang





回复