First, I am using Flink 1.11 and Kubernetes v1.17. The command I used to start the cluster is:

./bin/kubernetes-session.sh -Dkubernetes.cluster-id=flink-cluster -Dkubernetes.jobmanager.service-account=flink -Dtaskmanager.memory.process.size=4096m -Dkubernetes.taskmanager.cpu=2 -Dtaskmanager.numberOfTaskSlots=4 -Dkubernetes.namespace=flink -Dkubernetes.rest-service.exposed.type=NodePort -Dkubernetes.container-start-command-template="%java% %classpath% %jvmmem% %jvmopts% %logging% %class% %args%" -Dakka.framesize=104857600b -Dkubernetes.container.image=flink:1.11.1

The cluster appears to have started successfully. Part of the startup log:

2020-09-10 14:08:56,417 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster flink-cluster successfully, JobManager Web Interface: http://10.10.102.20:30398

I can open the web UI at the URL above. I have two questions.

1. The submit command

Following the documentation, the command is:

./bin/flink run -d -e kubernetes-session -Dkubernetes.cluster-id=flink-cluster examples/streaming/WindowJoin.jar

Submitting with this command fails with the following error:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.deployment.ClusterRetrieveException: Could not get the rest endpoint of flink-cluster

I then followed the non-native approach and submitted with:

./bin/flink run -m 10.10.101.26:30398 ./examples/streaming/WordCount.jar

This appeared to submit successfully.

Q: What additional configuration does the documented command need so that it can resolve the rest endpoint of flink-cluster? And is it acceptable to submit jobs with the second command instead?

2. Error after submitting with the second command

After submitting with the second command, the job fails with:

Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441)
    ... 47 more
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    ... 27 more
Caused by: java.util.concurrent.TimeoutException

Q: This looks like the cluster is short on resources, but the cluster should have plenty available. How should I debug this?
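
For reference, here is a rough sketch of the read-only checks I plan to run while debugging, in case it helps to point out what I am missing. The helper names (run, check_cluster, submit_cmd) and the DRY_RUN switch are my own and not part of Flink. The namespace and cluster-id are taken from my start command above. Adding -Dkubernetes.namespace=flink to the documented submit command is only a guess on my part that I have not verified.

```shell
#!/usr/bin/env bash
# Hypothetical helper script, not part of Flink.
NAMESPACE="${NAMESPACE:-flink}"            # namespace from the session start command
CLUSTER_ID="${CLUSTER_ID:-flink-cluster}"  # cluster-id from the session start command
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the commands, 0 = also run them

# Print a command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

check_cluster() {
  # Services: which service is exposed as NodePort, and on which port?
  run kubectl get svc -n "$NAMESPACE"
  # Pods: were any TaskManager pods created, and are they Pending or Running?
  run kubectl get pods -n "$NAMESPACE" -o wide
  # Recent events: scheduling failures (insufficient cpu/memory), image pull errors.
  run kubectl get events -n "$NAMESPACE" --sort-by=.lastTimestamp
}

submit_cmd() {
  # Assumption (unverified): pass the same namespace that the session was started in.
  echo "./bin/flink run -d -e kubernetes-session -Dkubernetes.cluster-id=$CLUSTER_ID -Dkubernetes.namespace=$NAMESPACE examples/streaming/WindowJoin.jar"
}

check_cluster
submit_cmd
```

By default the script only prints the commands. Running it with DRY_RUN=0 executes the three kubectl checks, and a pod that stays Pending can then be inspected with kubectl describe pod in the same namespace.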