First, the versions: I am using Flink 1.11 and Kubernetes v1.17.
The command I use to start the cluster is:
./bin/kubernetes-session.sh \
  -Dkubernetes.cluster-id=flink-cluster \
  -Dkubernetes.jobmanager.service-account=flink \
  -Dtaskmanager.memory.process.size=4096m \
  -Dkubernetes.taskmanager.cpu=2 \
  -Dtaskmanager.numberOfTaskSlots=4 \
  -Dkubernetes.namespace=flink \
  -Dkubernetes.rest-service.exposed.type=NodePort \
  -Dkubernetes.container-start-command-template="%java% %classpath% %jvmmem% %jvmopts% %logging% %class% %args%" \
  -Dakka.framesize=104857600b \
  -Dkubernetes.container.image=flink:1.11.1

The cluster appears to have started successfully. Part of the startup log:
2020-09-10 14:08:56,417 INFO 
org.apache.flink.kubernetes.KubernetesClusterDescriptor      [] - Create
flink session cluster flink-cluster successfully, JobManager Web Interface:
http://10.10.102.20:30398

The web UI is reachable at the URL above.

My two questions follow.
1. The submit command
     Following the documentation, I used:
     ./bin/flink run -d -e kubernetes-session -Dkubernetes.cluster-id=flink-cluster examples/streaming/WindowJoin.jar
     Submitting with this command fails with the following error:
org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error:
org.apache.flink.client.deployment.ClusterRetrieveException: Could not get
the rest endpoint of flink-cluster

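     I then fell back to the non-native approach and submitted with:
     ./bin/flink run -m 10.10.101.26:30398 ./examples/streaming/WordCount.jar
     This appeared to submit successfully.
     Q: What additional configuration does the documented command need in order to resolve the rest endpoint of flink-cluster? And is it acceptable to submit with the second command instead?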
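     One possible variant of the documented command, sketched under the assumption that the lookup fails because the client searches the default namespace while the session was started with -Dkubernetes.namespace=flink. That cause is a guess and is not confirmed by the error above. The build_submit_cmd helper is hypothetical and only prints the command so it can be reviewed before running:

```shell
#!/bin/sh
# Values copied from the session start command above.
CLUSTER_ID="flink-cluster"
NAMESPACE="flink"   # assumption: the client must be given the session's namespace
JAR="examples/streaming/WindowJoin.jar"

# Hypothetical helper: prints the submit command instead of running it.
build_submit_cmd() {
  printf '%s %s %s %s\n' \
    "./bin/flink run -d -e kubernetes-session" \
    "-Dkubernetes.cluster-id=${CLUSTER_ID}" \
    "-Dkubernetes.namespace=${NAMESPACE}" \
    "${JAR}"
}

build_submit_cmd
```

     From the Flink directory, the printed line could then be executed with eval "$(build_submit_cmd)".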
2. Error after submitting with the second command
     After submitting with the second command, the job fails with:
Caused by:
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
Could not allocate the required slot within slot request timeout. Please
make sure that the cluster has enough resources.
        at
org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441)
        ... 47 more
Caused by: java.util.concurrent.CompletionException:
java.util.concurrent.TimeoutException
        at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
        at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
        ... 27 more
Caused by: java.util.concurrent.TimeoutException

Q: The error suggests the cluster is short of resources, but it should have plenty. How should I debug this?
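
A minimal sketch of where one might start looking, assuming no TaskManager pod reaches the Running state (for example because the pod cannot be scheduled, the image cannot be pulled, or the flink service account is not allowed to create pods). These causes are guesses, not something the stack trace confirms. inspect_session is a hypothetical helper, and the JobManager deployment name is assumed to equal the cluster-id:

```shell
#!/bin/sh
NAMESPACE="flink"
CLUSTER_ID="flink-cluster"
KUBECTL="${KUBECTL:-kubectl}"   # overridable, e.g. KUBECTL=echo for a dry run

# Hypothetical helper: collects the basic facts about the session cluster.
inspect_session() {
  # Were any TaskManager pods created, and are they Pending or failing?
  "$KUBECTL" get pods -n "$NAMESPACE" -o wide
  # Recent events often show scheduling, image pull, or permission problems.
  "$KUBECTL" get events -n "$NAMESPACE" --sort-by=.lastTimestamp
  # JobManager log; assumes the deployment is named after the cluster-id.
  "$KUBECTL" logs -n "$NAMESPACE" "deployment/${CLUSTER_ID}"
}

# Run only when the kubectl binary is actually available.
if command -v "$KUBECTL" >/dev/null 2>&1; then
  inspect_session
fi
```

If a TaskManager pod shows up as Pending, kubectl describe pod on it in the flink namespace would be the natural next step; each TaskManager here requests 2 CPUs and 4096m of memory, so a single node needs that much free.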




--
Sent from: http://apache-flink.147419.n8.nabble.com/
