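The root cause is probably that the LoadBalancer service type does not work
in your environment; note that the EXTERNAL-IP of the rest service is still
<pending> in your "kubectl get svc" output. We already have a ticket to track
this[1] and will try to get it resolved in the next release.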

For now, could you please try adding
"-Dkubernetes.rest-service.exposed.type=NodePort" to both your session and
submission commands?
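
As a rough sketch of what that could look like, here are the two commands from the quoted message with the option appended. All values are copied from that message; the RUN and EXPOSED_TYPE variables are only illustrative helpers and not part of Flink. By default the script just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: all values are copied from the quoted commands; adjust them to your cluster.
# By default RUN is "echo", so the commands are only printed (dry run).
# Run with RUN= (empty) from the Flink home directory to actually execute them.
RUN="${RUN-echo}"

# The suggested workaround: expose the rest service as NodePort.
EXPOSED_TYPE="-Dkubernetes.rest-service.exposed.type=NodePort"

# Session command with the extra option appended.
$RUN ./bin/kubernetes-session.sh \
  -Dkubernetes.cluster-id=dproc-example-flink-cluster-id \
  -Dtaskmanager.memory.process.size=4096m \
  -Dkubernetes.taskmanager.cpu=2 \
  -Dtaskmanager.numberOfTaskSlots=4 \
  -Dresourcemanager.taskmanager-timeout=3600000 \
  -Dkubernetes.namespace=sdt-dproc-flink-test \
  -Dkubernetes.config.file=/home/devuser/.kube/config \
  -Dkubernetes.jobmanager.service-account=flink-service-account \
  "$EXPOSED_TYPE"

# Submission command with the same option.
$RUN ./bin/flink run --target kubernetes-session \
  -Dkubernetes.service-account=flink-service-account \
  -Dkubernetes.cluster-id=dproc-example-flink-cluster-id \
  -Dkubernetes.namespace=sdt-dproc-flink-test \
  -Dkubernetes.config.file=/home/devuser/.kube/config \
  "$EXPOSED_TYPE" \
  examples/batch/WordCount.jar --input /home/user/sometexts.txt --output /tmp/flinksample
```

Since the existing session was created with a LoadBalancer rest service, it presumably needs to be stopped and started again for the option to take effect on the session side.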

You may also be interested in the new flink-kubernetes-operator
project[2]. It should make it easier to run a Flink application on K8s.

[1]. https://issues.apache.org/jira/browse/FLINK-17231
[2]. https://github.com/apache/flink-kubernetes-operator

Best,
Yang

Burcu Gul POLAT EGRI <be...@sdt.com.tr> wrote on Fri, Mar 25, 2022 at 21:39:

> I am getting the following error when I try to run the sample from the
> Flink documentation - Native Kubernetes
> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/native_kubernetes/>
> .
>
> I succeeded in running the first command in the documentation by adding
> some extra parameters with the help of this post
> <https://cloudolife.com/2020/12/12/Cloud-Native/BIg-Data/Flink/Deploy-a-Apache-Flink-session-cluster-natively-on-Kubernetes-K8S/>
> .
>
> user@local:~/flink-1.14.4$ ./bin/kubernetes-session.sh \
>     -Dkubernetes.cluster-id=dproc-example-flink-cluster-id \
>     -Dtaskmanager.memory.process.size=4096m \
>     -Dkubernetes.taskmanager.cpu=2 \
>     -Dtaskmanager.numberOfTaskSlots=4 \
>     -Dresourcemanager.taskmanager-timeout=3600000 \
>     -Dkubernetes.namespace=sdt-dproc-flink-test \
>     -Dkubernetes.config.file=/home/devuser/.kube/config \
>     -Dkubernetes.jobmanager.service-account=flink-service-account
>
> After executing the above command, I listed the new pod as shown below.
>
> user@local:~/flink-1.14.4$ kubectl get pods
>
> NAME                                             READY   STATUS    RESTARTS   AGE
> dproc-example-flink-cluster-id-68c79bf67-mwh52   1/1     Running   0          1m
>
> Then I executed the command below to submit the example job.
>
> user@local:~/flink-1.14.4$ ./bin/flink run --target kubernetes-session \
>     -Dkubernetes.service-account=flink-service-account \
>     -Dkubernetes.cluster-id=dproc-example-flink-cluster-id \
>     -Dkubernetes.namespace=sdt-dproc-flink-test \
>     -Dkubernetes.config.file=/home/devuser/.kube/config \
>     examples/batch/WordCount.jar --input /home/user/sometexts.txt --output /tmp/flinksample
>
> After a while, I received the logs below:
>
> 2022-03-25 12:38:00,538 INFO  org.apache.flink.kubernetes.KubernetesClusterDescriptor      [] - Retrieve flink cluster dproc-example-flink-cluster-id successfully, JobManager Web Interface: http://10.150.140.248:8081
>
>
>
> ------------------------------------------------------------
>  The program finished with the following exception:
>
> org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>     at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
>     at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
>     at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
>     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
> Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
>     at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:316)
>     at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061)
>     at org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:131)
>     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
>     at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:93)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>     ... 8 more
> Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
>     at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
>     at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
>     at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056)
>     ... 16 more
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
>     at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$11(RestClusterClient.java:433)
>     at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
>     at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
>     at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>     at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>     at org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:399)
>     at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>     at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>     at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>     at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>     at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:476)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:262)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.flink.util.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
>     at org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:395)
>     ... 21 more
> Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: /10.150.140.248:8081
>     at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>     at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>     at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063)
>     ... 19 more
> Caused by: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: /10.150.140.248:8081
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261)
>     ... 8 more
>
> I understand from the last part of this error that the JobManager Web
> Interface URL is wrong, because when I check the Kubernetes services, the
> port is different.
>
> user@local:~/flink-1.14.4$ kubectl get svc
>
> NAME                                  TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
> dproc-example-flink-cluster-id        ClusterIP      None            <none>        6123/TCP,6124/TCP   6h32m
> dproc-example-flink-cluster-id-rest   LoadBalancer   10.97.100.197   <pending>     8081:30976/TCP      6h32m
>
> The port should be 30976 rather than 8081. I have already tried setting
> rest.port to this value, both in flink-conf.yaml and as a command-line
> parameter, but nothing changed. I always get this error.
>
> How can I force the Flink client to use the correct JobManager URL?
>
>
>
> *Burcu *
>
>
>
