Hi Community, Yang,

I am new to Flink on native Kubernetes and I am trying to do a POC for native 
Kubernetes application mode on Oracle Cloud Infrastructure. I was following the 
documentation here step by step: [1]

I am using Flink 1.12.1, Scala 2.11, and Java 11.
I was able to create a native Kubernetes deployment, but I cannot run any
further commands such as list or cancel; they always fail with a timeout
error. I think the cause is that the JobManager Web Interface address printed
after job deployment is not accessible. This prevents me from shutting down
the deployment with a savepoint. It could be a Kubernetes configuration
issue. I have opened traffic on all related ports and validated the security
list, but I still couldn’t make it work. Any help is appreciated.


The relevant Flink source code is the CliFrontend.java class [2].
The ./bin/flink list and cancel commands try to send traffic to the Flink
dashboard UI address and time out. I tried both the LoadBalancer and
NodePort options for the -Dkubernetes.rest-service.exposed.type
configuration. Neither of them works.

# List running jobs on the cluster (I can’t execute this command successfully
# due to the timeout; logs are shared below)
$ ./bin/flink list --target kubernetes-application \
    -Dkubernetes.cluster-id=my-first-application-cluster
# Cancel a running job (I can’t execute this command successfully either)
$ ./bin/flink cancel --target kubernetes-application \
    -Dkubernetes.cluster-id=my-first-application-cluster <jobId>

I think those commands need to communicate with the endpoint that is shown
after the job submission command.
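
To separate a network problem from a Flink CLI problem, the endpoint can be
probed directly over HTTP. The sketch below is only an illustration and was
not part of the runs described in this mail. It assumes curl is installed on
the client machine and uses the /jobs/overview path of the Flink REST API; the
function names rest_url and probe_rest are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: check whether a Flink REST endpoint answers from the client machine.
# Host and port are supplied by the caller; nothing here is specific to one cluster.

# Build the URL of the REST job overview for a given host and port.
rest_url() {
    local host="$1" port="$2"
    printf 'http://%s:%s/jobs/overview\n' "$host" "$port"
}

# Return 0 if the endpoint answers with a successful HTTP status within the
# timeout (default: 5 seconds), non-zero otherwise.
probe_rest() {
    local host="$1" port="$2" timeout="${3:-5}"
    curl --silent --fail --max-time "$timeout" --output /dev/null \
        "$(rest_url "$host" "$port")"
}

# Example call (commented out so that sourcing this file has no side effects):
#   probe_rest <host> <port> && echo reachable || echo unreachable
```

If the probe fails for the address the CLI prints but succeeds for another
address, that would point to the network path rather than to Flink itself.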


  1.  Use case 1 (deploy with NodePort)

# fuyli @ fuyli-mac in ~/Development/flink-1.12.1 [17:59:00] C:127
$ ./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-first-application-cluster \
    -Dkubernetes.container.image=us-phoenix-1.ocir.io/idxglh0bz964/flink-demo:21.3.1 \
    -Dkubernetes.container.image.pull-policy=IfNotPresent \
    -Dkubernetes.container.image.pull-secrets=ocirsecret \
    -Dkubernetes.rest-service.exposed.type=NodePort \
    -Dkubernetes.service-account=flink-service-account \
local:///opt/flink/usrlib/quickstart-0.1.jar


When the exposed type is NodePort, the printed message says the Flink
JobManager Web Interface is at http://192.29.104.156:30996. 192.29.104.156 is
my Kubernetes apiserver address, and 30996 is the port that exposes the
service. However, the Flink dashboard is not reachable at this address.
I can only access the dashboard UI on each node's IP address (there are three
nodes in my K8s cluster):
100.104.154.73:30996
100.104.154.74:30996
100.104.154.75:30996
I got the errors in [4] when trying to run the list command against this
native Kubernetes deployment. According to the documentation [3], this
shouldn’t happen, since the Kubernetes apiserver address should also serve
the Flink Web UI. Did I miss any Kubernetes configuration needed to make the
Web UI available at the apiserver address?
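
For comparing the printed address with what Kubernetes actually exposes, the
REST service and the node addresses can be listed with kubectl. This is a
sketch under assumptions, not output from my cluster: it assumes the REST
service is named "<cluster-id>-rest" and lives in the "default" namespace
unless another one is passed, and rest_service_name and inspect_rest_service
are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: show the REST service and node addresses for a native Kubernetes
# Flink cluster, so they can be compared with the address the CLI prints.

# Assumption: the REST service is named "<cluster-id>-rest".
rest_service_name() {
    printf '%s-rest\n' "$1"
}

# Print the service (type, cluster IP, external IP, port mapping) and the
# internal/external addresses of all nodes.
inspect_rest_service() {
    local cluster_id="$1" namespace="${2:-default}"
    kubectl get service "$(rest_service_name "$cluster_id")" \
        --namespace "$namespace" --output wide
    kubectl get nodes --output wide
}

# Example call (commented out so that sourcing this file has no side effects):
#   inspect_rest_service my-first-application-cluster
```

With NodePort, the port shown in the service's port mapping should match the
one in the printed URL (30996 above), while the host part can be checked
against the node addresses.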



  2.  Use case 2 (deploy with LoadBalancer)

# fuyli @ fuyli-mac in ~/Development/flink-1.12.1 [17:59:00] C:127
$ ./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=my-first-application-cluster \
    -Dkubernetes.container.image=us-phoenix-1.ocir.io/idxglh0bz964/flink-demo:21.3.1 \
    -Dkubernetes.container.image.pull-policy=IfNotPresent \
    -Dkubernetes.container.image.pull-secrets=ocirsecret \
    -Dkubernetes.rest-service.exposed.type=LoadBalancer \
    -Dkubernetes.service-account=flink-service-account \
local:///opt/flink/usrlib/quickstart-0.1.jar


After a while, once the external IP was resolved, the output said the Flink
JobManager Web Interface is at the external IP (the load balancer address):
http://144.25.13.78:8081
When I execute the list command, I still get an error after a long wait for
the timeout. See the errors in [5].

I can still access the UI at NodeIP:<service-port>. Given that, I tend to
believe it is a network issue, but I am still quite confused, since I have
already opened all the traffic.




References:
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/native_kubernetes.html
[2] https://github.com/apache/flink/blob/f3155e6c0213de7bf4b58a89fb1e1331dee7701a/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/native_kubernetes.html#accessing-flinks-web-ui
[4] https://pastebin.ubuntu.com/p/WcJMwds52r/
[5] https://pastebin.ubuntu.com/p/m27BnQGXQc/


Thanks for your help in advance.

Best regards,
Fuyao

