Hi Fuyao,

Thanks for trying the native Kubernetes integration.

As you know, the Flink rest service can be exposed in the following
three ways, configured via "kubernetes.rest-service.exposed.type":

* ClusterIP, which means the Flink rest endpoint is only accessible
inside the K8s cluster. A simple approach is to start a Flink client in the
K8s cluster with the following yaml file, then use "kubectl exec" to get
into the pod and create a Flink session/application cluster from there.
"flink list/cancel" also works well from inside the pod.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink-client
  template:
    metadata:
      labels:
        app: flink-client
    spec:
      containers:
      - name: client
        image: flink:1.12.2
        imagePullPolicy: Always
        args: ["sleep", "86400"]
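
A minimal sketch of that client-pod workflow as shell functions you can source, assuming the manifest above is saved as flink-client.yaml (a hypothetical file name) in the same namespace as the Flink cluster, and that the flink:1.12.2 image keeps the distribution under /opt/flink. The client pod may also need a service account that is allowed to read the Flink services (for example the flink-service-account you already created), since the Deployment above does not set one. The function names are hypothetical.

```shell
# Hypothetical helpers: source this file, then call the functions.

# Create the long-running client pod from the manifest.
# The default file name flink-client.yaml is an assumption.
deploy_flink_client() {
  kubectl apply -f "${1:-flink-client.yaml}"
}

# Run a Flink CLI action ("list", "cancel <jobId>", ...) inside the client
# pod, where the ClusterIP rest service is reachable.
# Usage: flink_in_cluster <cluster-id> <action> [args...]
flink_in_cluster() {
  local cluster_id="$1"; shift
  local action="$1"; shift
  # "kubectl exec deploy/<name>" runs the command in a pod of the Deployment.
  kubectl exec deploy/flink-client -- /opt/flink/bin/flink "$action" \
    --target kubernetes-application \
    -Dkubernetes.cluster-id="$cluster_id" "$@"
}
```

For example, `flink_in_cluster my-first-application-cluster list` or `flink_in_cluster my-first-application-cluster cancel <jobId>`.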

* NodePort
Currently, there is a limitation that only the Kubernetes master node
address is used to build the exposed Flink rest endpoint. So if your
APIServer node does not run kube-proxy, the URL printed in the Flink
client logs cannot be used. We already have a ticket [1] to support using
one of the worker nodes to access the rest endpoint, but I have not
managed to get it done yet.
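
Until that ticket is resolved, a rough way to list the node addresses you could try instead is sketched below. It assumes the rest service is named "<cluster-id>-rest" (the name the native Kubernetes integration creates), that this service exposes a single port, and that the nodes' InternalIP addresses are reachable from your client machine (use ExternalIP instead if your nodes have public addresses). The function name is hypothetical.

```shell
# Hypothetical helper: print http://<node-ip>:<node-port> for every node,
# using the NodePort of the Flink rest service "<cluster-id>-rest".
# Usage: nodeport_endpoints <cluster-id>
nodeport_endpoints() {
  local cluster_id="$1"
  local port ip
  # The rest service is assumed to expose exactly one port.
  port="$(kubectl get service "${cluster_id}-rest" \
    -o jsonpath='{.spec.ports[0].nodePort}')"
  if [ -z "$port" ]; then
    echo "no nodePort found for service ${cluster_id}-rest" >&2
    return 1
  fi
  # One URL per node InternalIP address.
  for ip in $(kubectl get nodes \
      -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
    echo "http://${ip}:${port}"
  done
}
```

If the nodes are reachable, each printed URL should serve the Flink web UI, much like the NodeIP:30996 addresses that already work for you.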

* LoadBalancer
Is the resolved rest endpoint "http://144.25.13.78:8081/" accessible from
your Flink client side? If yes, then the Flink client should be able to
contact the JobManager rest server to list/cancel the jobs. I have
verified this in Alibaba container service, and it works well.
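
A quick way to answer that question from the client machine is to query the JobManager's REST API directly. The sketch below assumes curl is installed and uses Flink's `/overview` REST endpoint; the function name and the 10-second default timeout are arbitrary choices.

```shell
# Hypothetical helper: succeeds (exit 0) if the Flink REST endpoint answers
# /overview within the timeout; fails on connection errors, timeouts, or
# HTTP error responses.
# Usage: check_flink_rest <base-url> [timeout-seconds]
check_flink_rest() {
  local url="${1%/}"        # drop one trailing slash
  local timeout="${2:-10}"
  curl --silent --show-error --fail --max-time "$timeout" \
    "${url}/overview" > /dev/null
}
```

For example, `check_flink_rest http://144.25.13.78:8081/ && echo reachable`. If this also times out, the problem is most likely in the network path to the load balancer rather than in the Flink client.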


[1]. https://issues.apache.org/jira/browse/FLINK-16601


Best,
Yang

Fuyao Li <fuyao...@oracle.com> wrote on Sat, Mar 27, 2021 at 5:59 AM:

> Hi Community, Yang,
>
> I am new to Flink on native Kubernetes and I am trying to do a POC for
> native Kubernetes application mode on Oracle Cloud Infrastructure. I was
> following the documentation here step by step: [1]
>
> I am using Flink 1.12.1, Scala 2.11, Java 11.
>
> I was able to create a native Kubernetes Deployment, but I am not able to
> use any further commands like list / cancel etc. I always run into a timeout
> error. I think the issue could be that the JobManager Web Interface IP
> address printed after job deployment is not accessible. This issue prevents
> me from shutting down the deployment with a savepoint. It could be a
> Kubernetes configuration issue. I have opened traffic on all related ports
> and validated the security list, but still couldn’t make it work. Any help
> is appreciated.
>
> The relevant Flink source code is the CliFrontend.java class [2].
>
> The ./bin/flink list and cancel commands try to send traffic to the
> Flink dashboard UI IP address and time out. I tried both the
> LoadBalancer and NodePort options for the
> -Dkubernetes.rest-service.exposed.type configuration. Neither of them
> works.
>
> # List running jobs on the cluster (I can’t execute this command
> # successfully due to a timeout; logs shared below)
> $ ./bin/flink list --target kubernetes-application \
>     -Dkubernetes.cluster-id=my-first-application-cluster
>
> # Cancel a running job (I can’t execute this command successfully)
> $ ./bin/flink cancel --target kubernetes-application \
>     -Dkubernetes.cluster-id=my-first-application-cluster <jobId>
>
> I think those commands need to communicate with the endpoint that is
> shown after the job submission command.
>
> 1. Use case 1 (deploy with NodePort)
>
> # fuyli @ fuyli-mac in ~/Development/flink-1.12.1 [17:59:00] C:127
> $ ./bin/flink run-application \
>     --target kubernetes-application \
>     -Dkubernetes.cluster-id=my-first-application-cluster \
>     -Dkubernetes.container.image=us-phoenix-1.ocir.io/idxglh0bz964/flink-demo:21.3.1 \
>     -Dkubernetes.container.image.pull-policy=IfNotPresent \
>     -Dkubernetes.container.image.pull-secrets=ocirsecret \
>     -Dkubernetes.rest-service.exposed.type=NodePort \
>     -Dkubernetes.service-account=flink-service-account \
>     local:///opt/flink/usrlib/quickstart-0.1.jar
>
> When the exposed type is NodePort, the printed message says the Flink
> JobManager Web Interface is at http://192.29.104.156:30996.
> 192.29.104.156 is my Kubernetes apiserver address and 30996 is the port
> that exposes the service. However, the Flink dashboard at this address is
> not reachable.
>
> I can only access the dashboard UI via each node's IP address (there are
> three nodes in my K8s cluster):
>
> 100.104.154.73:30996
> 100.104.154.74:30996
> 100.104.154.75:30996
>
> I got the following errors when trying to run the list command for such a
> native Kubernetes deployment; see [4]. *According to the documentation
> here [3], this shouldn’t happen, since the Flink Web UI should also be
> reachable at the Kubernetes apiserver address. Did I miss any Kubernetes
> configuration needed to make the web UI available at the apiserver address?*
>
> 2. Use case 2 (deploy with LoadBalancer)
>
> # fuyli @ fuyli-mac in ~/Development/flink-1.12.1 [17:59:00] C:127
> $ ./bin/flink run-application \
>     --target kubernetes-application \
>     -Dkubernetes.cluster-id=my-first-application-cluster \
>     -Dkubernetes.container.image=us-phoenix-1.ocir.io/idxglh0bz964/flink-demo:21.3.1 \
>     -Dkubernetes.container.image.pull-policy=IfNotPresent \
>     -Dkubernetes.container.image.pull-secrets=ocirsecret \
>     -Dkubernetes.rest-service.exposed.type=LoadBalancer \
>     -Dkubernetes.service-account=flink-service-account \
>     local:///opt/flink/usrlib/quickstart-0.1.jar
>
> After a while, once the external IP was resolved, the output said the Flink
> JobManager web interface is at the external IP (the load balancer address):
> http://144.25.13.78:8081
>
> When I execute the list command, I still get an error after waiting a long
> time for it to time out. See the errors here [5].
>
> I can still access NodeIP:<service-port>. In this case, I tend to believe
> it is a network issue, but I am still quite confused since I have already
> opened all the traffic.
>
> References:
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/native_kubernetes.html
> [2] https://github.com/apache/flink/blob/f3155e6c0213de7bf4b58a89fb1e1331dee7701a/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java
> [3] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/native_kubernetes.html#accessing-flinks-web-ui
> [4] https://pastebin.ubuntu.com/p/WcJMwds52r/
> [5] https://pastebin.ubuntu.com/p/m27BnQGXQc/
>
> Thanks for your help in advance.
>
> Best regards,
>
> Fuyao
