Hello,

We are launching Flink deployments using the Flink Kubernetes Operator
<https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/>
on a Kubernetes cluster with Istio and mTLS enabled.

We found that the TaskManager is unable to communicate with the JobManager on 
the jobmanager-rpc port:


2022-06-15 15:25:40,508 WARN  akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]] Caused by: [The remote system explicitly disassociated (reason unknown).]

The cause is that the JobManager service port definitions do not follow the
Istio protocol selection guidelines
https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/
(see the service definition below): the port names lack a protocol prefix such
as "tcp-", and no "appProtocol" property is set.

We believe the default port definitions need to change. In the meantime, is
there an immediate workaround we can apply? Perhaps the default port
definitions can be overridden somehow?

Thanks.


flink-kubernetes-operator 1.0.0
Flink 1.14-java11
Kubernetes v1.19.5
Istio 1.7.6


# k get service inference-results-to-analytics-engine -o yaml
apiVersion: v1
kind: Service
metadata:
...
  labels:
    app: inference-results-to-analytics-engine
    type: flink-native-kubernetes
  name: inference-results-to-analytics-engine
spec:
  clusterIP: None
  ports:
  - name: jobmanager-rpc # should start with "tcp-" or add "appProtocol" property
    port: 6123
    protocol: TCP
    targetPort: 6123
  - name: blobserver # should start with "tcp-" or add "appProtocol" property
    port: 6124
    protocol: TCP
    targetPort: 6124
  selector:
    app: inference-results-to-analytics-engine
    component: jobmanager
    type: flink-native-kubernetes
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
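
For comparison, here is a rough sketch of what port definitions following the Istio guidelines might look like. It shows both options mentioned in the comments above: a "tcp-" prefix on the first port and an "appProtocol" property on the second. The name tcp-jobmanager-rpc is a hypothetical example, not something Flink or the operator is known to generate, and we have not verified that Flink tolerates renamed ports.

```yaml
# Illustrative sketch only; the renamed port is an assumption,
# not output from Flink or the Flink Kubernetes Operator.
spec:
  ports:
  - name: tcp-jobmanager-rpc   # option 1: protocol prefix "tcp-" in the port name
    port: 6123
    protocol: TCP
    targetPort: 6123
  - name: blobserver           # option 2: name unchanged, protocol declared explicitly
    appProtocol: tcp
    port: 6124
    protocol: TCP
    targetPort: 6124
```

Either form should be enough for Istio to treat the port as plain TCP, but only one of the two is needed per port.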
