This is a known issue; see
https://issues.apache.org/jira/browse/FLINK-11127.
I'm not aware of a workaround.
On 12.12.2018 14:07, Sergei Poganshev wrote:
When I deploy a Flink 1.7 job to Kubernetes, the job itself runs, but
when I open the Flink UI I see no metrics, and there are WARN
messages in the jobmanager's log:
[flink-metrics-14] WARN akka.remote.ReliableDeliverySupervisor
flink-metrics-akka.remote.default-remote-dispatcher-3 - Association
with remote system
[akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491]
has failed, address is now gated for [50] ms. Reason: [Association
failed with
[akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491]]
Caused by: [adhoc-historical-taskmanager-d4b65dfd4-h5nrx: Name or
service not known]
Note: adhoc-historical-taskmanager-d4b65dfd4-h5nrx is the hostname of
the pod on which the taskmanager is running.
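So the jobmanager tries to reach the taskmanager's metrics endpoint via
the pod hostname (which it presumably received from the taskmanager
itself) on a random port, and that hostname does not resolve from the
jobmanager's pod. How can this be mitigated?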
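
One way to confirm the diagnosis above is to pull the host out of the
Akka address in the WARN message and check whether it resolves from
inside the jobmanager pod. The following is only a minimal sketch: it
assumes a Python interpreter is available in the container (stock Flink
images may not ship one), and the names parse_akka_address and resolves
are hypothetical helpers, not part of Flink.

```python
import re
import socket

# Matches addresses such as akka.tcp://flink-metrics@some-host:44491
AKKA_ADDR = re.compile(
    r"akka\.tcp://(?P<system>[^@\s]+)@(?P<host>[^:\]\s]+):(?P<port>\d+)"
)


def parse_akka_address(text):
    """Return (actor_system, host, port) for the first Akka TCP address
    found in text, or None if there is none."""
    m = AKKA_ADDR.search(text)
    if m is None:
        return None
    return m.group("system"), m.group("host"), int(m.group("port"))


def resolves(host):
    """Return True if the local resolver can map host to an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Same failure class as "Name or service not known" in the log.
        return False


if __name__ == "__main__":
    log_line = (
        "Association with remote system "
        "[akka.tcp://flink-metrics@adhoc-historical-taskmanager-d4b65dfd4-h5nrx:44491] "
        "has failed"
    )
    system, host, port = parse_akka_address(log_line)
    print(f"{system} on {host}:{port} resolvable: {resolves(host)}")
```

If resolves() returns False when run in the jobmanager pod, the failure
is a name-resolution problem between pods rather than a blocked port.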