flink:latest container on kubernetes fails to connect taskmanager to jobmanager

jwatte Mon, 01 Oct 2018 13:36:02 -0700

I'm using the standard Kubernetes deploy configs for jobmanager and
taskmanager deployments, and jobmanager service.
However, when the taskmanagers start up, they try to register with the
jobmanager over Akka on port 6123.
This fails because Akka on the jobmanager discards those messages as
"non-local."


The taskmanager keeps repeating this log message and eventually exits
(and gets restarted by Kubernetes):

2018-10-01 20:08:28,365 INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
resolve ResourceManager address
akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager, retrying in
10000 ms: Ask timed out on
[ActorSelection[Anchor(akka.tcp://flink@flink-jobmanager:6123/),
Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of
type "akka.actor.Identify"..

The jobmanager responds with this log message:

2018-10-01 20:09:38,475 ERROR akka.remote.EndpointWriter - dropping message
[class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@flink-jobmanager:6123/]] arriving at
[akka.tcp://flink@flink-jobmanager:6123] inbound addresses are
[akka.tcp://flink@cluster:6123]
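
The key detail in that error is a host mismatch: the message is addressed to
flink-jobmanager, but the jobmanager's Akka system lists cluster as its only
inbound address. Here is a minimal Python sketch that makes the comparison
explicit. The helper compare_addresses and the regex are illustrative
assumptions, not part of Flink or Akka, and the log line is joined onto one
line for parsing:

```python
import re

# Log line copied from the jobmanager output above (joined onto one line).
LOG_LINE = (
    "2018-10-01 20:09:38,475 ERROR akka.remote.EndpointWriter - "
    "dropping message [class akka.actor.ActorSelectionMessage] for non-local "
    "recipient [Actor[akka.tcp://flink@flink-jobmanager:6123/]] arriving at "
    "[akka.tcp://flink@flink-jobmanager:6123] inbound addresses are "
    "[akka.tcp://flink@cluster:6123]"
)

# Matches akka.tcp://<system>@<host>:<port>
AKKA_ADDRESS = re.compile(r"akka\.tcp://([^@\s]+)@([^:/\]\s]+):(\d+)")


def compare_addresses(line):
    """Return (recipient, inbound) as (system, host, port) tuples.

    Hypothetical helper: it assumes the first address in the line is the
    recipient and the last one is the inbound (bound) address.
    """
    found = AKKA_ADDRESS.findall(line)
    recipient, inbound = found[0], found[-1]
    return recipient, inbound


recipient, inbound = compare_addresses(LOG_LINE)
print("recipient host:", recipient[1])
print("inbound host:  ", inbound[1])
print("hosts match:   ", recipient[1] == inbound[1])
```

So the jobmanager appears to have bound its Akka system under the hostname
cluster rather than flink-jobmanager, which points at whatever sets the
jobmanager's hostname at startup.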

I have verified that network connectivity exists, so this is some
configuration problem.
I notice that the docker-entrypoint.sh edits the config files and calls the
taskmanager.sh / jobmanager.sh scripts based on start mode.
Is this script editing the config file incorrectly? What needs to be done so that
Akka on the jobmanager accepts the registration messages?




