I am trying to install a Flink HA cluster (Zookeeper mode) but the task manager cannot find the job manager.
Here is the architecture:

- Machine 1: Job Manager + Zookeeper
- Machine 2: Task Manager

masters: Machine1

slaves: Machine2

flink-conf.yaml:

    #jobmanager.rpc.address: localhost
    jobmanager.rpc.port: 6123
    blob.server.port: 50100-50200
    taskmanager.data.port: 6121
    high-availability: zookeeper
    high-availability.zookeeper.quorum: Machine1:2181
    high-availability.zookeeper.path.root: /flink-1.5.1
    high-availability.cluster-id: /default_b
    high-availability.storageDir: file:///shareflink/recovery

Here is the Task Manager log; it tries to connect to localhost instead of Machine1:

    2018-08-17 10:46:44,875 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the leading JobManager.
    2018-08-17 10:46:44,876 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
    2018-08-17 10:46:44,966 INFO org.apache.flink.runtime.net.ConnectionUtils - Retrieved new target address /127.0.0.1:37133.
    2018-08-17 10:46:45,324 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /127.0.0.1:37133
    2018-08-17 10:46:45,325 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'Machine2/IP-Machine2': Connection refused
    2018-08-17 10:46:45,325 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused
    2018-08-17 10:46:45,325 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/IP_Machine2': Connection refused
    2018-08-17 10:46:45,325 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused
    2018-08-17 10:46:45,326 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/IP_Machine2': Connection refused
    2018-08-17 10:46:45,326 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Connection refused
    2018-08-17 10:46:45,726 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /127.0.0.1:37133
    2018-08-17 10:46:45,727 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'Machine2/IP-Machine2
    2018-08-17 10:47:22,022 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@127.0.0.1:36515] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:36515]] Caused by: [Connection refused: /127.0.0.1:36515]
    2018-08-17 10:47:22,022 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@127.0.0.1:36515/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@127.0.0.1:36515/user/resourcemanager..
    2018-08-17 10:47:32,037 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:36515

PS: **/etc/hosts** contains entries for **localhost, Machine1 and Machine2**.

Can you please tell me how the Task Manager can connect to the Job Manager?

Regards

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
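
For anyone trying to reproduce this, the symptom in the Task Manager log above (the retrieved leader address is 127.0.0.1 rather than Machine1) can be checked with a short script. This is only a minimal sketch: it assumes log lines in the format shown in this post and IPv4 addresses, and the function name `find_loopback_targets` is hypothetical, not part of Flink.

```python
import ipaddress
import re

# Matches the address in lines such as:
#   "... ConnectionUtils - Retrieved new target address /127.0.0.1:37133."
# An optional host name before the slash (e.g. "Machine1/10.0.0.5:6123") is skipped.
TARGET_RE = re.compile(
    r"Retrieved new target address [^/\s]*/(\d+\.\d+\.\d+\.\d+):(\d+)"
)


def find_loopback_targets(log_text):
    """Return (ip, port) pairs for retrieved target addresses that are loopback."""
    hits = []
    for ip, port in TARGET_RE.findall(log_text):
        try:
            address = ipaddress.ip_address(ip)
        except ValueError:
            # Not a valid IPv4 address (e.g. an octet above 255); skip it.
            continue
        if address.is_loopback:
            hits.append((ip, int(port)))
    return hits


if __name__ == "__main__":
    sample = (
        "2018-08-17 10:46:44,966 INFO org.apache.flink.runtime.net.ConnectionUtils "
        "- Retrieved new target address /127.0.0.1:37133."
    )
    for ip, port in find_loopback_targets(sample):
        print(f"leader address {ip}:{port} is loopback; "
              "a Task Manager on another machine cannot reach it")
```

Running the same function over the full Task Manager log file (read with `open(path).read()`) lists every loopback leader address that was retrieved.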