Hi, I am trying to run a Flink cluster with a job-manager and a task-manager in Docker, one of each for now. I get an error from StandaloneResourceManager saying that it cannot associate with the job-manager, and I do not know what is wrong.
I am sure that the job-manager is reachable:
===============================
root@flink-jobmanager:/opt/flink# telnet flink_jobmanager 32929
Trying 172.18.0.3...
Connected to flink-jobmanager.
Escape character is '^]'.
Connection closed by foreign host.
===============================
Here is my config:
===============================
Starting Job Manager
config file:
jobmanager.rpc.address: flink_jobmanager
jobmanager.rpc.port: 6123
jobmanager.web.port: 8081
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 1
taskmanager.memory.preallocate: false
parallelism.default: 1
jobmanager.archive.fs.dir: file:///flink_data/completed-jobs/
historyserver.archive.fs.dir: file:///flink_data/completed-jobs/
state.backend: rocksdb
state.backend.fs.checkpointdir: file:///flink_data/checkpoints
taskmanager.tmp.dirs: /flink_data/tmp
blob.storage.directory: /flink_data/tmp
jobmanager.web.tmpdir: /flink_data/tmp
env.log.dir: /flink_data/logs
high-availability: zookeeper
high-availability.storageDir: file:///flink_data/ha/
high-availability.zookeeper.quorum: kafka:2181
blob.server.port: 6124
query.server.port: 6125
===============================
Here is the major error I see:
===============================
2017-08-16 02:46:23,586 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
2017-08-16 02:46:23,612 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka.tcp://flink@flink_jobmanager:32929/user/jobmanager was granted leadership with leader session ID Some(06abc8f5-c1b9-44b2-bb7f-771c74981552).
2017-08-16 02:46:23,627 INFO org.apache.flink.runtime.jobmanager.JobManager - Delaying recovery of all jobs by 10000 milliseconds.
2017-08-16 02:46:23,638 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@flink_jobmanager:32929/user/jobmanager:06abc8f5-c1b9-44b2-bb7f-771c74981552.
2017-08-16 02:46:23,640 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Trying to associate with JobManager leader akka.tcp://flink@flink_jobmanager:32929/user/jobmanager
2017-08-16 02:46:23,653 WARN org.apache.flink.runtime.webmonitor.JobManagerRetriever - Failed to retrieve leader gateway and port.
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
    at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
    at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
    at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
    at org.apache.flink.runtime.akka.AkkaUtils$.getActorRefFuture(AkkaUtils.scala:498)
    at org.apache.flink.runtime.akka.AkkaUtils.getActorRefFuture(AkkaUtils.scala)
    at org.apache.flink.runtime.webmonitor.JobManagerRetriever.notifyLeaderAddress(JobManagerRetriever.java:141)
    at org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService.nodeChanged(ZooKeeperLeaderRetrievalService.java:168)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:310)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:304)
    at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
    at org.apache.flink.shaded.org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
    at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.setNewData(NodeCache.java:302)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.processBackgroundResult(NodeCache.java:269)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.access$300(NodeCache.java:56)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$3.processResult(NodeCache.java:122)
    at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:749)
    at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:522)
    at org.apache.flink.shaded.org.apache.curator.framework.imps.GetDataBuilderImpl$3.processResult(GetDataBuilderImpl.java:257)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:561)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2017-08-16 02:46:33,644 INFO org.apache.flink.runtime.jobmanager.JobManager - Attempting to recover all jobs.
2017-08-16 02:46:33,648 INFO org.apache.flink.runtime.jobmanager.JobManager - There are no jobs to recover.
===============================
More detailed log: https://gist.github.com/zenhao/19926402438f613c331ffe5b6e6e005d
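
For anyone who wants to repeat the telnet check from another container, here is a minimal Python sketch of the same test. It only verifies that a TCP connection to the host and port in the logged leader address can be opened; it says nothing about whether Akka accepts the address. The helper names parse_akka_address and can_connect are made up for this illustration and are not part of Flink or Akka.

```python
import re
import socket

# Hypothetical helpers for illustration only; not part of Flink or Akka.
_AKKA_TCP = re.compile(r"^akka\.tcp://[^@/]+@([^:/]+):(\d+)(?:/|$)")


def parse_akka_address(address):
    """Extract (host, port) from an address such as
    akka.tcp://flink@flink_jobmanager:32929/user/jobmanager."""
    match = _AKKA_TCP.match(address)
    if match is None:
        raise ValueError("not an akka.tcp address: %r" % address)
    return match.group(1), int(match.group(2))


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This is the same kind of check as the telnet session: it covers
    name resolution and the TCP handshake, nothing above that.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

With the leader address from the log above, the check would be can_connect(*parse_akka_address("akka.tcp://flink@flink_jobmanager:32929/user/jobmanager")), run from inside the task-manager container.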