Hi,

I am trying to run a Flink cluster in Docker, with one job-manager and one
task-manager for now. I got a StandaloneResourceManager error stating that
it cannot associate with the job-manager. I do not know what is wrong.

I am sure that the job-manager is reachable on that port:
===============================
root@flink-jobmanager:/opt/flink# telnet flink_jobmanager 32929
Trying 172.18.0.3...
Connected to flink-jobmanager.
Escape character is '^]'.
Connection closed by foreign host.
===============================
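
The same check can be scripted. This is a minimal Python sketch, assuming only the standard library; the helper name `can_connect` is my own and not part of Flink. Like telnet, it only shows that the port accepts TCP connections, not that the Akka actor system behind it accepts the association.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This mirrors the telnet check: it proves the port is open,
    nothing more. DNS failures, refusals and timeouts all count
    as "not reachable" (they are all subclasses of OSError).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From inside the container I would call it as `can_connect("flink_jobmanager", 32929)`, using the host and port from the telnet session above.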

Here is my config:
===============================
Starting Job Manager
config file:
jobmanager.rpc.address: flink_jobmanager
jobmanager.rpc.port: 6123
jobmanager.web.port: 8081
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 1
taskmanager.memory.preallocate: false
parallelism.default: 1
jobmanager.archive.fs.dir: file:///flink_data/completed-jobs/
historyserver.archive.fs.dir: file:///flink_data/completed-jobs/
state.backend: rocksdb
state.backend.fs.checkpointdir: file:///flink_data/checkpoints
taskmanager.tmp.dirs: /flink_data/tmp
blob.storage.directory: /flink_data/tmp
jobmanager.web.tmpdir: /flink_data/tmp
env.log.dir: /flink_data/logs
high-availability: zookeeper
high-availability.storageDir: file:///flink_data/ha/
high-availability.zookeeper.quorum: kafka:2181
blob.server.port: 6124
query.server.port: 6125
===============================

Here is the main error I see:
===============================
2017-08-16 02:46:23,586 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
2017-08-16 02:46:23,612 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka.tcp://flink@flink_jobmanager:32929/user/jobmanager was granted leadership with leader session ID Some(06abc8f5-c1b9-44b2-bb7f-771c74981552).
2017-08-16 02:46:23,627 INFO org.apache.flink.runtime.jobmanager.JobManager - Delaying recovery of all jobs by 10000 milliseconds.
2017-08-16 02:46:23,638 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@flink_jobmanager:32929/user/jobmanager:06abc8f5-c1b9-44b2-bb7f-771c74981552.
2017-08-16 02:46:23,640 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Trying to associate with JobManager leader akka.tcp://flink@flink_jobmanager:32929/user/jobmanager
2017-08-16 02:46:23,653 WARN org.apache.flink.runtime.webmonitor.JobManagerRetriever - Failed to retrieve leader gateway and port.
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
    at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
    at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
    at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
    at org.apache.flink.runtime.akka.AkkaUtils$.getActorRefFuture(AkkaUtils.scala:498)
    at org.apache.flink.runtime.akka.AkkaUtils.getActorRefFuture(AkkaUtils.scala)
    at org.apache.flink.runtime.webmonitor.JobManagerRetriever.notifyLeaderAddress(JobManagerRetriever.java:141)
    at org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService.nodeChanged(ZooKeeperLeaderRetrievalService.java:168)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:310)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:304)
    at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
    at org.apache.flink.shaded.org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
    at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.setNewData(NodeCache.java:302)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.processBackgroundResult(NodeCache.java:269)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.access$300(NodeCache.java:56)
    at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$3.processResult(NodeCache.java:122)
    at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:749)
    at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:522)
    at org.apache.flink.shaded.org.apache.curator.framework.imps.GetDataBuilderImpl$3.processResult(GetDataBuilderImpl.java:257)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:561)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2017-08-16 02:46:33,644 INFO org.apache.flink.runtime.jobmanager.JobManager - Attempting to recover all jobs.
2017-08-16 02:46:33,648 INFO org.apache.flink.runtime.jobmanager.JobManager - There are no jobs to recover.
===============================
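
One thing I noticed while comparing the two: the leader address in the log uses port 32929, while my config sets jobmanager.rpc.port to 6123. I do not know whether this is related to the error. Below is a minimal Python sketch of that comparison, assuming only the standard library; the helper names `parse_akka_address` and `parse_flink_conf` are hypothetical, and the inputs are copied from the log and config above.

```python
import re
from typing import Dict, Optional, Tuple

# Matches addresses such as akka.tcp://flink@flink_jobmanager:32929/user/jobmanager
AKKA_ADDR = re.compile(
    r"akka\.tcp://(?P<system>[^@\s]+)@(?P<host>[^:/\s]+):(?P<port>\d+)"
)


def parse_akka_address(line: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) of the first akka.tcp address in a log line."""
    match = AKKA_ADDR.search(line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


def parse_flink_conf(text: str) -> Dict[str, str]:
    """Parse flat 'key: value' lines, skipping blanks and comments."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        conf[key.strip()] = value.strip()
    return conf


# Inputs copied from the log and config in this post.
LOG_LINE = (
    "- Trying to associate with JobManager leader "
    "akka.tcp://flink@flink_jobmanager:32929/user/jobmanager"
)
CONF = """jobmanager.rpc.address: flink_jobmanager
jobmanager.rpc.port: 6123
high-availability: zookeeper
"""

leader = parse_akka_address(LOG_LINE)
conf = parse_flink_conf(CONF)
if leader is not None and leader[1] != int(conf["jobmanager.rpc.port"]):
    print("leader port", leader[1],
          "differs from jobmanager.rpc.port", conf["jobmanager.rpc.port"])
```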

More detailed log:
https://gist.github.com/zenhao/19926402438f613c331ffe5b6e6e005d
