Thanks Till, the DEBUG log level is a good idea. I figured it out: I had mixed up `-` and `_` in the hostname.
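[Editor's note: the symptom above is consistent with the fact that underscores are not legal in RFC 1123 hostnames, which is likely why Akka could not resolve an actor address containing `flink_jobmanager` while `flink-jobmanager` works. A minimal, illustrative sketch (not Flink code) of that hostname rule:

```python
import re

# RFC 1123 hostname label: letters, digits and hyphens only,
# 1-63 chars, and it must not start or end with a hyphen.
LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def is_valid_hostname(name: str) -> bool:
    """True if every dot-separated label of `name` is RFC 1123 compliant."""
    return 0 < len(name) <= 253 and all(LABEL.match(label) for label in name.split("."))

print(is_valid_hostname("flink_jobmanager"))  # False: '_' is not a legal hostname character
print(is_valid_hostname("flink-jobmanager"))  # True
```
]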
On Tue, Aug 22, 2017 at 1:39 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Hao Sun,
>
> have you checked that the hostname flink_jobmanager can be resolved from
> within the container? This is required to connect to the JobManager. If it
> can, then log files at DEBUG log level would be helpful to track down the
> problem.
>
> Cheers,
> Till
>
> On Wed, Aug 16, 2017 at 5:35 AM, Hao Sun <ha...@zendesk.com> wrote:
>
>> Hi,
>>
>> I am trying to run a cluster of job manager and task manager in Docker,
>> one of each for now. I get a StandaloneResourceManager error stating that
>> it cannot associate with the job manager, and I do not know what is wrong.
>>
>> I am sure the job manager can be reached:
>> ===============================
>> root@flink-jobmanager:/opt/flink# telnet flink_jobmanager 32929
>> Trying 172.18.0.3...
>> Connected to flink-jobmanager.
>> Escape character is '^]'.
>> Connection closed by foreign host.
>> ===============================
>>
>> Here is my config:
>> ===============================
>> Starting Job Manager
>> config file:
>> jobmanager.rpc.address: flink_jobmanager
>> jobmanager.rpc.port: 6123
>> jobmanager.web.port: 8081
>> jobmanager.heap.mb: 1024
>> taskmanager.heap.mb: 1024
>> taskmanager.numberOfTaskSlots: 1
>> taskmanager.memory.preallocate: false
>> parallelism.default: 1
>> jobmanager.archive.fs.dir: file:///flink_data/completed-jobs/
>> historyserver.archive.fs.dir: file:///flink_data/completed-jobs/
>> state.backend: rocksdb
>> state.backend.fs.checkpointdir: file:///flink_data/checkpoints
>> taskmanager.tmp.dirs: /flink_data/tmp
>> blob.storage.directory: /flink_data/tmp
>> jobmanager.web.tmpdir: /flink_data/tmp
>> env.log.dir: /flink_data/logs
>> high-availability: zookeeper
>> high-availability.storageDir: file:///flink_data/ha/
>> high-availability.zookeeper.quorum: kafka:2181
>> blob.server.port: 6124
>> query.server.port: 6125
>> ===============================
>>
>> Here is the major error I see:
>> ===============================
>> 2017-08-16 02:46:23,586 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
>> 2017-08-16 02:46:23,612 INFO org.apache.flink.runtime.jobmanager.JobManager - JobManager akka.tcp://flink@flink_jobmanager:32929/user/jobmanager was granted leadership with leader session ID Some(06abc8f5-c1b9-44b2-bb7f-771c74981552).
>> 2017-08-16 02:46:23,627 INFO org.apache.flink.runtime.jobmanager.JobManager - Delaying recovery of all jobs by 10000 milliseconds.
>> 2017-08-16 02:46:23,638 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@flink_jobmanager:32929/user/jobmanager:06abc8f5-c1b9-44b2-bb7f-771c74981552.
>> 2017-08-16 02:46:23,640 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Trying to associate with JobManager leader akka.tcp://flink@flink_jobmanager:32929/user/jobmanager
>> 2017-08-16 02:46:23,653 WARN org.apache.flink.runtime.webmonitor.JobManagerRetriever - Failed to retrieve leader gateway and port.
>> akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
>>     at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
>>     at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
>>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>>     at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>>     at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
>>     at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
>>     at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
>>     at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
>>     at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>>     at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
>>     at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
>>     at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
>>     at org.apache.flink.runtime.akka.AkkaUtils$.getActorRefFuture(AkkaUtils.scala:498)
>>     at org.apache.flink.runtime.akka.AkkaUtils.getActorRefFuture(AkkaUtils.scala)
>>     at org.apache.flink.runtime.webmonitor.JobManagerRetriever.notifyLeaderAddress(JobManagerRetriever.java:141)
>>     at org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService.nodeChanged(ZooKeeperLeaderRetrievalService.java:168)
>>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:310)
>>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$4.apply(NodeCache.java:304)
>>     at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
>>     at org.apache.flink.shaded.org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>>     at org.apache.flink.shaded.org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85)
>>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.setNewData(NodeCache.java:302)
>>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.processBackgroundResult(NodeCache.java:269)
>>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache.access$300(NodeCache.java:56)
>>     at org.apache.flink.shaded.org.apache.curator.framework.recipes.cache.NodeCache$3.processResult(NodeCache.java:122)
>>     at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:749)
>>     at org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:522)
>>     at org.apache.flink.shaded.org.apache.curator.framework.imps.GetDataBuilderImpl$3.processResult(GetDataBuilderImpl.java:257)
>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:561)
>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>> 2017-08-16 02:46:33,644 INFO org.apache.flink.runtime.jobmanager.JobManager - Attempting to recover all jobs.
>> 2017-08-16 02:46:33,648 INFO org.apache.flink.runtime.jobmanager.JobManager - There are no jobs to recover.
>> ===============================
>>
>> More detailed log:
>> https://gist.github.com/zenhao/19926402438f613c331ffe5b6e6e005d
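[Editor's note: for anyone hitting the same symptom, the fix that matches the resolution above is to use a hyphenated name consistently in both the container hostname and the Flink configuration. A sketch of the relevant flink-conf.yaml lines, assuming the container is reachable as `flink-jobmanager` (names are illustrative):

```
# flink-conf.yaml -- the RPC address must be a hostname Akka can parse;
# underscores are not valid in hostnames, so use hyphens.
jobmanager.rpc.address: flink-jobmanager
jobmanager.rpc.port: 6123
```
]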