Hi, AFAIK, the JobGraph itself is not stored in ZK but in HDFS. ZK only stores a handle to the serialised JobGraph.
Best,
Aljoscha

> On 15. Feb 2018, at 04:59, Chirag Dewan <chirag.dewa...@yahoo.in> wrote:
>
> Thanks a lot Aljoscha.
>
> I was making a silly mistake. TaskManagers can now register with the JobManager.
>
> One more thing, does Flink now store Job Graphs on ZK too?
>
> Regards,
>
> Chirag
>
> On Wednesday, 14 February, 2018, 8:06:14 PM IST, Aljoscha Krettek <aljos...@apache.org> wrote:
>
> It should be roughly the same settings that you use in your JobManager. They are described here:
> https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#zookeeper-based-ha-mode
>
>> On 14. Feb 2018, at 15:32, Chirag Dewan <chirag.dewa...@yahoo.in> wrote:
>>
>> Thanks Aljoscha.
>>
>> I haven't checked that bit. Is there any configuration for TaskManagers to find ZK?
>>
>> Regards,
>>
>> Chirag
>>
>> On Wed, 14 Feb 2018 at 7:43 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
>> Do you see in the logs whether the TaskManagers correctly connect to ZooKeeper as well? They need this in order to find the JobManager leader.
>>
>> Best,
>> Aljoscha
>>
>>> On 14. Feb 2018, at 06:12, Chirag Dewan <chirag.dewa...@yahoo.in> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to deploy a Flink cluster (1 JM, 2 TM) on a Docker Swarm. For JobManager HA, I have started a 3-node ZooKeeper service on the same swarm network and configured Flink's ZooKeeper quorum with the ZooKeeper service instances.
>>>
>>> JobManager gets started with the LeaderElectionService and gets assigned a LeaderSessionID too, which I can see from the following log statements (attaching only related logs):
>>>
>>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService
>>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
>>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
>>> JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager was granted leadership with leader session ID Some(1f3b2ec6-77b6-4532-928f-ad8befd5202f).
>>> Trying to associate with JobManager leader akka.tcp://flink@jobmanager:6123/user/jobmanager
>>> Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#590681231] - leader session 1f3b2ec6-77b6-4532-928f-ad8befd5202f
>>>
>>> But the TaskManagers are not able to register with the JobManager and give the following error:
>>>
>>> Discard message LeaderSessionMessage(00000000-0000-0000-0000-000000000000,RegisterTaskManager(4fc8aceeae1e27e42b9f16df6c0cf5e3,4fc8aceeae1e27e42b9f16df6c0cf5e3 @ a118cdf39114 (dataPort=43017),cores=1, physMem=1044111360, heap=536870912, managed=324208384,1)) because the expected leader session ID 1f3b2ec6-77b6-4532-928f-ad8befd5202f did not equal the received leader session ID 00000000-0000-0000-0000-000000000000.
>>>
>>> It seems the ResourceManager was not able to retrieve the LeaderSessionID and passed an all-zero ID.
>>>
>>> One interesting thing I observed was a ZK version log:
>>>
>>> The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.
>>>
>>> Is this a ZK version problem? Should I be using ZK 3.4.6?
>>>
>>> My configuration:
>>>
>>> Flink Version : 1.4.0
>>> ZK version : 3.4.11 (I just pulled the latest image)
>>>
>>> Thanks in advance.
>>>
>>> Chirag
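
For reference, the ZooKeeper HA settings this thread revolves around can be sketched as a flink-conf.yaml fragment. This is only an illustrative sketch for Flink 1.4: the host names and the HDFS path are placeholders, not values taken from the thread. The same entries would need to be present on the JobManager and on every TaskManager, since the TaskManagers use ZooKeeper to find the JobManager leader.

```yaml
# flink-conf.yaml (sketch; host names and paths are placeholders)

# Enable ZooKeeper-based high availability.
high-availability: zookeeper

# ZooKeeper quorum; must be reachable from the JobManager and all TaskManagers.
high-availability.zookeeper.quorum: zookeeper1:2181,zookeeper2:2181,zookeeper3:2181

# Durable storage for JobManager metadata such as serialised JobGraphs.
# ZooKeeper itself only keeps a handle pointing into this directory.
high-availability.storageDir: hdfs:///flink/ha/
```

If the TaskManagers are started without these entries, they cannot look up the current leader session ID in ZooKeeper, which would be consistent with the all-zero ID in the error message above.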