@michal My interpretation is that he's running 2 instances of ZooKeeper, not 6 (one on the "4 broker machine" and one on the other).
I'm not sure where that leaves you in ZooKeeper land, i.e. if you happen to have a timeout between the two ZooKeepers, will you be out of service or will you have a split-brain problem? Neither alternative is good. That said, it should be visible in the logs.

Anyway, two ZK is not a good config: stick to one or go to three.

2017-04-30 15:41 GMT+02:00 Michal Borowiecki <michal.borowie...@openbet.com>:

> Hi Jan,
>
> Correct. As I said before, it's not common or recommended practice to run
> an even number, and I wouldn't recommend it myself. I hope it didn't sound
> as if I did.
>
> However, I don't see how this would cause the issue at hand unless at
> least 3 out of the 6 zookeepers died, but that could also have happened in
> a 5 node setup.
>
> In either case, changing the number of zookeepers is not a prerequisite to
> progress debugging this issue further.
>
> Cheers,
>
> Michal
>
> On 30/04/17 13:35, jan wrote:
>
> I looked this up yesterday when I read the grandparent, as my old
> company ran two and I needed to know.
> Your link is a bit ambiguous but it has a link to the zookeeper
> Getting Started guide which says this:
>
> "
> For replicated mode, a minimum of three servers are required, and it
> is strongly recommended that you have an odd number of servers. If you
> only have two servers, then you are in a situation where if one of
> them fails, there are not enough machines to form a majority quorum.
> Two servers is inherently less stable than a single server, because
> there are two single points of failure.
> "
> <https://zookeeper.apache.org/doc/r3.4.10/zookeeperStarted.html>
>
> cheers
>
> jan
>
> On 30/04/2017, Michal Borowiecki <michal.borowie...@openbet.com> wrote:
>
> Svante, I don't share your opinion.
> Having an even number of zookeepers is not a problem in itself, it
> simply means you don't get any better resilience than if you had one
> fewer instance.
> Yes, it's not common or recommended practice, but you are allowed to
> have an even number of zookeepers and it's most likely not related to
> the problem at hand and does NOT need to be addressed first.
> https://zookeeper.apache.org/doc/r3.4.10/zookeeperAdmin.html#sc_zkMulitServerSetup
>
> Abhit, I'm afraid the log snippet is not enough for me to help.
> Maybe someone else in the community with more experience can recognize
> the symptoms but in the meantime, if you haven't already done so, you
> may want to search for similar issues:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22ZK%20expired%3B%20shut%20down%20all%20controller%22
>
> searching for text like "ZK expired; shut down all controller" or "No
> broker in ISR is alive for" or other interesting events from the log.
>
> Hope that helps,
> Michal
>
> On 26/04/17 21:40, Svante Karlsson wrote:
>
> You are not supposed to run an even number of zookeepers. Fix that first
>
> On Apr 26, 2017 20:59, "Abhit Kalsotra" <abhit...@gmail.com> wrote:
>
> Any pointers please....
>
> Abhi
>
> On Wed, Apr 26, 2017 at 11:03 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:
>
> Hi *
>
> My kafka setup
>
> * OS: Windows Machine
> * 6 broker nodes, 4 on one Machine and 2 on other Machine
> * ZK instance on (4 broker nodes Machine) and another ZK on (2 broker nodes machine)
> * 2 Topics with partition size = 50 and replication factor = 3
>
> I am producing on an average of around 500 messages / sec with each
> message size close to 98 bytes...
>
> More or less the message rate stays constant throughout, but after running
> the setup for close to 2 weeks, my Kafka cluster broke and this happened
> twice in a month.
> Not able to understand what the issue is; Kafka gurus, please do share your inputs...
>
> The controller.log file at the time Kafka broke looks like:
>
> [2017-04-26 12:03:34,998] INFO [Controller 0]: Broker failure callback for 0,1,3,5,6 (kafka.controller.KafkaController)
> [2017-04-26 12:03:34,998] INFO [Controller 0]: Removed ArrayBuffer() from list of shutting down brokers. (kafka.controller.KafkaController)
> [2017-04-26 12:03:34,998] INFO [Partition state machine on Controller 0]: Invoking state change to OfflinePartition for partitions
> [__consumer_offsets,19],[mytopic,11],[__consumer_offsets,30],[mytopicOLD,18],
> [mytopic,13],[__consumer_offsets,47],[mytopicOLD,26],[__consumer_offsets,29],
> [mytopicOLD,0],[__consumer_offsets,41],[mytopic,44],[mytopicOLD,38],
> [mytopicOLD,2],[__consumer_offsets,17],[__consumer_offsets,10],[mytopic,20],
> [mytopic,23],[mytopic,30],[__consumer_offsets,14],[__consumer_offsets,40],
> [mytopic,31],[mytopicOLD,43],[mytopicOLD,19],[mytopicOLD,35],
> [__consumer_offsets,18],[mytopic,43],[__consumer_offsets,26],[__consumer_offsets,0],
> [mytopic,32],[__consumer_offsets,24],[mytopicOLD,3],[mytopic,2],
> [mytopic,3],[mytopicOLD,45],[mytopic,35],[__consumer_offsets,20],
> [mytopic,1],[mytopicOLD,33],[__consumer_offsets,5],[mytopicOLD,47],
> [__consumer_offsets,22],[mytopicOLD,8],[mytopic,33],[mytopic,36],
> [mytopicOLD,11],[mytopic,47],[mytopicOLD,20],[mytopic,48],
> [__consumer_offsets,12],[mytopicOLD,32],[__consumer_offsets,8],[mytopicOLD,39],
> [mytopicOLD,27],[mytopicOLD,49],[mytopicOLD,42],[mytopic,21],
> [mytopicOLD,31],[mytopic,29],[__consumer_offsets,23],[mytopicOLD,21],
> [__consumer_offsets,48],[__consumer_offsets,11],[mytopic,18],[__consumer_offsets,13],
> [mytopic,45],[mytopic,5],[mytopicOLD,25],[mytopic,6],
> [mytopicOLD,23],[mytopicOLD,37],[__consumer_offsets,6],[__consumer_offsets,49],
> [mytopicOLD,13],[__consumer_offsets,28],[__consumer_offsets,4],[__consumer_offsets,37],
> [mytopic,12],[mytopicOLD,30],[__consumer_offsets,31],[__consumer_offsets,44],
> [mytopicOLD,15],[mytopicOLD,29],[mytopic,37],[mytopic,38],
> [__consumer_offsets,42],[mytopic,27],[mytopic,26],[mytopic,15],
> [__consumer_offsets,34],[mytopic,42],[__consumer_offsets,46],[mytopic,14],
> [mytopicOLD,12],[mytopicOLD,1],[mytopic,7],[__consumer_offsets,25],
> [mytopicOLD,24],[mytopicOLD,44],[mytopicOLD,14],[__consumer_offsets,32],
> [mytopic,0],[__consumer_offsets,43],[mytopic,39],[mytopicOLD,5],
> [mytopic,9],[mytopic,24],[__consumer_offsets,36],[mytopic,25],
> [mytopicOLD,36],[mytopic,19],[__consumer_offsets,35],[__consumer_offsets,7],
> [mytopic,8],[__consumer_offsets,38],[mytopicOLD,48],[mytopicOLD,9],
> [__consumer_offsets,1],[mytopicOLD,6],[mytopic,41],[mytopicOLD,41],
> [mytopicOLD,7],[mytopic,17],[mytopicOLD,17],[mytopic,49],
> [__consumer_offsets,16],[__consumer_offsets,2]
> (kafka.controller.PartitionStateMachine)
> [2017-04-26 12:03:35,045] INFO [SessionExpirationListener on 1], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
> [2017-04-26 12:03:35,045] DEBUG [Controller 1]: Controller resigning, broker id 1 (kafka.controller.KafkaController)
> [2017-04-26 12:03:35,045] DEBUG [Controller 1]: De-registering IsrChangeNotificationListener (kafka.controller.KafkaController)
> [2017-04-26 12:03:35,060] INFO [Partition state machine on Controller 1]: Stopped partition state machine (kafka.controller.PartitionStateMachine)
> [2017-04-26 12:03:35,060] INFO [Replica state machine on controller 1]: Stopped replica state machine (kafka.controller.ReplicaStateMachine)
> [2017-04-26 12:03:35,060] INFO [Controller 1]: Broker 1 resigned as the controller (kafka.controller.KafkaController)
> [2017-04-26 12:03:36,013] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [__consumer_offsets,19].
> Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
> [2017-04-26 12:03:36,029] DEBUG [OfflinePartitionLeaderSelector]: [mytopic,11]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
> [2017-04-26 12:03:36,029] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [__consumer_offsets,30]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
> [2017-04-26 12:03:37,811] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [mytopicOLD,18]. Select 2 from ISR 2 to be the leader. (kafka.controller.OfflinePartitionLeaderSelector)
>
> Typical broker config attached.. Please do share some valid inputs...
>
> Abhi
> !wq
>
> --
> If you can't succeed, call it version 1.0
>
> --
> Michal Borowiecki
> Senior Software Engineer L4
> T: +44 208 742 1600
>    +44 203 249 8448
> E: michal.borowie...@openbet.com
> W: www.openbet.com
>
> OpenBet Ltd
> Chiswick Park Building 9
> 566 Chiswick High Rd
> London
> W4 5XT
> UK
>
> This message is confidential and intended only for the addressee. If you
> have received this message in error, please immediately notify
> the postmas...@openbet.com and delete it
> from your system as well as any copies. The content of e-mails as well
> as traffic data may be monitored by OpenBet for employment and security
> purposes. To protect the environment please do not print this e-mail
> unless necessary. OpenBet Ltd.
> Registered Office: Chiswick Park Building 9, 566 Chiswick High Road,
> London, W4 5XT, United Kingdom. A company registered in England and
> Wales. Registered no. 3134634. VAT no. GB927523612
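
As a rough illustration of the majority-quorum arithmetic discussed in this thread, here is a minimal Python sketch. The function names quorum_size and tolerated_failures are made up for this example and are not part of ZooKeeper or Kafka. The only assumption is that a quorum means a strict majority of the configured servers, as the quoted Getting Started guide describes.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Ensemble sizes mentioned in the thread: 1, 2, 3, 5 and 6 servers.
    for n in (1, 2, 3, 5, 6):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under that assumption, 2 servers need both members for a quorum, so losing either one (or the link between them) leaves no majority, which is no better than a single server. Likewise 6 servers tolerate 2 failures, the same as 5, which matches the point that an even number buys no extra resilience over one fewer instance.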