Ah, yes, you're right. I misread it. My bad. Apologies.

Michal

On 30/04/17 16:02, Svante Karlsson wrote:
@michal

My interpretation is that he's running 2 instances of zookeeper - not 6 (1 on the "4 broker machine" and one on the other).

I'm not sure where that leaves you in zookeeper land - i.e. if you happen to have a timeout between the two zookeepers, will you be out of service or will you have a split brain problem? None of the alternatives are good. That said - it should be visible in the logs.

Anyway, two zk is not a good config - stick to one or go to three.

2017-04-30 15:41 GMT+02:00 Michal Borowiecki <michal.borowie...@openbet.com>:

Hi Jan,

Correct. As I said before, it's not common or recommended practice to run an even number, and I wouldn't recommend it myself. I hope it didn't sound as if I did.

However, I don't see how this would cause the issue at hand unless at least 3 out of the 6 zookeepers died, but that could also have happened in a 5 node setup.

In either case, changing the number of zookeepers is not a prerequisite to progress debugging this issue further.

Cheers,
Michal

On 30/04/17 13:35, jan wrote:

I looked this up yesterday when I read the grandparent, as my old company ran two and I needed to know. Your link is a bit ambiguous but it has a link to the zookeeper Getting Started guide which says this:

"For replicated mode, a minimum of three servers are required, and it is strongly recommended that you have an odd number of servers. If you only have two servers, then you are in a situation where if one of them fails, there are not enough machines to form a majority quorum. Two servers is inherently less stable than a single server, because there are two single points of failure."

<https://zookeeper.apache.org/doc/r3.4.10/zookeeperStarted.html>

cheers
jan

On 30/04/2017, Michal Borowiecki <michal.borowie...@openbet.com> wrote:

Svante,

I don't share your opinion.
Having an even number of zookeepers is not a problem in itself, it simply means you don't get any better resilience than if you had one fewer instance. Yes, it's not common or recommended practice, but you are allowed to have an even number of zookeepers and it's most likely not related to the problem at hand and does NOT need to be addressed first.

https://zookeeper.apache.org/doc/r3.4.10/zookeeperAdmin.html#sc_zkMulitServerSetup

Abhit,

I'm afraid the log snippet is not enough for me to help. Maybe someone else in the community with more experience can recognize the symptoms, but in the meantime, if you haven't already done so, you may want to search for similar issues:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22ZK%20expired%3B%20shut%20down%20all%20controller%22

searching for text like "ZK expired; shut down all controller" or "No broker in ISR is alive for" or other interesting events from the log.

Hope that helps,
Michal

On 26/04/17 21:40, Svante Karlsson wrote:

You are not supposed to run an even number of zookeepers. Fix that first

On Apr 26, 2017 20:59, "Abhit Kalsotra" <abhit...@gmail.com> wrote:

Any pointers please....

Abhi

On Wed, Apr 26, 2017 at 11:03 PM, Abhit Kalsotra <abhit...@gmail.com> wrote:

Hi

My kafka setup

* OS: Windows Machine
* 6 broker nodes, 4 on one Machine and 2 on other Machine
* ZK instance on (4 broker nodes Machine) and another ZK on (2 broker nodes machine)
* 2 Topics with partition size = 50 and replication factor = 3

I am producing on an average of around 500 messages / sec with each message size close to 98 bytes...
More or less the message rate stays constant throughout, but after running the setup for close to 2 weeks, my Kafka cluster broke and this happened twice in a month. Not able to understand what's the issue, Kafka gurus please do share your inputs...

The controller.log file at the time of Kafka broken looks like:

[2017-04-26 12:03:34,998] INFO [Controller 0]: Broker failure callback for 0,1,3,5,6 (kafka.controller.KafkaController)
[2017-04-26 12:03:34,998] INFO [Controller 0]: Removed ArrayBuffer() from list of shutting down brokers. (kafka.controller.KafkaController)
[2017-04-26 12:03:34,998] INFO [Partition state machine on Controller 0]: Invoking state change to OfflinePartition for partitions [__consumer_offsets,19],[mytopic,11],[__consumer_offsets,30],[mytopicOLD,18],[mytopic,13],[__consumer_offsets,47],[mytopicOLD,26],[__consumer_offsets,29],[mytopicOLD,0],[__consumer_offsets,41],[mytopic,44],[mytopicOLD,38],[mytopicOLD,2],[__consumer_offsets,17],[__consumer_offsets,10],[mytopic,20],[mytopic,23],[mytopic,30],[__consumer_offsets,14],[__consumer_offsets,40],[mytopic,31],[mytopicOLD,43],[mytopicOLD,19],[mytopicOLD,35],[__consumer_offsets,18],[mytopic,43],[__consumer_offsets,26],[__consumer_offsets,0],[mytopic,32],[__consumer_offsets,24],[mytopicOLD,3],[mytopic,2],[mytopic,3],[mytopicOLD,45],[mytopic,35],[__consumer_offsets,20],[mytopic,1],[mytopicOLD,33],[__consumer_offsets,5],[mytopicOLD,47],[__consumer_offsets,22],[mytopicOLD,8],[mytopic,33],[mytopic,36],[mytopicOLD,11],[mytopic,47],[mytopicOLD,20],[mytopic,48],[__consumer_offsets,12],[mytopicOLD,32],[__consumer_offsets,8],[mytopicOLD,39],[mytopicOLD,27],[mytopicOLD,49],[mytopicOLD,42],[mytopic,21],[mytopicOLD,31],[mytopic,29],[__consumer_offsets,23],[mytopicOLD,21],[__consumer_offsets,48],[__consumer_offsets,11],[mytopic,18],[__consumer_offsets,13],[mytopic,45],[mytopic,5],[mytopicOLD,25],[mytopic,6],[mytopicOLD,23],[mytopicOLD,37],[__consumer_offsets,6],[__consumer_offsets,49],[mytopicOLD,13],[__consumer_offsets,28],[__consumer_offsets,4],[__consumer_offsets,37],[mytopic,12],[mytopicOLD,30],[__consumer_offsets,31],[__consumer_offsets,44],[mytopicOLD,15],[mytopicOLD,29],[mytopic,37],[mytopic,38],[__consumer_offsets,42],[mytopic,27],[mytopic,26],[mytopic,15],[__consumer_offsets,34],[mytopic,42],[__consumer_offsets,46],[mytopic,14],[mytopicOLD,12],[mytopicOLD,1],[mytopic,7],[__consumer_offsets,25],[mytopicOLD,24],[mytopicOLD,44],[mytopicOLD,14],[__consumer_offsets,32],[mytopic,0],[__consumer_offsets,43],[mytopic,39],[mytopicOLD,5],[mytopic,9],[mytopic,24],[__consumer_offsets,36],[mytopic,25],[mytopicOLD,36],[mytopic,19],[__consumer_offsets,35],[__consumer_offsets,7],[mytopic,8],[__consumer_offsets,38],[mytopicOLD,48],[mytopicOLD,9],[__consumer_offsets,1],[mytopicOLD,6],[mytopic,41],[mytopicOLD,41],[mytopicOLD,7],[mytopic,17],[mytopicOLD,17],[mytopic,49],[__consumer_offsets,16],[__consumer_offsets,2] (kafka.controller.PartitionStateMachine)
[2017-04-26 12:03:35,045] INFO [SessionExpirationListener on 1], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
[2017-04-26 12:03:35,045] DEBUG [Controller 1]: Controller resigning, broker id 1 (kafka.controller.KafkaController)
[2017-04-26 12:03:35,045] DEBUG [Controller 1]: De-registering IsrChangeNotificationListener (kafka.controller.KafkaController)
[2017-04-26 12:03:35,060] INFO [Partition state machine on Controller 1]: Stopped partition state machine (kafka.controller.PartitionStateMachine)
[2017-04-26 12:03:35,060] INFO [Replica state machine on controller 1]: Stopped replica state machine (kafka.controller.ReplicaStateMachine)
[2017-04-26 12:03:35,060] INFO [Controller 1]: Broker 1 resigned as the controller (kafka.controller.KafkaController)
[2017-04-26 12:03:36,013] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [__consumer_offsets,19]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
[2017-04-26 12:03:36,029] DEBUG [OfflinePartitionLeaderSelector]: [mytopic,11]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
[2017-04-26 12:03:36,029] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [__consumer_offsets,30]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
[2017-04-26 12:03:37,811] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [mytopicOLD,18]. Select 2 from ISR 2 to be the leader. (kafka.controller.OfflinePartitionLeaderSelector)

Typical broker config attached.. Please do share some valid inputs...

Abhi
!wq

--
If you can't succeed, call it version 1.0

--
Michal Borowiecki
Senior Software Engineer L4
T: +44 208 742 1600
   +44 203 249 8448
E: michal.borowie...@openbet.com
W: www.openbet.com
OpenBet Ltd
Chiswick Park Building 9
566 Chiswick High Rd
London W4 5XT
UK

This message is confidential and intended only for the addressee. If you have received this message in error, please immediately notify the postmas...@openbet.com and delete it from your system as well as any copies. The content of e-mails as well as traffic data may be monitored by OpenBet for employment and security purposes. To protect the environment please do not print this e-mail unless necessary. OpenBet Ltd.
Registered Office: Chiswick Park Building 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company registered in England and Wales. Registered no. 3134634. VAT no. GB927523612
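
For anyone wanting to follow the suggestion in the thread of looking for "interesting events" in the controller log, here is a rough Python sketch that counts lines matching the phrases quoted above. The event names (`zk_session_expired` and so on) and the `count_events` helper are made up for this illustration and are not part of Kafka; the phrases themselves are copied from the log excerpt in this thread, and other Kafka versions may word them differently.

```python
import re
from collections import Counter

# Phrases copied from the log excerpt in this thread.
# The dictionary keys are hypothetical labels chosen for this sketch.
PATTERNS = {
    "broker_failure": re.compile(r"Broker failure callback for"),
    "zk_session_expired": re.compile(r"ZK expired; shut down all controller"),
    "no_isr_alive": re.compile(r"No broker in ISR is alive for"),
}


def count_events(lines):
    """Count how many lines match each phrase of interest."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

A possible way to use it, assuming the log sits in a file called controller.log in the current directory: open the file, pass the file object to `count_events`, and print the resulting counter. A burst of `zk_session_expired` lines shortly before the broker failures would point towards the ZooKeeper session timing out rather than the brokers themselves crashing.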
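
As a footnote to the even-versus-odd discussion in this thread, the majority-quorum arithmetic can be sketched in a few lines of Python. This only illustrates the "majority quorum" rule quoted from the Getting Started guide; the function names `quorum_size` and `tolerated_failures` are invented for this example and are not ZooKeeper APIs.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can be lost while a majority can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this rule, 2 servers tolerate no failures (same as 1), 4 tolerate one (same as 3), and 6 tolerate two (same as 5). That matches both points made above: an even ensemble is allowed but buys no extra resilience over one fewer server, and a two-server ensemble stops serving as soon as either server is unreachable.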