Hi Jan,

Correct. As I said before, running an even number of zookeepers is neither common nor recommended practice, and I wouldn't recommend it myself. I hope it didn't sound as if I did.

However, I don't see how this would cause the issue at hand unless at least 3 out of the 6 zookeepers died, but that could also have happened in a 5 node setup.
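
To make the arithmetic concrete, here is a minimal sketch, assuming ZooKeeper's default majority quorum. The helper names `quorum_size` and `tolerated_failures` are made up for this illustration and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (assumes majority quorums)."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare a few ensemble sizes, including the 5 and 6 discussed here.
for n in (1, 2, 3, 4, 5, 6):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Under that assumption both 5 and 6 servers tolerate 2 failures, so losing 3 breaks quorum in either setup; the sixth server adds no resilience but does not remove any either.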

In either case, changing the number of zookeepers is not a prerequisite to progress debugging this issue further.

Cheers,

Michal


On 30/04/17 13:35, jan wrote:
I looked this up yesterday when I read the grandparent, as my old
company ran two and I needed to know.
Your link is a bit ambiguous but it has a link to the zookeeper
Getting Started guide which says this:

"
For replicated mode, a minimum of three servers are required, and it
is strongly recommended that you have an odd number of servers. If you
only have two servers, then you are in a situation where if one of
them fails, there are not enough machines to form a majority quorum.
Two servers is inherently less stable than a single server, because
there are two single points of failure.
"

<https://zookeeper.apache.org/doc/r3.4.10/zookeeperStarted.html>

cheers

jan


On 30/04/2017, Michal Borowiecki <michal.borowie...@openbet.com> wrote:
Svante, I don't share your opinion.
Having an even number of zookeepers is not a problem in itself, it
simply means you don't get any better resilience than if you had one
fewer instance.
Yes, it's not common or recommended practice, but you are allowed to
have an even number of zookeepers and it's most likely not related to
the problem at hand and does NOT need to be addressed first.
https://zookeeper.apache.org/doc/r3.4.10/zookeeperAdmin.html#sc_zkMulitServerSetup

Abhit, I'm afraid the log snippet is not enough for me to help.
Maybe someone else in the community with more experience can recognize
the symptoms but in the meantime, if you haven't already done so, you
may want to search for similar issues:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22ZK%20expired%3B%20shut%20down%20all%20controller%22

searching for text like "ZK expired; shut down all controller" or "No
broker in ISR is alive for" or other interesting events from the log.
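
To pull those events out of a large controller.log before searching JIRA, something like the following might help. It is only a rough sketch: the two phrases and the timestamp layout are taken from the snippet in this thread, while the function name `scan_log` is hypothetical and the real log may need different patterns.

```python
import re
from collections import Counter

# Phrases quoted in this thread; extend with other events of interest.
PATTERNS = [
    "ZK expired; shut down all controller",
    "No broker in ISR is alive for",
]

# Timestamps look like [2017-04-26 12:03:35,045] in the posted snippet.
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")
# Zero-width split point in front of every timestamp.
ENTRY_START = re.compile(r"(?=\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\])")


def scan_log(text):
    """Return (count per phrase, first timestamp at which each phrase appears)."""
    # Collapse line wraps so a phrase split across lines is still found.
    flat = " ".join(text.split())
    counts = Counter()
    first_seen = {}
    for entry in ENTRY_START.split(flat):
        match = TIMESTAMP.match(entry)
        if match is None:
            continue  # text before the first timestamp
        for phrase in PATTERNS:
            if phrase in entry:
                counts[phrase] += 1
                first_seen.setdefault(phrase, match.group(1))
    return counts, first_seen


# Two entries copied from the log in this thread, with a line wrap left in.
sample = (
    "[2017-04-26 12:03:35,045] INFO [SessionExpirationListener on 1], "
    "ZK expired; shut down all\ncontroller components and try to re-elect "
    "(kafka.controller.KafkaController$SessionExpirationListener)\n"
    "[2017-04-26 12:03:36,013] DEBUG [OfflinePartitionLeaderSelector]: "
    "No broker in ISR is alive for [__consumer_offsets,19]. Pick the leader "
    "from the alive assigned replicas: "
    "(kafka.controller.OfflinePartitionLeaderSelector)\n"
)
counts, first_seen = scan_log(sample)
for phrase in PATTERNS:
    print(f"{phrase!r}: {counts[phrase]} hit(s), first at {first_seen.get(phrase)}")
```

To run it against a real file, read the file contents into a string and pass that to `scan_log` instead of `sample`.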

Hope that helps,
Michal


On 26/04/17 21:40, Svante Karlsson wrote:
You are not supposed to run an even number of zookeepers. Fix that first

On Apr 26, 2017 20:59, "Abhit Kalsotra" <abhit...@gmail.com> wrote:

Any pointers please....


Abhi

On Wed, Apr 26, 2017 at 11:03 PM, Abhit Kalsotra <abhit...@gmail.com>
wrote:

Hi *

My kafka setup


OS: Windows

* 6 broker nodes: 4 on one machine and 2 on the other
* One ZK instance on the 4-broker machine and another ZK on the 2-broker
  machine
* 2 topics, each with 50 partitions and replication factor = 3

I am producing on an average of around 500 messages / sec with each
message size close to 98 bytes...

The message rate stays more or less constant throughout, but after
running the setup for close to 2 weeks my Kafka cluster broke, and this
has happened twice in a month. I am not able to understand what the issue
is. Kafka gurus, please do share your inputs...

The controller.log file at the time Kafka broke looks like this:

[2017-04-26 12:03:34,998] INFO [Controller 0]: Broker failure callback for 0,1,3,5,6 (kafka.controller.KafkaController)
[2017-04-26 12:03:34,998] INFO [Controller 0]: Removed ArrayBuffer() from list of shutting down brokers. (kafka.controller.KafkaController)
[2017-04-26 12:03:34,998] INFO [Partition state machine on Controller 0]: Invoking state change to OfflinePartition for partitions
    [__consumer_offsets,19],[mytopic,11],[__consumer_offsets,30],[mytopicOLD,18],
    [mytopic,13],[__consumer_offsets,47],[mytopicOLD,26],[__consumer_offsets,29],
    [mytopicOLD,0],[__consumer_offsets,41],[mytopic,44],[mytopicOLD,38],
    [mytopicOLD,2],[__consumer_offsets,17],[__consumer_offsets,10],[mytopic,20],
    [mytopic,23],[mytopic,30],[__consumer_offsets,14],[__consumer_offsets,40],
    [mytopic,31],[mytopicOLD,43],[mytopicOLD,19],[mytopicOLD,35],
    [__consumer_offsets,18],[mytopic,43],[__consumer_offsets,26],[__consumer_offsets,0],
    [mytopic,32],[__consumer_offsets,24],[mytopicOLD,3],[mytopic,2],
    [mytopic,3],[mytopicOLD,45],[mytopic,35],[__consumer_offsets,20],
    [mytopic,1],[mytopicOLD,33],[__consumer_offsets,5],[mytopicOLD,47],
    [__consumer_offsets,22],[mytopicOLD,8],[mytopic,33],[mytopic,36],
    [mytopicOLD,11],[mytopic,47],[mytopicOLD,20],[mytopic,48],
    [__consumer_offsets,12],[mytopicOLD,32],[__consumer_offsets,8],[mytopicOLD,39],
    [mytopicOLD,27],[mytopicOLD,49],[mytopicOLD,42],[mytopic,21],
    [mytopicOLD,31],[mytopic,29],[__consumer_offsets,23],[mytopicOLD,21],
    [__consumer_offsets,48],[__consumer_offsets,11],[mytopic,18],[__consumer_offsets,13],
    [mytopic,45],[mytopic,5],[mytopicOLD,25],[mytopic,6],
    [mytopicOLD,23],[mytopicOLD,37],[__consumer_offsets,6],[__consumer_offsets,49],
    [mytopicOLD,13],[__consumer_offsets,28],[__consumer_offsets,4],[__consumer_offsets,37],
    [mytopic,12],[mytopicOLD,30],[__consumer_offsets,31],[__consumer_offsets,44],
    [mytopicOLD,15],[mytopicOLD,29],[mytopic,37],[mytopic,38],
    [__consumer_offsets,42],[mytopic,27],[mytopic,26],[mytopic,15],
    [__consumer_offsets,34],[mytopic,42],[__consumer_offsets,46],[mytopic,14],
    [mytopicOLD,12],[mytopicOLD,1],[mytopic,7],[__consumer_offsets,25],
    [mytopicOLD,24],[mytopicOLD,44],[mytopicOLD,14],[__consumer_offsets,32],
    [mytopic,0],[__consumer_offsets,43],[mytopic,39],[mytopicOLD,5],
    [mytopic,9],[mytopic,24],[__consumer_offsets,36],[mytopic,25],
    [mytopicOLD,36],[mytopic,19],[__consumer_offsets,35],[__consumer_offsets,7],
    [mytopic,8],[__consumer_offsets,38],[mytopicOLD,48],[mytopicOLD,9],
    [__consumer_offsets,1],[mytopicOLD,6],[mytopic,41],[mytopicOLD,41],
    [mytopicOLD,7],[mytopic,17],[mytopicOLD,17],[mytopic,49],
    [__consumer_offsets,16],[__consumer_offsets,2]
    (kafka.controller.PartitionStateMachine)
[2017-04-26 12:03:35,045] INFO [SessionExpirationListener on 1], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
[2017-04-26 12:03:35,045] DEBUG [Controller 1]: Controller resigning, broker id 1 (kafka.controller.KafkaController)
[2017-04-26 12:03:35,045] DEBUG [Controller 1]: De-registering IsrChangeNotificationListener (kafka.controller.KafkaController)
[2017-04-26 12:03:35,060] INFO [Partition state machine on Controller 1]: Stopped partition state machine (kafka.controller.PartitionStateMachine)
[2017-04-26 12:03:35,060] INFO [Replica state machine on controller 1]: Stopped replica state machine (kafka.controller.ReplicaStateMachine)
[2017-04-26 12:03:35,060] INFO [Controller 1]: Broker 1 resigned as the controller (kafka.controller.KafkaController)
[2017-04-26 12:03:36,013] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [__consumer_offsets,19]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
[2017-04-26 12:03:36,029] DEBUG [OfflinePartitionLeaderSelector]: [mytopic,11]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
[2017-04-26 12:03:36,029] DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [__consumer_offsets,30]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector)
[2017-04-26 12:03:37,811] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [mytopicOLD,18]. Select 2 from ISR 2 to be the leader. (kafka.controller.OfflinePartitionLeaderSelector)

Typical broker config attached.. Please do share some valid inputs...

Abhi


--
If you can't succeed, call it version 1.0



--
Michal Borowiecki
Senior Software Engineer L4
T: +44 208 742 1600
   +44 203 249 8448
E: michal.borowie...@openbet.com
W: www.openbet.com <http://www.openbet.com/>

OpenBet Ltd
Chiswick Park Building 9
566 Chiswick High Rd
London
W4 5XT
UK

This message is confidential and intended only for the addressee. If you
have received this message in error, please immediately notify the
postmas...@openbet.com <mailto:postmas...@openbet.com> and delete it
from your system as well as any copies. The content of e-mails as well
as traffic data may be monitored by OpenBet for employment and security
purposes. To protect the environment please do not print this e-mail
unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building
9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company
registered in England and Wales. Registered no. 3134634. VAT no.
GB927523612



