A few points:

1) The containers are all running this JVM:

bash-4.3# java -version
openjdk version "1.8.0_111-internal"
OpenJDK Runtime Environment (build 1.8.0_111-internal-alpine-r0-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

2) The containers are configured like this:
    environment:
      ZK_ID: 1
      ZK_CLUSTER: server.1=0.0.0.0:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
    environment:
      ZK_ID: 2
      ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zookeeper3:2888:3888
--
    environment:
      ZK_ID: 3
      ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=0.0.0.0:2888:3888

Reading below, I see that you suggest I should do the following:

    environment:
      ZK_ID: 1
      quorumListenOnAllIPs: true
      ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
    environment:
      ZK_ID: 2
      quorumListenOnAllIPs: true
      ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
--
    environment:
      ZK_ID: 3
      quorumListenOnAllIPs: true
      ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
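
If I understand it correctly, that should render a zoo.cfg like this on each
node (assuming the image's entrypoint passes the extra quorumListenOnAllIPs
key through to zoo.cfg verbatim; I still need to verify that it does):

    # bind the quorum/election ports on all local interfaces...
    quorumListenOnAllIPs=true
    # ...but advertise real hostnames so the other peers can talk back
    server.1=zookeeper1:2888:3888
    server.2=zookeeper2:2888:3888
    server.3=zookeeper3:2888:3888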

Will try...

Steve

On 2/20/20, 2:56 AM, "Jörn Franke" <[email protected]> wrote:

    Thanks. It is strange that JDK 11.0.6 has a backwards-incompatible change.
    However, it would be sad if we were stuck on JDK 11.0.5 forever.
    
    
    > On 20.02.2020 at 10:53, Szalay-Bekő Máté <[email protected]> wrote:
    > 
    > Hi Guys,
    > 
    > I think the 'reverse order startup failure' actually has the very same root
    > cause as the 0.0.0.0 issue discussed in ZOOKEEPER-2164.
    > 
    > Downgrading to 3.4 for now should solve these problems, I think.
    > 
    > Still, I am a bit confused... I just want to understand whether we really
    > miss something in the ZooKeeper configuration model.
    > 
    > Assuming that myid=1 (we are talking about the zoo.cfg on server 1), in
    > the 'server.1=...' line you put the address that the other servers can
    > use to talk back to server 1. This will be the 'advertised address' used
    > by ZooKeeper in 3.5. Putting 0.0.0.0 here will not work with 3.5 (until
    > we fix it with ZOOKEEPER-2164), as server 2 will not be able to use
    > 0.0.0.0 to talk back to server 1. But if you put a valid address in the
    > 'server.1=...' config line while having quorumListenOnAllIPs=true set,
    > you should still be able to tell ZooKeeper to bind to 0.0.0.0, no matter
    > what IP/hostname you put in the 'server.1=...' configs.
    > (There is a similar config to set which IP to bind the client port to as
    > well, if you need to bind the client port to 0.0.0.0 too.)
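    > 
    > So, for example, a minimal sketch of the zoo.cfg on server 1 could be
    > (clientPortAddress is the client-port counterpart I mean above; I am
    > writing it from memory, so please double-check it in the docs for your
    > version):
    > 
    > # bind the quorum/election ports on all local IPs
    > quorumListenOnAllIPs=true
    > # bind the client port on all local IPs too (only if you need that)
    > clientPortAddress=0.0.0.0
    > # advertised addresses: valid hostnames, never 0.0.0.0
    > server.1=zookeeper1:2888:3888
    > server.2=zookeeper2:2888:3888
    > server.3=zookeeper3:2888:3888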
    > 
    > 
    > @Jorn:
    >> This might be a long shot and I did not see exactly the same error, but
    >> with Corretto JDK 11.0.6 I also had an issue where ZK could not form a
    >> quorum. I downgraded to 11.0.5 and it did not have any issues. This was
    >> on ZK 3.5.5 with Kerberos authentication and authorization.
    > 
    > In the recent JDK versions (8u242, or 11.0.6) there are some
    > backward-incompatible Kerberos-related changes affecting basically the
    > whole Hadoop stack, not only ZooKeeper. I think it is not recommended to
    > use these JDK versions with Hadoop. I am not deeply involved in this
    > (maybe there is some workaround already, but I am not aware of it).
    > 
    > Kind regards,
    > Mate
    > 
    >> On Wed, Feb 19, 2020 at 11:49 PM Steve Jerman <[email protected]> wrote:
    >> 
    >> Ok,
    >> 
    >> Just to confirm. Rolling back to 3.4.14 fixes the issue.  The quorum
    >> starts up and restarting any of the instances works....
    >> 
    >> Are there any issues with using the 3.5 client with 3.4 server?
    >> 
    >> Steve
    >> 
    >> On 2/19/20, 9:02 AM, "Steve Jerman" <[email protected]> wrote:
    >> 
    >>    OK, that explains it.  I will see if 3.4.14 fixes the issue for the
    >>    time being...
    >> 
    >>    Thanks
    >>    Steve
    >> 
    >>    On 2/19/20, 8:57 AM, "Jan Kosecki" <[email protected]> wrote:
    >> 
    >>        Hi Steve,
    >> 
    >>        it's possible that the quorum state depends on the order your
    >>        nodes start in. In my Kubernetes environment I've had a similar
    >>        issue, and I've noticed that starting the brokers one by one,
    >>        following the order from the configuration file, allows all 3 to
    >>        join the quorum, but a reverse order would keep the server
    >>        started last outside of the quorum. I was also using 0.0.0.0 in
    >>        the configuration and didn't try a full address due to the
    >>        readiness check configuration.
    >> 
    >>        Unfortunately I didn't have time to debug it any further, so I've
    >>        downgraded back to 3.4 for the time being.
    >> 
    >>        Hope you manage to find a solution,
    >> 
    >>        Best,
    >>        Jan
    >> 
    >>        On Wed, 19 Feb 2020, 15:47 Steve Jerman, <[email protected]> wrote:
    >> 
    >>> Hi,
    >>> 
    >>> I've just been testing restarts ... I restarted one of the instances
    >>> (id 1) ... and it doesn't join the quorum ... same error.
    >>> 
    >>> Odd that the system started fine but can't handle a restart....
    >>> 
    >>> Steve
    >>> 
    >>>> On 2/19/20, 7:45 AM, "Steve Jerman" <[email protected]> wrote:
    >>> 
    >>>    Thank You Mate,
    >>> 
    >>>    That fixed it. Unfortunately I can't easily avoid using 0.0.0.0.
    >>> 
    >>>    My configuration is built using Docker Swarm and it doesn't let you
    >>>    bind to a host name...
    >>> 
    >>>    Steve
    >>> 
    >>>    On 2/19/20, 5:27 AM, "Szalay-Bekő Máté" <[email protected]> wrote:
    >>> 
    >>>        Hi Steve!
    >>> 
    >>>        If you are using ZooKeeper newer than 3.5.0, then this might be
    >>>        the issue we are just discussing / trying to fix in
    >>>        ZOOKEEPER-2164. Can you test your setup with a config where you
    >>>        don't use 0.0.0.0 in the server addresses?
    >>> 
    >>>        If you need to bind to 0.0.0.0 locally, then please set the
    >>>        'quorumListenOnAllIPs' config property to true.
    >>> 
    >>>        like:
    >>> 
    >>>        # usually you don't really need this, unless you actually need
    >>>        # to bind to multiple IPs
    >>>        quorumListenOnAllIPs=true
    >>> 
    >>>        # it is best if all the zoo.cfg files contain the same address
    >>>        # settings, and we don't use 0.0.0.0 here
    >>>        server.1=zookeeper1:2888:3888
    >>>        server.2=zookeeper2:2888:3888
    >>>        server.3=zookeeper3:2888:3888
    >>> 
    >>>        Kind regards,
    >>>        Mate
    >>> 
    >>>        On Wed, Feb 19, 2020 at 6:08 AM Steve Jerman <[email protected]> wrote:
    >>> 
    >>>> Hello folks,
    >>>> 
    >>>> I wonder if anyone can help me. I suspect it must be something simple,
    >>>> but I can't see it. Any suggestions about how to diagnose this would
    >>>> be gratefully received.
    >>>> 
    >>>> I have a three-node ZK cluster; when it starts up, only two of the
    >>>> nodes form a quorum. If I restart the leader, the quorum reforms with
    >>>> the other two nodes…
    >>>> 
    >>>> Thanks in advance for any help
    >>>> Steve
    >>>> 
    >>>> This is the ‘stat’ for the leader/follower…
    >>>> 
    >>>> bash-5.0$ echo stat | nc zookeeper1 2181
    >>>> This ZooKeeper instance is not currently serving requests
    >>>> 
    >>>> bash-5.0$ echo stat | nc zookeeper2 2181
    >>>> Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
    >>>> Clients:
    >>>> /10.0.1.152:44910[1](queued=0,recved=151,sent=151)
    >>>> /10.0.1.140:53138[1](queued=0,recved=187,sent=187)
    >>>> /10.0.1.143:57422[1](queued=0,recved=151,sent=151)
    >>>> /10.0.1.152:59242[0](queued=0,recved=1,sent=0)
    >>>> /10.0.1.143:40826[1](queued=0,recved=1139,sent=1139)
    >>>> /10.0.1.152:49188[1](queued=0,recved=200,sent=203)
    >>>> /10.0.1.152:59548[1](queued=0,recved=1157,sent=1159)
    >>>> /10.0.1.140:36624[1](queued=0,recved=151,sent=151)
    >>>> 
    >>>> Latency min/avg/max: 0/0/5
    >>>> Received: 3338
    >>>> Sent: 3342
    >>>> Connections: 8
    >>>> Outstanding: 0
    >>>> Zxid: 0xc000000f3
    >>>> Mode: follower
    >>>> Node count: 181
    >>>> 
    >>>> bash-5.0$ echo stat | nc zookeeper3 2181
    >>>> Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on 02/10/2020 11:30 GMT
    >>>> Clients:
    >>>> /10.0.1.152:49428[0](queued=0,recved=1,sent=0)
    >>>> /10.0.1.140:32912[1](queued=0,recved=1426,sent=1429)
    >>>> 
    >>>> Latency min/avg/max: 0/0/4
    >>>> Received: 1684
    >>>> Sent: 1686
    >>>> Connections: 2
    >>>> Outstanding: 0
    >>>> Zxid: 0xc000000f4
    >>>> Mode: leader
    >>>> Node count: 181
    >>>> Proposal sizes last/min/max: 32/32/406
    >>>> bash-5.0$
    >>>> 
    >>>> The trace for the failing node is:
    >>>> 
    >>>> server.1=0.0.0.0:2888:3888
    >>>> server.2=zookeeper2:2888:3888
    >>>> server.3=zookeeper3:2888:3888
    >>>> ZooKeeper JMX enabled by default
    >>>> Using config: /opt/zookeeper/bin/../conf/zoo.cfg
    >>>> 2020-02-19 04:23:27,759 [myid:] - INFO  [main:QuorumPeerConfig@135] - Reading configuration from: /opt/zookeeper/bin/../conf/zoo.cfg
    >>>> 2020-02-19 04:23:27,764 [myid:] - INFO  [main:QuorumPeerConfig@387] - clientPortAddress is 0.0.0.0:2181
    >>>> 2020-02-19 04:23:27,764 [myid:] - INFO  [main:QuorumPeerConfig@391] - secureClientPort is not set
    >>>> 2020-02-19 04:23:27,771 [myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
    >>>> 2020-02-19 04:23:27,772 [myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 24
    >>>> 2020-02-19 04:23:27,772 [myid:1] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
    >>>> 2020-02-19 04:23:27,773 [myid:1] - INFO  [main:ManagedUtil@46] - Log4j found with jmx enabled.
    >>>> 2020-02-19 04:23:27,774 [myid:1] - INFO  [PurgeTask:FileTxnSnapLog@115] - zookeeper.snapshot.trust.empty : false
    >>>> 2020-02-19 04:23:27,780 [myid:1] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
    >>>> 2020-02-19 04:23:27,781 [myid:1] - INFO  [main:QuorumPeerMain@141] - Starting quorum peer
    >>>> 2020-02-19 04:23:27,786 [myid:1] - INFO  [main:ServerCnxnFactory@135] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
    >>>> 2020-02-19 04:23:27,788 [myid:1] - INFO  [main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 32 worker threads, and 64 kB direct buffers.
    >>>> 2020-02-19 04:23:27,791 [myid:1] - INFO  [main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:2181
    >>>> 2020-02-19 04:23:27,809 [myid:1] - INFO  [main:Log@169] - Logging initialized @249ms to org.eclipse.jetty.util.log.Slf4jLog
    >>>> 2020-02-19 04:23:27,913 [myid:1] - WARN  [main:ContextHandler@1520] - o.e.j.s.ServletContextHandler@5abca1e0{/,null,UNAVAILABLE} contextPath ends with /*
    >>>> 2020-02-19 04:23:27,913 [myid:1] - WARN  [main:ContextHandler@1531] - Empty contextPath
    >>>> 2020-02-19 04:23:27,922 [myid:1] - INFO  [main:X509Util@79] - Setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
    >>>> 2020-02-19 04:23:27,923 [myid:1] - INFO  [main:FileTxnSnapLog@115] - zookeeper.snapshot.trust.empty : false
    >>>> 2020-02-19 04:23:27,923 [myid:1] - INFO  [main:QuorumPeer@1470] - Local sessions disabled
    >>>> 2020-02-19 04:23:27,923 [myid:1] - INFO  [main:QuorumPeer@1481] - Local session upgrading disabled
    >>>> 2020-02-19 04:23:27,923 [myid:1] - INFO  [main:QuorumPeer@1448] - tickTime set to 2000
    >>>> 2020-02-19 04:23:27,923 [myid:1] - INFO  [main:QuorumPeer@1492] - minSessionTimeout set to 4000
    >>>> 2020-02-19 04:23:27,924 [myid:1] - INFO  [main:QuorumPeer@1503] - maxSessionTimeout set to 40000
    >>>> 2020-02-19 04:23:27,924 [myid:1] - INFO  [main:QuorumPeer@1518] - initLimit set to 30
    >>>> 2020-02-19 04:23:27,932 [myid:1] - INFO  [main:ZKDatabase@117] - zookeeper.snapshotSizeFactor = 0.33
    >>>> 2020-02-19 04:23:27,933 [myid:1] - INFO  [main:QuorumPeer@1763] - Using insecure (non-TLS) quorum communication
    >>>> 2020-02-19 04:23:27,933 [myid:1] - INFO  [main:QuorumPeer@1769] - Port unification disabled
    >>>> 2020-02-19 04:23:27,933 [myid:1] - INFO  [main:QuorumPeer@2136] - QuorumPeer communication is not secured! (SASL auth disabled)
    >>>> 2020-02-19 04:23:27,933 [myid:1] - INFO  [main:QuorumPeer@2165] - quorum.cnxn.threads.size set to 20
    >>>> 2020-02-19 04:23:27,934 [myid:1] - INFO  [main:FileSnap@83] - Reading snapshot /opt/zookeeper/data/version-2/snapshot.90000043e
    >>>> 2020-02-19 04:23:27,963 [myid:1] - INFO  [main:Server@359] - jetty-9.4.24.v20191120; built: 2019-11-20T21:37:49.771Z; git: 363d5f2df3a8a28de40604320230664b9c793c16; jvm 1.8.0_111-internal-alpine-r0-b14
    >>>> 2020-02-19 04:23:27,989 [myid:1] - INFO  [main:DefaultSessionIdManager@333] - DefaultSessionIdManager workerName=node0
    >>>> 2020-02-19 04:23:27,989 [myid:1] - INFO  [main:DefaultSessionIdManager@338] - No SessionScavenger set, using defaults
    >>>> 2020-02-19 04:23:27,990 [myid:1] - INFO  [main:HouseKeeper@140] - node0 Scavenging every 660000ms
    >>>> 2020-02-19 04:23:27,997 [myid:1] - INFO  [main:ContextHandler@825] - Started o.e.j.s.ServletContextHandler@5abca1e0{/,null,AVAILABLE}
    >>>> 2020-02-19 04:23:28,004 [myid:1] - INFO  [main:AbstractConnector@330] - Started ServerConnector@2b98378d{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
    >>>> 2020-02-19 04:23:28,004 [myid:1] - INFO  [main:Server@399] - Started @444ms
    >>>> 2020-02-19 04:23:28,004 [myid:1] - INFO  [main:JettyAdminServer@112] - Started AdminServer on address 0.0.0.0, port 8080 and command URL /commands
    >>>> 2020-02-19 04:23:28,007 [myid:1] - INFO  [main:QuorumCnxManager$Listener@867] - Election port bind maximum retries is 1000
    >>>> 2020-02-19 04:23:28,007 [myid:1] - INFO  [QuorumPeerListener:QuorumCnxManager$Listener@917] - My election bind port: /0.0.0.0:3888
    >>>> 2020-02-19 04:23:28,014 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - LOOKING
    >>>> 2020-02-19 04:23:28,015 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885] - New election. My id =  1, proposed zxid=0xa000000fb
    >>>> 2020-02-19 04:23:28,018 [myid:1] - INFO  [WorkerSender[myid=1]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
    >>>> 2020-02-19 04:23:28,019 [myid:1] - INFO  [WorkerSender[myid=1]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 1)
    >>>> 2020-02-19 04:23:28,019 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@679] - Notification: 2 (message format version), 1 (n.leader), 0xa000000fb (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version)
    >>>> 2020-02-19 04:23:28,221 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
    >>>> 2020-02-19 04:23:28,222 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 1)
    >>>> 2020-02-19 04:23:28,222 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 400
    >>>> 2020-02-19 04:23:28,623 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
    >>>> 2020-02-19 04:23:28,624 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 1)
    >>>> 2020-02-19 04:23:28,624 [myid:1] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 800
    >>>> …
    >>>> 
    >>>> And for the leader:
    >>>> ….
    >>>> 2020-02-19 05:02:10,341 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0xb (n.peerEPoch), LEADING (my state)0 (n.config version)
    >>>> 2020-02-19 05:02:10,341 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LEADING (n.state), 3 (n.sid), 0xc (n.peerEPoch), LEADING (my state)0 (n.config version)
    >>>> 2020-02-19 05:02:33,640 [myid:3] - WARN  [NIOWorkerThread-4:NIOServerCnxn@366] - Unable to read additional data from client sessionid 0x30002ba40710018, likely client has closed socket
    >>>> 2020-02-19 05:02:39,047 [myid:3] - INFO  [SessionTracker:ZooKeeperServer@398] - Expiring session 0x20002ba2fd6001a, timeout of 40000ms exceeded
    >>>> 2020-02-19 05:02:39,048 [myid:3] - INFO  [SessionTracker:QuorumZooKeeperServer@157] - Submitting global closeSession request for session 0x20002ba2fd6001a
    >>>> 2020-02-19 05:03:10,340 [myid:3] - INFO  [/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 10.0.1.152:52492
    >>>> 2020-02-19 05:03:10,340 [myid:3] - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting for message on queue
    >>>> java.lang.InterruptedException
    >>>>                at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    >>>>                at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
    >>>>                at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
    >>>>                at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
    >>>>                at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
    >>>>                at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
    >>>> 2020-02-19 05:03:10,340 [myid:3] - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread  id 1 my id = 3
    >>>> 2020-02-19 05:03:10,340 [myid:3] - WARN  [RecvWorker:1:QuorumCnxManager$RecvWorker@1227] - Connection broken for id 1, my id = 3, error =
    >>>> java.net.SocketException: Socket closed
    >>>>                at java.net.SocketInputStream.socketRead0(Native Method)
    >>>>                at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    >>>>                at java.net.SocketInputStream.read(SocketInputStream.java:170)
    >>>>                at java.net.SocketInputStream.read(SocketInputStream.java:141)
    >>>>                at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    >>>>                at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    >>>>                at java.io.DataInputStream.readInt(DataInputStream.java:387)
    >>>>                at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
    >>>> 2020-02-19 05:03:10,341 [myid:3] - WARN  [RecvWorker:1:QuorumCnxManager$RecvWorker@1230] - Interrupting SendWorker
    >>>> 2020-02-19 05:03:10,340 [myid:3] - WARN  [RecvWorker:3:QuorumCnxManager$RecvWorker@1227] - Connection broken for id 3, my id = 3, error =
    >>>> java.io.EOFException
    >>>>                at java.io.DataInputStream.readInt(DataInputStream.java:392)
    >>>>                at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1212)
    >>>> 2020-02-19 05:03:10,341 [myid:3] - WARN  [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] - Interrupting SendWorker
    >>>> 2020-02-19 05:03:10,341 [myid:3] - INFO  [/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 10.0.1.142:46326
    >>>> 2020-02-19 05:03:10,343 [myid:3] - WARN  [SendWorker:3:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting for message on queue
    >>>> java.lang.InterruptedException
    >>>>                at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    >>>>                at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
    >>>>                at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
    >>>>                at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
    >>>>                at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
    >>>>                at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
    >>>> 2020-02-19 05:03:10,344 [myid:3] - WARN  [SendWorker:3:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread  id 3 my id = 3
    >>>> 2020-02-19 05:03:10,344 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0xb (n.peerEPoch), LEADING (my state)0 (n.config version)
    >>>> 2020-02-19 05:03:10,345 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0xb0000018c (n.zxid), 0x1 (n.round), LEADING (n.state), 3 (n.sid), 0xc (n.peerEPoch), LEADING (my state)0 (n.config version)
    >>>> 2020-02-19 05:03:11,048 [myid:3] - INFO  [SessionTracker:ZooKeeperServer@398] - Expiring session 0x30002ba40710018, timeout of 40000ms exceeded
    >>>> 2020-02-19 05:03:11,048 [myid:3] - INFO  [SessionTracker:QuorumZooKeeperServer@157] - Submitting global closeSession request for session 0x30002ba40710018
    >>>> 
    >>>> All of the instances have a similar zoo.cfg:
    >>>> 
    >>>> bash-4.3# cat conf/zoo.cfg
    >>>> # The number of milliseconds of each tick
    >>>> tickTime=2000
    >>>> # The number of ticks that the initial
    >>>> # synchronization phase can take
    >>>> initLimit=30
    >>>> # The number of ticks that can pass between
    >>>> # sending a request and getting an acknowledgement
    >>>> syncLimit=5
    >>>> # Purge every 24 hours
    >>>> autopurge.purgeInterval=24
    >>>> # the directory where the snapshot is stored.
    >>>> # do not use /tmp for storage, /tmp here is just
    >>>> # example sakes.
    >>>> dataDir=/opt/zookeeper/data
    >>>> # the port at which the clients will connect
    >>>> clientPort=2181
    >>>> #Append other confg...
    >>>> electionPortBindRetry=1000
    >>>> 4lw.commands.whitelist=stat, ruok, conf, mntr
    >>>> 
    >>>> server.1=zookeeper1:2888:3888
    >>>> server.2=zookeeper2:2888:3888
    >>>> server.3=0.0.0.0:2888:3888
    >>>> 