The election connections don't have to be set for the cluster to operate. Am I missing anything?
-Flavio > On 24 Sep 2015, at 11:38, Raúl Gutiérrez Segalés <[email protected]> wrote: > > On 24 September 2015 at 06:36, Flavio Junqueira <[email protected]> wrote: > >> I can see that the client is disconnecting from the server, and there is >> also a new round of leader election for the zookeeper servers. If this is >> happening, then yeah, your ensemble is unstable. If the ensemble leader >> election is being triggered frequently, then I'd start by looking there. >> Try to determine why the ensemble is failing to continue with the same >> leader. If ensemble elections aren't happening frequently, then another >> possibility is that GC pauses are causing the session to expire. >> > > On the other hand, if it's a low traffic cluster you might need to enable > TCP keepalives to ensure election connections between the cluster members > don't go away (the ZAB connections on the other hand, iirc, have protocol > level pings so those are fine.. I think): > > https://issues.apache.org/jira/browse/ZOOKEEPER-1748 > > > -rgs > > > >> -Flavio >> >>> On 24 Sep 2015, at 05:24, Akmal Abbasov <[email protected]> >> wrote: >>> >>> Hi, >>> I am using zookeeper 3.4.6 >>> I have a spark cluster configured with HA. Once per 1-2 days, the active >> spark master is shutting down with a message >>> 15/09/23 18:58:18 INFO zookeeper.ClientCnxn: Unable to read additional >> data from server sessionid 0x34ffa68dbd10021, likely server has closed >> socket, closing socket connection and attempting reconnect >>> 15/09/23 18:58:18 INFO state.ConnectionStateManager: State change: >> SUSPENDED >>> 15/09/23 18:58:18 INFO master.ZooKeeperLeaderElectionAgent: We have lost >> leadership >>> 15/09/23 18:58:18 ERROR master.Master: Leadership has been revoked -- >> master shutting down. >>> 15/09/23 18:58:18 INFO util.Utils: Shutdown hook called >>> >>> I don’t have the zookeeper logs from the same period, but the logs are >> full of the these messages >>> 2015-09-24 05:07:42,228 [myid:1] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket >> connection from /10.0.8.4:34705 >>> 2015-09-24 05:07:42,229 [myid:1] - WARN [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old >> client /10.0.8.4:34705; will be dropped if server is in r-o mode >>> 2015-09-24 05:07:42,229 [myid:1] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to >> establish new session at /10.0.8.4:34705 >>> 2015-09-24 05:07:42,292 [myid:1] - INFO >> [CommitProcessor:1:ZooKeeperServer@617] - Established session >> 0x14ffd3670130030 with negotiated timeout 20001 for client /10.0.8.4:34705 >>> 2015-09-24 05:07:42,302 [myid:1] - WARN [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception >>> EndOfStreamException: Unable to read additional data from client >> sessionid 0x14ffd3670130030, likely client has closed socket >>> at >> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) >>> at >> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) >>> at java.lang.Thread.run(Thread.java:745) >>> 2015-09-24 05:07:42,303 [myid:1] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for >> client /10.0.8.4:34705 which had sessionid 0x14ffd3670130030 >>> 2015-09-24 05:07:42,314 [myid:1] - ERROR >> [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception: >>> java.nio.channels.CancelledKeyException >>> at >> sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) >>> at >> sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77) >>> at >> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151) >>> at >> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081) >>> at >> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404) >>> at >> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74) >>> 2015-09-24 05:07:42,334 [myid:1] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket >> connection from /10.0.8.4:34707 >>> 2015-09-24 05:07:42,334 [myid:1] - WARN [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old >> client /10.0.8.4:34707; will be dropped if server is in r-o mode >>> 2015-09-24 05:07:42,335 [myid:1] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to >> establish new session at /10.0.8.4:34707 >>> 2015-09-24 05:07:42,357 [myid:1] - INFO >> [CommitProcessor:1:ZooKeeperServer@617] - Established session >> 0x14ffd3670130031 with negotiated timeout 20001 for client /10.0.8.4:34707 >>> 2015-09-24 05:07:42,364 [myid:1] - WARN [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception >>> EndOfStreamException: Unable to read additional data from client >> sessionid 0x14ffd3670130031, likely client has closed socket >>> at >> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) >>> at >> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) >>> at java.lang.Thread.run(Thread.java:745) >>> 2015-09-24 05:07:42,365 [myid:1] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for >> client /10.0.8.4:34707 which had sessionid 0x14ffd3670130031 >>> 2015-09-24 05:07:42,376 [myid:1] - ERROR >> [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception: >>> java.nio.channels.CancelledKeyException >>> at >> sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) >>> at >> sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77) >>> at >> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151) >>> at >> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081) >>> at >> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404) >>> at >> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74) >>> >>> Also there are >>> 2015-09-24 06:29:54,459 [myid:1] - INFO >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] - >> Shutting down >>> 2015-09-24 06:29:54,459 [myid:1] - INFO >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@441] - shutting >> down >>> 2015-09-24 06:29:54,459 [myid:1] - INFO >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@105] - >> Shutting down >>> 2015-09-24 06:29:54,459 [myid:1] - INFO >> [FollowerRequestProcessor:1:FollowerRequestProcessor@95] - >> FollowerRequestProcessor exited loop! >>> 2015-09-24 06:29:54,460 [myid:1] - INFO >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] - Shutting >> down >>> 2015-09-24 06:29:54,464 [myid:1] - INFO >> [CommitProcessor:1:CommitProcessor@150] - CommitProcessor exited loop! >>> 2015-09-24 06:29:54,465 [myid:1] - INFO >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415] - >> shutdown of request processor complete >>> 2015-09-24 06:29:54,466 [myid:1] - INFO >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@209] - >> Shutting down >>> 2015-09-24 06:29:54,466 [myid:1] - INFO >> [SyncThread:1:SyncRequestProcessor@187] - SyncRequestProcessor exited! >>> 2015-09-24 06:29:54,466 [myid:1] - INFO >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING >>> 2015-09-24 06:29:54,584 [myid:1] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket >> connection from /10.0.8.58:36137 >>> 2015-09-24 06:29:54,584 [myid:1] - WARN [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of >> session 0x0 due to java.io.IOException: ZooKeeperServer not running >>> 2015-09-24 06:29:54,584 [myid:1] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for >> client /10.0.8.58:36137 (no session established for client) >>> 2015-09-24 06:29:54,679 [myid:1] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket >> connection from /10.0.8.57:57410 >>> 2015-09-24 06:29:54,680 [myid:1] - WARN [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of >> session 0x0 due to java.io.IOException: ZooKeeperServer not running >>> 2015-09-24 06:29:54,680 [myid:1] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for >> client /10.0.8.57:57410 (no session established for client) >>> >>> I also observed that hadoop-zkfc restarts very frequently. >>> Any ideas what could be wrong? >>> >>> Thanks. >> >>
