Hi,
I am using zookeeper 3.4.6
I have a spark cluster configured with HA. Once per 1-2 days, the active spark
master is shutting down with a message
15/09/23 18:58:18 INFO zookeeper.ClientCnxn: Unable to read additional data
from server sessionid 0x34ffa68dbd10021, likely server has closed socket,
closing socket connection and attempting reconnect
15/09/23 18:58:18 INFO state.ConnectionStateManager: State change: SUSPENDED
15/09/23 18:58:18 INFO master.ZooKeeperLeaderElectionAgent: We have lost
leadership
15/09/23 18:58:18 ERROR master.Master: Leadership has been revoked -- master
shutting down.
15/09/23 18:58:18 INFO util.Utils: Shutdown hook called
I don’t have the zookeeper logs from the same period, but the logs are full of
the these messages
2015-09-24 05:07:42,228 [myid:1] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /10.0.8.4:34705
2015-09-24 05:07:42,229 [myid:1] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection
request from old client /10.0.8.4:34705; will be dropped if server is in r-o
mode
2015-09-24 05:07:42,229 [myid:1] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client
attempting to establish new session at /10.0.8.4:34705
2015-09-24 05:07:42,292 [myid:1] - INFO
[CommitProcessor:1:ZooKeeperServer@617] - Established session 0x14ffd3670130030
with negotiated timeout 20001 for client /10.0.8.4:34705
2015-09-24 05:07:42,302 [myid:1] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of
stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x14ffd3670130030, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2015-09-24 05:07:42,303 [myid:1] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
connection for client /10.0.8.4:34705 which had sessionid 0x14ffd3670130030
2015-09-24 05:07:42,314 [myid:1] - ERROR [CommitProcessor:1:NIOServerCnxn@178]
- Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at
org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at
org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at
org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2015-09-24 05:07:42,334 [myid:1] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /10.0.8.4:34707
2015-09-24 05:07:42,334 [myid:1] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection
request from old client /10.0.8.4:34707; will be dropped if server is in r-o
mode
2015-09-24 05:07:42,335 [myid:1] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client
attempting to establish new session at /10.0.8.4:34707
2015-09-24 05:07:42,357 [myid:1] - INFO
[CommitProcessor:1:ZooKeeperServer@617] - Established session 0x14ffd3670130031
with negotiated timeout 20001 for client /10.0.8.4:34707
2015-09-24 05:07:42,364 [myid:1] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of
stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x14ffd3670130031, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2015-09-24 05:07:42,365 [myid:1] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
connection for client /10.0.8.4:34707 which had sessionid 0x14ffd3670130031
2015-09-24 05:07:42,376 [myid:1] - ERROR [CommitProcessor:1:NIOServerCnxn@178]
- Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at
org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at
org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at
org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
Also there are
2015-09-24 06:29:54,459 [myid:1] - INFO
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] -
Shutting down
2015-09-24 06:29:54,459 [myid:1] - INFO
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@441] - shutting down
2015-09-24 06:29:54,459 [myid:1] - INFO
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@105] -
Shutting down
2015-09-24 06:29:54,459 [myid:1] - INFO
[FollowerRequestProcessor:1:FollowerRequestProcessor@95] -
FollowerRequestProcessor exited loop!
2015-09-24 06:29:54,460 [myid:1] - INFO
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] - Shutting down
2015-09-24 06:29:54,464 [myid:1] - INFO
[CommitProcessor:1:CommitProcessor@150] - CommitProcessor exited loop!
2015-09-24 06:29:54,465 [myid:1] - INFO
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415] - shutdown
of request processor complete
2015-09-24 06:29:54,466 [myid:1] - INFO
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@209] - Shutting
down
2015-09-24 06:29:54,466 [myid:1] - INFO
[SyncThread:1:SyncRequestProcessor@187] - SyncRequestProcessor exited!
2015-09-24 06:29:54,466 [myid:1] - INFO
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
2015-09-24 06:29:54,584 [myid:1] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /10.0.8.58:36137
2015-09-24 06:29:54,584 [myid:1] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
running
2015-09-24 06:29:54,584 [myid:1] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
connection for client /10.0.8.58:36137 (no session established for client)
2015-09-24 06:29:54,679 [myid:1] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /10.0.8.57:57410
2015-09-24 06:29:54,680 [myid:1] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
running
2015-09-24 06:29:54,680 [myid:1] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
connection for client /10.0.8.57:57410 (no session established for client)
I also observed that hadoop-zkfc restarts very frequently.
Any ideas what could be wrong?
Thanks.