I am not sure whether this is directly related, but we also sometimes see clients losing connections on 6.5.1. This and the problem described below are unique to 6.5.1; I have not seen this many issues with cloud in such a short time for a very long time.

2017-05-09 21:30:36.661 ERROR (Document compiler) [c:logs s:shard1 r:core_node1 x:logs_shard1_replica1] o.a.s.c.s.i.CloudSolrClient Request to collection search failed due to (0) java.lang.IllegalStateException: Connection pool shut down, retry? 0

Clients appear unable to recover from this problem. The cloud the clients are connecting to is up and doing fine. Any ideas?

Thanks,
Markus

-----Original message-----
> From:Markus Jelsma <markus.jel...@openindex.io>
> Sent: Monday 8th May 2017 11:35
> To: solr-user <solr-user@lucene.apache.org>
> Subject: 6.5.1. cloud went partially down
>
> Hi,
>
> Multiple 6.5.1. clouds / collections went down this weekend around the same
> time, they share the same ZK quorum. The nodes stayed up but did not rejoin
> the cluster (find or connect to ZK)
>
> This is what the log told us:
>
> 2017-05-06 18:58:34.893 WARN
> (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr) [ ]
> o.a.s.c.c.ConnectionManager Watcher
> org.apache.solr.common.cloud.ConnectionManager@4f97bdad name:
> ZooKeeperConnection
> Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
> got event WatchedEvent state:Disconnected type:None path:null path: null
> type: None
> 2017-05-06 18:58:34.893 WARN
> (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr) [ ]
> o.a.s.c.c.ConnectionManager zkClient has disconnected
> 2017-05-06 18:58:35.001 WARN
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread)
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3]
> o.a.s.c.c.ConnectionManager Watcher
> org.apache.solr.common.cloud.ConnectionManager@c226cc name:
> ZooKeeperConnection
> Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
> got event WatchedEvent state:Disconnected type:None path:null path: null
> type: None
> 2017-05-06 18:58:35.010 WARN
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread)
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3]
> o.a.s.c.c.ConnectionManager zkClient has disconnected
> 2017-05-06 18:58:45.360 WARN
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [ ]
> o.a.s.c.c.ConnectionManager Watcher
> org.apache.solr.common.cloud.ConnectionManager@4f97bdad name:
> ZooKeeperConnection
> Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
> got event WatchedEvent state:Expired type:None path:null path: null type:
> None
> 2017-05-06 18:58:45.360 WARN
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [ ]
> o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired.
> Attempting to reconnect to recover relationship with ZooKeeper...
> 2017-05-06 18:58:45.380 WARN
> (OverseerStateUpdate-97740792370385619-idx6.example.org:8983_solr-n_0000000558)
> [ ] o.a.s.c.Overseer Solr cannot talk to ZK, exiting Overseer main queue
> loop
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode
> = Session expired for /overseer/queue
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
>     at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:339)
>     at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:336)
>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>     at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:336)
>     at org.apache.solr.cloud.DistributedQueue.fetchZkChildren(DistributedQueue.java:308)
>     at org.apache.solr.cloud.DistributedQueue.firstChild(DistributedQueue.java:285)
>     at org.apache.solr.cloud.DistributedQueue.firstElement(DistributedQueue.java:393)
>     at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:159)
>     at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:137)
>     at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:180)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-05-06 18:58:45.381 WARN
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [ ]
> o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
> 2017-05-06 18:58:45.382 ERROR (OverseerExitThread) [ ] o.a.s.c.Overseer
> could not read the data
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode
> = Session expired for /overseer_elect/leader
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
>     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>     at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
>     at org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:287)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-05-06 18:58:46.453 WARN
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread)
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3]
> o.a.s.c.c.ConnectionManager Watcher
> org.apache.solr.common.cloud.ConnectionManager@c226cc name:
> ZooKeeperConnection
> Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
> got event WatchedEvent state:Expired type:None path:null path: null type:
> None
> 2017-05-06 18:58:46.453 WARN
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread)
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3]
> o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired.
> Attempting to reconnect to recover relationship with ZooKeeper...
> 2017-05-06 18:58:46.460 WARN
> (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
> x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread)
> [c:search s:shard2 r:core_node6 x:search_shard2_replica3]
> o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
> 2017-05-06 18:58:53.599 ERROR
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [ ]
> o.a.s.c.ZkController
> :org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
> NodeExists for /live_nodes/idx6.example.org:8983_solr
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:526)
>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>     at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:523)
>     at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:466)
>     at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:453)
>     at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:430)
>     at org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:823)
>     at org.apache.solr.cloud.ZkController.access$600(ZkController.java:120)
>     at org.apache.solr.cloud.ZkController$1.command(ZkController.java:340)
>     at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
>     at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
>     at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
>     at org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> 2017-05-06 18:58:53.599 ERROR
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [ ]
> o.a.s.c.c.DefaultConnectionStrategy Reconnect to ZooKeeper
> failed:org.apache.solr.common.cloud.ZooKeeperException:
>     at org.apache.solr.cloud.ZkController$1.command(ZkController.java:392)
>     at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
>     at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
>     at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
>     at org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException:
> KeeperErrorCode = NodeExists for /live_nodes/idx6.example.org:8983_solr
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:526)
>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>     at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:523)
>     at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:466)
>     at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:453)
>     at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:430)
>     at org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:823)
>     at org.apache.solr.cloud.ZkController.access$600(ZkController.java:120)
>     at org.apache.solr.cloud.ZkController$1.command(ZkController.java:340)
>     ... 10 more
> 2017-05-06 18:58:53.600 WARN
> (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr) [ ]
> o.a.s.c.c.DefaultConnectionStrategy Reconnect to ZooKeeper failed
> 2017-05-06 18:58:57.052 ERROR (qtp1873653341-14807) [ ]
> o.a.s.h.RequestHandlerBase
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode
> = Session expired for /collections/search/state.json
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
>     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>     at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
>     at org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1110)
>     at org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:321)
>     at org.apache.solr.handler.admin.PrepRecoveryOp.execute(PrepRecoveryOp.java:102)
>     at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:370)
>     at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
>     at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
>     at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:748)
>     at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:729)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:510)
>
> After that we occasionally see:
>
> 2017-05-06 18:58:59.079 ERROR (qtp1873653341-14989) [ ]
> o.a.s.s.HttpSolrCall
> null:org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /collections/search/state.json
>
> We executed a hard Solr restart to get stuff back up. Is this a known issue?
>
> Thanks,
> Markus
>