Hello everyone,

I have developed a standalone WebApp with a custom API that dispatches queries to SolrCloud through the CloudSolrServer implementation. I'm testing with a single ZooKeeper instance installed on an Amazon instance. The Solr servers are deployed on two Amazon instances, and one more instance hosts the custom search API engine mentioned above. I'm using *30000 ms* for both the ZooKeeper *zkConnectTimeout* and *zkClientTimeout*.

With that scenario, everything works fine with CloudSolrServer, but I frequently see logging traces like the following:

2012-12-12 17:35:41,932 30486688 [http-bio-8080-exec-7-SendThread(amazon-dns:9000)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 67044ms for sessionid 0x13b8a4218720055, closing socket connection and attempting reconnect

2012-12-12 17:35:41,996 30486752 [http-bio-8080-exec-8-SendThread(amazon-dns:9000)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 67301ms for sessionid 0x13b8a4218720052, closing socket connection and attempting reconnect

2012-12-12 17:35:42,077 30522458 [pool-1-thread-1-SendThread(amazon-dns:9000)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 67299ms for sessionid 0x13b8a4218720053, closing socket connection and attempting reconnect

2012-12-12 17:35:42,286 30487042 [http-bio-8080-exec-7-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@20c5f562name:ZooKeeperConnection Watcher:amazon-dns:9000 got event WatchedEvent state:Disconnected type:None path:null path:null type:None

The message is clear: nothing has been heard from the server for 67 seconds. This is strange, because the ZooKeeper Amazon instance is up and the ZooKeeper service is running. A connection problem would also be very surprising, because communication between Amazon instances is assumed to be always available.
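
To quantify the traces above, here is a minimal sketch that extracts the reported silence from each ClientCnxn line and compares it with the 30000 ms timeout. It is not part of the WebApp or of Solr/ZooKeeper: the class name `ZkTimeoutLogCheck` and its methods are hypothetical helpers that only scan log text, and the 30000 ms value is taken from the configuration described earlier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it just parses log text.
class ZkTimeoutLogCheck {

    // Captures the silence (in ms) from the ClientCnxn "session timed out" message.
    private static final Pattern SILENCE =
            Pattern.compile("have not heard from server in (\\d+)ms");

    /** Returns the reported silence in milliseconds, or -1 if the line does not match. */
    static long silenceMillis(String logLine) {
        Matcher m = SILENCE.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** Returns how many milliseconds the reported silence exceeds the configured timeout. */
    static long excessMillis(String logLine, long configuredTimeoutMs) {
        long silence = silenceMillis(logLine);
        if (silence < 0) {
            throw new IllegalArgumentException("not a session-timeout line: " + logLine);
        }
        return silence - configuredTimeoutMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 30000L; // the timeout value from the setup described above
        String[] lines = {
            "Client session timed out, have not heard from server in 67044ms for sessionid 0x13b8a4218720055",
            "Client session timed out, have not heard from server in 67301ms for sessionid 0x13b8a4218720052",
            "Client session timed out, have not heard from server in 67299ms for sessionid 0x13b8a4218720053"
        };
        for (String line : lines) {
            System.out.println(silenceMillis(line) + " ms silent, "
                    + excessMillis(line, timeoutMs) + " ms over the configured timeout");
        }
    }
}
```

For the first trace this gives 67044 ms of silence, which is 37044 ms over the configured 30000 ms, so every reported gap is more than twice the timeout.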

After that, I start seeing logging traces such as:

2012-12-12 17:37:15,501 30580257 [http-bio-8080-exec-7-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader - Updating cluster state from ZooKeeper...

2012-12-12 17:37:15,510 30615891 [pool-1-thread-2-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Waiting for client to connect to ZooKeeper

2012-12-12 17:37:15,512 30580268 [http-bio-8080-exec-7-EventThread] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper

2012-12-12 17:37:15,541 30580297 [http-bio-8080-exec-7-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true

2012-12-12 17:37:15,541 30580297 [http-bio-8080-exec-7-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down

But the *big problem* is that when this disconnect-reconnect-disconnect-reconnect behavior happens, the WebApp seems to block (it looks like the CloudSolrServer ZooKeeper state update is blocking) while I continue receiving search queries. As a result, memory usage keeps growing and the search engine WebApp module becomes almost unresponsive. It seems that this kind of ZooKeeper state update is both memory-hungry and blocking.

Has anyone experienced behavior like this? Any tips or suggestions?

Thank you very much in advance for your help.

Regards,

--
- Luis Cappa