SolrCloud: CloudSolrServer Zookeeper disconnects and re-connects with heavy memory usage consumption.

Luis Cappa Banda Wed, 12 Dec 2012 09:52:53 -0800

Hello everyone.

I have developed a standalone WebApp with a custom API that dispatches
queries to SolrCloud using the CloudSolrServer implementation. I'm
testing with a single ZooKeeper instance installed on an Amazon instance.
The Solr servers are deployed on two Amazon instances, and one more instance
hosts the custom search API engine I mentioned above. I'm using *30000 ms*
for both the ZooKeeper *zkConnectTimeout* and *zkClientTimeout*.



In this setup I've noticed that everything works fine with CloudSolrServer,
but I frequently see logging traces like the following:


2012-12-12 17:35:41,932 30486688 [http-bio-8080-exec-7-SendThread(amazon-dns:9000)] INFO org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 67044ms for sessionid 0x13b8a4218720055, closing socket connection and attempting reconnect

2012-12-12 17:35:41,996 30486752 [http-bio-8080-exec-8-SendThread(amazon-dns:9000)] INFO org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 67301ms for sessionid 0x13b8a4218720052, closing socket connection and attempting reconnect

2012-12-12 17:35:42,077 30522458 [pool-1-thread-1-SendThread(amazon-dns:9000)] INFO org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard from server in 67299ms for sessionid 0x13b8a4218720053, closing socket connection and attempting reconnect

2012-12-12 17:35:42,286 30487042 [http-bio-8080-exec-7-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager  - Watcher org.apache.solr.common.cloud.ConnectionManager@20c5f562name:ZooKeeperConnection Watcher:amazon-dns:9000 got event WatchedEvent state:Disconnected type:None path:null path:null type:None
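
The three timeout entries above come from three different session ids (0x13b8a4218720055, 0x13b8a4218720052 and 0x13b8a4218720053) and were logged within about 150 ms of each other. To check whether the same pattern shows up elsewhere in a longer log, a small parser like this one can help. It is a hypothetical helper (SessionTimeoutScan is not part of Solr or ZooKeeper), and its regular expression is an assumption written against the lines quoted here, with each entry unwrapped onto a single line; other log layouts would need a different pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: pulls timestamp, idle time and session id out of
 * ZooKeeper "Client session timed out" lines. The pattern assumes the
 * layout of the lines quoted in this message, one entry per line.
 */
public class SessionTimeoutScan {

    /** One parsed "session timed out" entry. */
    public static final class Timeout {
        final String timestamp;
        final long idleMs;
        final String sessionId;

        Timeout(String timestamp, long idleMs, String sessionId) {
            this.timestamp = timestamp;
            this.idleMs = idleMs;
            this.sessionId = sessionId;
        }
    }

    private static final Pattern TIMEOUT = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}).*"
            + "have not heard from server in (\\d+)ms for sessionid (0x[0-9a-f]+)");

    /** Returns one Timeout per matching line, in input order; other lines are skipped. */
    public static List<Timeout> scan(List<String> lines) {
        List<Timeout> found = new ArrayList<>();
        for (String line : lines) {
            Matcher m = TIMEOUT.matcher(line);
            if (m.find()) {
                found.add(new Timeout(m.group(1), Long.parseLong(m.group(2)), m.group(3)));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "2012-12-12 17:35:41,932 30486688 [http-bio-8080-exec-7-SendThread(amazon-dns:9000)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 67044ms for sessionid 0x13b8a4218720055, closing socket connection and attempting reconnect",
            "2012-12-12 17:35:41,996 30486752 [http-bio-8080-exec-8-SendThread(amazon-dns:9000)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 67301ms for sessionid 0x13b8a4218720052, closing socket connection and attempting reconnect",
            "2012-12-12 17:35:42,077 30522458 [pool-1-thread-1-SendThread(amazon-dns:9000)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 67299ms for sessionid 0x13b8a4218720053, closing socket connection and attempting reconnect",
            "2012-12-12 17:35:42,286 30487042 [http-bio-8080-exec-7-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@20c5f562name:ZooKeeperConnection Watcher:amazon-dns:9000 got event WatchedEvent state:Disconnected type:None path:null path:null type:None");

        // Only the three "session timed out" lines match; the Disconnected event is skipped.
        for (Timeout t : scan(log)) {
            System.out.println(t.timestamp + "  " + t.sessionId + "  idle " + t.idleMs + " ms");
        }
    }
}
```

Grouping the results by timestamp makes it easy to see whether timeouts tend to hit several sessions at once, as they do in the excerpt above.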



The message is clear: nothing has been heard from the server in 67 seconds.
This is strange, because both the Amazon instance running ZooKeeper and the
ZooKeeper service itself are up. A network problem would also be very
surprising, because communication between Amazon instances is assumed to be
always available.
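
It may help to put the 67 seconds next to the configured 30000 ms in numbers. This is a sketch under stated assumptions, not a description of this exact deployment: it assumes zkClientTimeout is passed through as the ZooKeeper session timeout (as SolrJ 4.x does), ZooKeeper 3.4 behavior, the server's default session-timeout bounds of 2 to 20 ticks, and tickTime = 2000 ms as in the sample zoo.cfg. Under those assumptions, the client closes the connection once it has heard nothing for two thirds of the negotiated session timeout.

```java
/**
 * Sketch of the ZooKeeper timeout arithmetic for the settings in this message.
 * Assumptions (not taken from this setup): ZooKeeper 3.4 behavior, default
 * minSessionTimeout/maxSessionTimeout (2 and 20 ticks), tickTime = 2000 ms.
 */
public class ZkTimeoutMath {

    /** Server side: the requested session timeout is clamped to [2, 20] ticks by default. */
    static int negotiatedSessionTimeout(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    /** Client side: after this much silence from the server, the client closes the socket and reconnects. */
    static int clientReadTimeout(int negotiatedMs) {
        return negotiatedMs * 2 / 3;
    }

    public static void main(String[] args) {
        int requested = 30000;      // zkClientTimeout from this message
        int tickTime = 2000;        // assumption: value from the sample zoo.cfg
        long observedIdle = 67044;  // first "have not heard from server" entry in the log

        int negotiated = negotiatedSessionTimeout(requested, tickTime);
        int readTimeout = clientReadTimeout(negotiated);

        System.out.println("negotiated session timeout:  " + negotiated + " ms");
        System.out.println("client read timeout:         " + readTimeout + " ms");
        System.out.println("observed silence:            " + observedIdle + " ms");
        System.out.println("overshoot past read timeout: " + (observedIdle - readTimeout) + " ms");
    }
}
```

With these assumptions the client would normally give up after about 20 s of silence (the result is the same with a tickTime of 3000 ms, since 30000 ms falls inside both ranges). A logged silence of about 67 s therefore means the check ran roughly 47 s late; one possible reading, not proven by the logs, is that the client JVM itself was stalled, for example by long garbage-collection pauses.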

After that, I start seeing logging traces like these:


2012-12-12 17:37:15,501 30580257 [http-bio-8080-exec-7-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader  - Updating cluster state from ZooKeeper...

2012-12-12 17:37:15,510 30615891 [pool-1-thread-2-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager  - Waiting for client to connect to ZooKeeper

2012-12-12 17:37:15,512 30580268 [http-bio-8080-exec-7-EventThread] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy  - Reconnected to ZooKeeper

2012-12-12 17:37:15,541 30580297 [http-bio-8080-exec-7-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager  - Connected:true

2012-12-12 17:37:15,541 30580297 [http-bio-8080-exec-7-EventThread] INFO org.apache.zookeeper.ClientCnxn  - EventThread shut down



But the *big problem* is that while this
disconnect-reconnect-disconnect-reconnect cycle is happening, the WebApp
seems to be blocked (it looks like the CloudSolrServer ZooKeeper state update
is blocking) while search queries keep arriving. As a result, memory keeps
growing and the search engine WebApp module becomes almost unresponsive. It
seems that this kind of ZooKeeper state update both consumes a lot of memory
and blocks.
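
Whatever the root cause of the session timeouts, the memory growth described above is what one would expect if incoming requests keep piling up behind a client that is blocked. One common way to cap that is to bound the number of in-flight searches and fail fast beyond the cap. This sketch uses java.util.concurrent.Semaphore; BoundedSearchGate, RejectedSearchException and the limits are hypothetical names and values, the code does not touch CloudSolrServer, and it does not fix the ZooKeeper disconnects themselves.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: limits how many searches may be waiting on the
 * backend client at the same time. Once the limit is reached, new searches
 * wait at most maxWaitMs for a slot and are then rejected, instead of
 * accumulating while the client is blocked.
 */
public class BoundedSearchGate {

    /** Thrown when no slot becomes free within the allowed wait. */
    public static class RejectedSearchException extends Exception {
        public RejectedSearchException(String message) {
            super(message);
        }
    }

    private final Semaphore slots;
    private final long maxWaitMs;

    public BoundedSearchGate(int maxInFlight, long maxWaitMs) {
        this.slots = new Semaphore(maxInFlight);
        this.maxWaitMs = maxWaitMs;
    }

    /** Runs the query while holding a slot; always releases the slot afterwards. */
    public <T> T run(Callable<T> query) throws Exception {
        if (!slots.tryAcquire(maxWaitMs, TimeUnit.MILLISECONDS)) {
            throw new RejectedSearchException("too many searches in flight");
        }
        try {
            return query.call();
        } finally {
            slots.release();
        }
    }

    public static void main(String[] args) throws Exception {
        BoundedSearchGate gate = new BoundedSearchGate(1, 50);
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch unblock = new CountDownLatch(1);

        // Simulates a search stuck behind a blocked client: it holds the only slot.
        Thread stuck = new Thread(() -> {
            try {
                gate.run(() -> {
                    entered.countDown();
                    unblock.await();
                    return "slow";
                });
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        stuck.start();
        entered.await();

        // With the only slot taken, this search is rejected after about 50 ms.
        try {
            gate.run(() -> "fast");
            System.out.println("accepted");
        } catch (RejectedSearchException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // Once the stuck search finishes, the gate accepts work again.
        unblock.countDown();
        stuck.join();
        String result = gate.run(() -> "fast");
        System.out.println("after recovery: " + result);
    }
}
```

In a WebApp, each search handler could wrap its Solr call in gate.run(...) and turn RejectedSearchException into an error response, so that a stalled client ties up a bounded number of requests rather than an ever-growing backlog.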

Has anyone experienced behavior like this? Any tips or suggestions?


Thank you very much in advance for your help.

Regards,

-- 

- Luis Cappa
