Dear all,

We are currently using Solr 4.3.1 in production (with SolrCloud).

We are seeing much the same problem as the one described in this older post:

http://lucene.472066.n3.nabble.com/SolrCloud-CloudSolrServer-Zookeeper-disconnects-and-re-connects-with-heavy-memory-usage-consumption-td4026421.html

Sometimes some nodes are disconnected from ZooKeeper and then try to
reconnect. Recovery takes a long time because our warming process is long.
As a result, the node is disconnected again right after the recovery
process, and the cycle repeats, sometimes until the node hits an OOM.

We have already increased the ZK timeout, but that is not enough.
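
For reference, here is a sketch of where that timeout lives in a
legacy-format solr.xml such as the one used with 4.3.x. The 30000 ms fallback
and the other attribute values are illustrative assumptions, not our exact
production settings:

```xml
<!-- solr.xml (legacy format); all attribute values here are illustrative assumptions -->
<solr persistent="true">
  <cores adminPath="/admin/cores"
         host="${host:}"
         hostPort="${jetty.port:8983}"
         zkClientTimeout="${zkClientTimeout:30000}">
    <!-- core definitions omitted -->
  </cores>
</solr>
```

With the ${zkClientTimeout:30000} form, the value can be overridden at startup
with -DzkClientTimeout=... Note that the ZooKeeper servers cap the negotiated
session timeout (by default at 20 times tickTime), so raising it only on the
Solr side may not take full effect.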

We are thinking of migrating to at least Solr 4.6.1 (perhaps 4.7 will be out
before the end of the migration :) ).

I know that a lot of SolrCloud bugs have been fixed since Solr 4.3.1.

But can we be sure that this problem will be resolved? Or can it still
occur with the latest Solr version? (I know this is not an easy
question ;) )

It seems that this fix:

Deadlock while trying to recover after a ZK session expiry:
https://issues.apache.org/jira/browse/SOLR-5615

is a good step towards addressing our current problem.

But do you think it will be enough?

One last thing: I don't know whether this is already addressed by a fix, but
if there are no updates between the disconnection and the reconnection, the
recovery process should do nothing more than reconnect, I mean:
no replication, no tLog replay and no warming process. Is that the case?

Ludovic.



-----
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Zookeeper-disconnection-reconnection-tp4117101.html
Sent from the Solr - User mailing list archive at Nabble.com.
