Oh, and I was wondering if 'leaderVoteWait' might help in Solr 4.

On 27 February 2015 at 18:04, Damien Kamerman <dami...@gmail.com> wrote:

>> This is going to push SolrCloud beyond its limits.  Is this just an
>> exercise to see how far you can push Solr, or are you looking at setting
>> up a production install with several thousand collections?
>>
>>
> I'm looking towards production.
>
>
>> In Solr 4.x, the clusterstate is one giant JSON structure containing the
>> state of the entire cloud.  With 5000 collections, the entire thing
>> would need to be downloaded and uploaded at least 5000 times during the
>> course of a successful full system startup ... and I think with
>> replicationFactor set to 2, that might actually be 10000 times. The
>> best-case scenario is that it would take a VERY long time, the
>> worst-case scenario is that concurrency problems would lead to a
>> deadlock.  A deadlock might be what is happening here.
>>
>>
> Yes, clusterstate.json is 3.3M. At times on startup I think it does
> deadlock; log shows after 1min:
> org.apache.solr.cloud.ZkController; Timed out waiting to see all nodes
> published as DOWN in our cluster state.
>
>
>> In Solr 5.x, the clusterstate is broken up so there's a separate state
>> structure for each collection.  This setup allows for faster and safer
>> multi-threading and far less data transfer.  Assuming I understand the
>> implications correctly, there might not be any need to increase
>> jute.maxbuffer with 5.x ... although I have to assume that I might be
>> wrong about that.
>>
>> I would very much recommend that you set your scenario up from scratch
>> in Solr 5.0.0, to see if the new clusterstate format can eliminate the
>> problem you're seeing.  If it doesn't, then we can pursue it as a likely
>> bug in the 5.x branch and you can file an issue in Jira.
>>
>>
> Thanks, will test in Solr 5.0.0.
>
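
For a rough sense of scale, here is a back-of-the-envelope sketch of the
transfer volume described above. It is only an estimate under stated
assumptions: I take "3.3M" to mean about 3,300,000 bytes, assume the file
is at full size for every update (so this is an upper bound), assume each
update is one download plus one upload, and assume the 5.x per-collection
state is an even split of the same data. The function and variable names
are made up for illustration.

```python
def startup_transfer_bytes(state_bytes, updates):
    """Bytes moved if every update downloads and re-uploads the state once."""
    return state_bytes * updates * 2  # one read plus one write per update


# Assumed figures: 3.3M clusterstate.json, 5000 collections, and either
# 5000 or 10000 state updates during a full startup (replicationFactor 2).
STATE_BYTES = 3_300_000
COLLECTIONS = 5000

# Solr 4.x style: one shared structure, moved in full on every update.
shared_low = startup_transfer_bytes(STATE_BYTES, 5000)
shared_high = startup_transfer_bytes(STATE_BYTES, 10000)

# Solr 5.x style (assumed even split): each update touches one collection.
per_collection_bytes = STATE_BYTES // COLLECTIONS
split_high = startup_transfer_bytes(per_collection_bytes, 10000)

print(f"shared state, 5000 updates:   {shared_low / 1e9:.1f} GB")
print(f"shared state, 10000 updates:  {shared_high / 1e9:.1f} GB")
print(f"per-collection, 10000 updates: {split_high / 1e6:.1f} MB")
```

Under those assumptions the single shared file comes to tens of gigabytes
through ZooKeeper during one startup, while the per-collection layout is
in the tens of megabytes. The real numbers will differ, but the gap is
the point.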



-- 
Damien Kamerman
