On 2/26/2015 11:14 PM, Damien Kamerman wrote:
> I've run into an issue with starting my solr cloud with many collections.
> My setup is:
> 3 nodes (solr 4.10.3 ; 64GB RAM each ; jdk1.8.0_25) running on a single
> server (256GB RAM).
> 5,000 collections (1 x shard ; 2 x replica) = 10,000 cores
> 1 x Zookeeper 3.4.6
> Java arg -Djute.maxbuffer=67108864 added to solr and ZK.
>
> Then I stop all nodes, then start all nodes. All replicas are in the down
> state, some have no leader. At times I have seen some (12 or so) leaders in
> the active state. In the solr logs I see lots of:
>
> org.apache.solr.cloud.ZkController; Still seeing conflicting information
> about the leader of shard shard1 for collection DDDDDD-4351 after 30
> seconds; our state says http://ftea1:8001/solr/DDDDDD-4351_shard1_replica1/,
> but ZooKeeper says http://ftea1:8000/solr/DDDDDD-4351_shard1_replica2/
<snip>

> I've tried staggering the starts (1min) but that does not help.
> I've reproduced this with zero documents.
> Restarts are OK up to around 3,000 cores.
> Should this work?

This is going to push SolrCloud beyond its limits.  Is this just an
exercise to see how far you can push Solr, or are you looking at setting
up a production install with several thousand collections?

In Solr 4.x, the clusterstate is one giant JSON structure containing the
state of the entire cloud.  With 5000 collections, the entire thing would
need to be downloaded and uploaded at least 5000 times during the course
of a successful full system startup ... and with replicationFactor set
to 2, that might actually be 10000 times.  The best-case scenario is
that startup would take a very long time; the worst-case scenario is
that concurrency problems would lead to a deadlock.  A deadlock might be
what is happening here.

In Solr 5.x, the clusterstate is broken up so there's a separate state
structure for each collection.  This allows faster and safer
multi-threading and far less data transfer.  If I understand the
implications correctly, there may not be any need to increase
jute.maxbuffer with 5.x at all, although I could be wrong about that.

I would very much recommend that you set your scenario up from scratch
on Solr 5.0.0, to see if the new clusterstate format eliminates the
problem you're seeing.  If it doesn't, then we can pursue it as a likely
bug in the 5.x branch and you can file an issue in Jira.

Thanks,
Shawn
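To see why the single-clusterstate design falls over at this scale, here is a
rough back-of-envelope estimate of the ZooKeeper traffic during a full
restart.  The per-collection JSON size (500 bytes) is an assumption for
illustration, not a measured value; only the collection and replica counts
come from the setup described above.

```python
# Back-of-envelope estimate of clusterstate traffic during a full restart.
# BYTES_PER_COLLECTION is an assumed figure, not measured from a real cluster.

COLLECTIONS = 5_000
REPLICAS_PER_COLLECTION = 2        # replicationFactor=2, i.e. 10,000 cores
BYTES_PER_COLLECTION = 500         # assumed JSON size of one collection entry

# Solr 4.x: one giant clusterstate.json covering every collection.
clusterstate_bytes = COLLECTIONS * BYTES_PER_COLLECTION

# Each core coming up triggers a state update; with a single shared
# structure, every update means reading and writing the whole thing.
updates = COLLECTIONS * REPLICAS_PER_COLLECTION
transfer_4x = updates * clusterstate_bytes * 2   # *2: download + upload

# Solr 5.x: per-collection state, so each update only moves one
# collection's entry.
transfer_5x = updates * BYTES_PER_COLLECTION * 2

print(f"4.x clusterstate size: {clusterstate_bytes / 1e6:.1f} MB")
print(f"4.x total transfer:    {transfer_4x / 1e9:.1f} GB")
print(f"5.x total transfer:    {transfer_5x / 1e6:.1f} MB")
```

Under these assumptions the 4.x design moves tens of gigabytes through
ZooKeeper for one restart, while the 5.x per-collection layout moves only a
few megabytes; the ratio between the two is simply the number of
collections, which is why the problem appears only at high collection counts.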