Hmmm, that page is quite a bit out of date. I think Shawn is talking about the "old style" Solr (4.x) that put the state information for all of the collections in a single znode, "clusterstate.json". Newer-style Solr puts each collection's state in /collections/my_collection/state.json, which has significantly reduced this issue.
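If you want to see that layout for yourself, something like the following (an untested sketch; it assumes the default embedded-ZK port 9983 and a collection literally named "my_collection") will dump both znodes with the plain ZooKeeper Java client:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.Watcher.Event.KeeperState;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class ShowCollectionState {
        public static void main(String[] args) throws Exception {
            // Embedded ZK listens on the Solr port + 1000 (9983 for the
            // default 8983); use your own zkHost string for an external ensemble.
            CountDownLatch connected = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper("localhost:9983", 30000, event -> {
                if (event.getState() == KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();

            // New-style per-collection state:
            byte[] state = zk.getData("/collections/my_collection/state.json",
                    false, new Stat());
            System.out.println(new String(state, "UTF-8"));

            // The old global znode still exists, but for collections created
            // with the newer state format it's normally just "{}":
            byte[] legacy = zk.getData("/clusterstate.json", false, new Stat());
            System.out.println(new String(legacy, "UTF-8"));

            zk.close();
        }
    }

For a collection created on 5.x you should see the full state under /collections/... and an essentially empty old global znode.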
There are still some issues in the 5.x code line where, at massive scale, the "Overseer" can end up with a ton of messages to process... However, I know of installations with several 100s of K (yes, hundreds of thousands) of replicas out there, split up amongst a _lot_ of collections. That takes quite a bit of care and feeding, mind you. So your setup shouldn't be a problem, although I'd bring up my Solr instances one at a time.

Whether ZK is embedded or not isn't really a problem, but I would very seriously consider moving it to an external ensemble. It's not so much a functional issue as an administrative one: you have to be careful about how you bring your Solr nodes up and down or you lose quorum (there's a quick sketch of the quorum arithmetic below the quoted message).

Best,
Erick

On Tue, Oct 10, 2017 at 7:37 AM, Jon Drews <j...@jondrews.com> wrote:
> In the Solr Wiki, Shawn Heisey writes the following:
>
> "Regardless of the number of nodes or available resources, SolrCloud begins
> to have stability problems when the number of collections reaches the low
> hundreds. With thousands of collections, any little problem or change to
> the cluster can cause a stability death spiral that may not recover for
> tens of minutes. Try to keep the number of collections as low as possible.
> These problems are due to how SolrCloud updates cluster state in zookeeper
> in response to cluster changes. Work is underway to try and improve this
> situation."
> https://wiki.apache.org/solr/SolrPerformanceProblems?action=diff&rev1=45&rev2=46
>
> I'd like to know if this would apply to a standalone Solr system (embedded
> Zk) with one collection and the low hundreds of shards (e.g. 1 node, 1
> collection and 200 shards).
>
> If there's any JIRA tickets we should track regarding the work that's
> underway to resolve this situation please provide them. Thanks!
>
> We are currently using 5.3.1
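P.S. The quorum point is just "majority of the ensemble" arithmetic: a ZooKeeper ensemble of N voting members keeps serving only while floor(N/2) + 1 of them are up, and with embedded ZK every Solr node you bounce is also a ZK node. A trivial, untested sketch of that math (nothing Solr-specific about it):

    public class QuorumMath {
        public static void main(String[] args) {
            for (int n : new int[] {1, 3, 5}) {
                int quorum = n / 2 + 1;  // strict majority of the ensemble
                System.out.printf(
                    "ensemble of %d: quorum = %d, tolerates %d node(s) down%n",
                    n, quorum, n - quorum);
            }
        }
    }

A 3-member ensemble tolerates one member down, which is why restarting nodes one at a time keeps you safe, and why a single-member (or even-numbered) ensemble doesn't buy you any extra failure tolerance.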