On 10/10/2017 9:11 AM, Erick Erickson wrote:
Hmmm, that page is quite a bit out of date. I think Shawn is talking
about the "old style" Solr (4.x) that put all the state information
for all the collections in a single znode, "clusterstate.json".
Newer-style Solr puts each collection's state in
/collections/my_collection/state.json, which has significantly
reduced this issue.
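
If you want to check which layout a cluster is using, here's a
minimal sketch using the plain ZooKeeper client (the connect string
localhost:2181 and the collection name my_collection are placeholders
for your own cluster):

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class StateLayoutCheck {
        public static void main(String[] args) throws Exception {
            // Placeholders: adjust the connect string and collection name.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

            // Old-style (4.x) layout: one shared znode for every collection.
            Stat shared = zk.exists("/clusterstate.json", false);
            System.out.println("shared clusterstate.json: "
                    + (shared == null ? "absent" : shared.getDataLength() + " bytes"));

            // New-style layout: a state.json znode per collection.
            byte[] perColl = zk.getData("/collections/my_collection/state.json", false, null);
            System.out.println(new String(perColl, "UTF-8"));

            zk.close();
        }
    }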

There are still some issues in the 5.x code line where, at massive
scale, the "Overseer" can be hit with a ton of messages to process...

However, I know of installations with several 100s of K (yes hundreds
of thousands) of replicas out there, split up amongst a _lot_ of
collections. That takes quite a bit of care and feeding, mind you.

So your setup shouldn't be a problem, although I'd bring up my Solr
instances one at a time.

Whether ZK is embedded or not isn't really a problem, but I would very
seriously consider moving it to an external ensemble. It's not so much
a functional issue as an administrative one: with embedded ZK, you
have to be careful when bringing your Solr nodes up and down, or you
lose quorum.
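
To make the quorum point concrete: ZooKeeper needs a strict majority
of the ensemble to stay up, and with embedded ZK every Solr restart
also takes down a ZK server. A trivial sketch of the arithmetic:

    public class QuorumMath {
        public static void main(String[] args) {
            // A ZK ensemble of n servers needs a strict majority up: n/2 + 1.
            for (int n = 1; n <= 5; n++) {
                int quorum = n / 2 + 1;
                System.out.printf("ensemble=%d  quorum=%d  failures tolerated=%d%n",
                        n, quorum, n - quorum);
            }
            // With 3 embedded ZK servers, restarting 2 Solr nodes at the same
            // time drops below quorum and SolrCloud stops accepting updates.
        }
    }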

The testing I did on SOLR-7191, which is where that statement came from, was mostly on 5.x with the per-collection clusterstate that was new at the time.  Even with that improvement, I still found that it would not scale well.

Some later poking around with 6.x (long after SOLR-7191 was resolved with no commits) indicates that current versions scale even worse than early 5.x did.  I believe the biggest source of the scalability problems is that the overseer queue gets spammed with a very large number of operations that cannot be handled quickly.
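
If you want to watch this happen on your own cluster, something like
the following sketch polls the depth of the overseer's work queue
znode (/overseer/queue is the standard SolrCloud path; the connect
string is a placeholder):

    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    public class OverseerQueueWatch {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

            // A backlog of thousands of entries here is the "spammed
            // overseer" symptom described above.
            for (int i = 0; i < 10; i++) {
                List<String> pending = zk.getChildren("/overseer/queue", false);
                System.out.println("overseer queue depth: " + pending.size());
                Thread.sleep(5000);
            }
            zk.close();
        }
    }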

One collection with 200 shards probably would not present much of a scalability problem where ZK is concerned, but a query on that collection fans out into between 201 and 401 smaller queries: one request per shard in the first phase to collect matching document IDs, up to one more per shard in the second phase to retrieve stored fields, plus the top-level request itself.  Because of that fan-out, I would not expect single-query performance to be very good.
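
To see the fan-out on a real collection, a rough SolrJ sketch
(6.x-era API; the ZK host and collection name are placeholders) can
request shards.info, which reports one entry per per-shard
sub-request:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FanOutDemo {
        public static void main(String[] args) throws Exception {
            // Placeholders: adjust the ZK host and collection name.
            CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("localhost:2181")
                    .build();
            client.setDefaultCollection("my_collection");

            SolrQuery q = new SolrQuery("*:*");
            q.set("shards.info", true);  // report each per-shard sub-request

            QueryResponse rsp = client.query(q);
            // The shards.info section lists one entry per shard contacted;
            // on a 200-shard collection, that is the fan-out in question.
            System.out.println(rsp.getResponse().get("shards.info"));
            System.out.println("total QTime: " + rsp.getQTime() + " ms");

            client.close();
        }
    }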

Thanks,
Shawn
