On 10/10/2017 9:11 AM, Erick Erickson wrote:
> Hmmm, that page is quite a bit out of date. I think Shawn is talking
> about the "old style" Solr (4.x) that put all the state information
> for all the collections in a single znode, "clusterstate.json". Newer
> style Solr puts each collection's state in
> /collections/my_collection/state.json, which has very significantly
> reduced this issue.
>
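To put that point in rough numbers, here is a back-of-envelope sketch. This is not Solr code, the watch model is deliberately simplified, and all the counts below are made up for illustration:

```python
def state_watch_fanout(total_nodes, nodes_per_collection, updates):
    """Toy model of ZooKeeper watch notifications caused by `updates`
    state changes to a single collection. Illustrative numbers only."""
    # Old style (4.x): every node watches the one shared clusterstate.json,
    # so each update to any collection notifies the whole cluster.
    old_style = updates * total_nodes
    # New style: only the nodes hosting a collection watch its own
    # /collections/<name>/state.json.
    new_style = updates * nodes_per_collection
    return old_style, new_style

# e.g. 100 nodes, a collection hosted on 3 of them, 50 state changes:
print(state_watch_fanout(total_nodes=100, nodes_per_collection=3, updates=50))
# (5000, 150)
```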
> There are still some issues in the 5.x code line where you can have a
> ton of messages being processed by the "Overseer" at massive scales...
> However, I know of installations with several 100s of K (yes, hundreds
> of thousands) of replicas out there, split up amongst a _lot_ of
> collections. That takes quite a bit of care and feeding, mind you.
>
> So your setup shouldn't be a problem, although I'd bring up my Solr
> instances one at a time.
>
> Whether ZK is embedded or not isn't really a problem, but I would very
> seriously consider moving it to an external ensemble. It's not so much
> a functional issue as an administrative one: you have to bring your
> Solr nodes up and down carefully or you lose quorum.
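For the external-ensemble route, a minimal three-server setup looks roughly like the following zoo.cfg fragment. The hostnames zk1-zk3, ports, and paths are placeholders, a sketch rather than a recommendation:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

Solr would then be started against the ensemble with something like `bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181` instead of the embedded ZK. With three servers, any single ZK node can be taken down without losing quorum, which is what decouples ZK availability from the order in which Solr nodes are restarted.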
The testing I did on SOLR-7191, which is where that statement came from,
was mostly on 5.x with the per-collection clusterstate that was new at
the time, and I still found that it would not scale well.
Some later poking around with 6.x (long after SOLR-7191 was resolved
without any code changes being committed) indicates that current
versions scale even worse than early 5.x did. I believe the biggest
source of the scalability problems
is the fact that the overseer queue gets spammed with a very large
number of operations that cannot be handled quickly.
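A toy model of that backlog, with every number hypothetical (the real Overseer's throughput depends heavily on hardware, batching, and Solr version):

```python
def overseer_drain_time(num_replicas, ops_per_replica, ops_per_sec):
    """Toy model of the startup backlog: each replica coming up enqueues
    a few state-change operations for the single Overseer to process,
    so the time to drain the queue grows linearly with cluster size.
    All parameter values are hypothetical."""
    backlog = num_replicas * ops_per_replica
    return backlog / ops_per_sec

# e.g. 100,000 replicas restarting, 3 state updates each,
# an Overseer that clears 500 operations per second:
print(overseer_drain_time(100_000, 3, 500))  # 600.0 seconds of backlog
```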
One collection with 200 shards probably would not present much of a
scalability problem where ZK is concerned, but because a query on that
collection will consist of between 201 and 401 smaller queries, I would
not expect the single-query performance to be very good.
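One way to account for the 201-to-401 figure is standard two-phase distributed search; the exact bookkeeping below is my assumption, not something from the thread:

```python
def distributed_query_requests(num_shards, shards_with_hits):
    """Rough request accounting for one query against a sharded
    collection: the client-facing request, a first-phase query sent
    to every shard, and a second-phase document fetch from each shard
    that contributed documents to the final page of results."""
    assert 0 <= shards_with_hits <= num_shards
    return 1 + num_shards + shards_with_hits

# 200 shards: best case no shard contributes documents (e.g. zero hits),
# worst case every shard does.
print(distributed_query_requests(200, 0))    # 201
print(distributed_query_requests(200, 200))  # 401
```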
Thanks,
Shawn