I would have to negotiate with the middleware teams - but we've used a core per customer in master-slave mode for about 3 years now, with great success. Our pool of data is very large, so limiting a customer's searches to just their core keeps query times fast (or at least reduces the chances of one customer impacting another with expensive queries). There is also a little added security - since the customer is required to provide the core to search, there is less chance that they'll see another customer's data in their responses (as they might if they 'forgot' to add a filter to their query). We were hoping that moving to Cloud would help us manage our largest customers, some of which we'd like to sub-shard with the cloud tooling. We expected Cloud to support as many cores/collections as our 2-versions-old Solr instances, but we didn't count on all the increased network traffic or the extra complications of bringing up a large cloud cluster.

Monica Skidmore

On 1/22/19, 10:06 PM, "Dave" <hastings.recurs...@gmail.com> wrote:

Do you mind if I ask why so many collections, rather than a field in one collection that you can apply a filter query to for each customer to restrict the result set, assuming you’re the one controlling the middleware?

> On Jan 22, 2019, at 4:43 PM, Monica Skidmore <monica.skidm...@careerbuilder.com> wrote:
>
> We have been running Solr 5.4 in master-slave mode with ~4500 cores for a couple of years very successfully. The cores represent individual customer data, so they can vary greatly in size, and some of them have gotten too large to be manageable.
>
> We are trying to upgrade to Solr 7.3 in cloud mode, with ~4500 collections, 2 NRT replicas total per collection. We have experimented with additional servers and ZK nodes as a part of this move. We can create up to ~4000 collections, with a slow-down to ~20s per collection to create, but if we go much beyond that, the time to create collections shoots up, some collections fail to be created, and we see some of the nodes crash. Autoscaling brings nodes back into the cluster, but they don’t have all the replicas created on them that they should – we’re pretty sure this is related to the challenge of adding the large number of collections on those nodes as they come up.
>
> There are some approaches we could take that don’t separate our customers into collections, but we get some benefits from this approach that we’d like to keep. We’d also like to add the benefits of cloud, like balancing where collections are placed and the ability to split large collections.
>
> Is anyone successfully running Solr 7.x in cloud mode with thousands of collections or more? Are there some configurations we should be taking a closer look at to make this feasible? Should we try a different replica type? (We do want NRT-like query latency, but we also index heavily – this cluster will have tens of millions of documents.)
>
> I should note that the problems are not due to the number of documents – the problems occur on a new cluster while we’re creating the collections we know we’ll need.
>
> Monica Skidmore
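
For readers comparing the two approaches discussed in this thread, here is a minimal sketch of where the customer restriction lives in each case. It is not taken from either poster's setup: the host, the core and collection names, and the `customer_id` field are all hypothetical, and it assumes simple IDs that need no Solr query escaping. It only builds request URLs against Solr's standard `/select` handler and does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL; substitute your own host and port.
SOLR_BASE = "http://localhost:8983/solr"


def core_per_customer_url(customer_core, user_query):
    """Core-per-customer: the restriction is the core name in the URL path.

    A request cannot be built without naming a core, so there is no
    per-customer filter to forget.
    """
    params = urlencode({"q": user_query})
    return f"{SOLR_BASE}/{customer_core}/select?{params}"


def shared_collection_url(collection, customer_id, user_query):
    """Single shared collection: the restriction is a filter query (fq).

    Assumes every document carries a hypothetical `customer_id` field.
    If the middleware omits `fq`, the search spans all customers.
    """
    params = urlencode({
        "q": user_query,
        "fq": f"customer_id:{customer_id}",
    })
    return f"{SOLR_BASE}/{collection}/select?{params}"


if __name__ == "__main__":
    print(core_per_customer_url("customer_42", "title:engineer"))
    print(shared_collection_url("jobs", "42", "title:engineer"))
```

In the shared-collection variant the isolation depends on the middleware always adding the `fq` parameter, which is the risk raised above; in the core-per-customer variant the core name in the path does that job.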