How much memory do the Solr instances have? Any more details on what happens when the Solr instances start to fail?
We are using multiple Solr clouds to keep the collection count low(er).

On 29.01.2019 06:53, Gus Heck wrote:
Does it all have to be in a single cloud?

On Mon, Jan 28, 2019, 10:34 PM Shawn Heisey <apa...@elyograg.org wrote:

On 1/28/2019 8:12 PM, Monica Skidmore wrote:
I would have to negotiate with the middleware teams, but we've used a
core per customer in master-slave mode for about 3 years now, with great
success.  Our pool of data is very large, so limiting a customer's searches
to just their core keeps query times fast (or at least reduces the chances
of one customer impacting another with expensive queries).  There is also a
little security added - since the customer is required to provide the core
to search, there is less chance that they'll see another customer's data in
their responses (like they might if they 'forgot' to add a filter to their
query).  We were hoping that moving to Cloud would help our management of
the largest customers - some of which we'd like to sub-shard with the cloud
tooling.  We expected cloud to support as many cores/collections as our
2-versions-old Solr instances - but we didn't count on all the increased
network traffic or the extra complications of bringing up a large cloud
cluster.

At this time, SolrCloud will not handle what you're trying to throw at
it.  Without Cloud, Solr can fairly easily handle thousands of indexes,
because there is no communication between nodes about cluster state.
The immensity of that communication (handled via ZooKeeper) is why
SolrCloud can't scale to thousands of shard replicas.

The solution to this problem will be twofold:  1) Reduce the number of
work items in the Overseer queue.  2) Make the Overseer do its job a lot
faster.  There have been small incremental improvements towards these
goals, but as you've noticed, we're definitely not there yet.

On the subject of a customer forgetting to add a filter ... your systems
should be handling that for them ... if the customer has direct access
to Solr, then all bets are off... they'll be able to do just about
anything they want.  It is possible to configure a proxy to limit what
somebody can get to, but it would be pretty complicated to come up with
a proxy configuration that fully locks things down.
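
As a rough illustration of the middleware handling the filter for the
customer, here is a minimal sketch in Python.  It assumes every document
carries a field identifying its owner; the field name customer_id and the
function name build_select_params are made up for this example and are
not part of Solr.  The point is only that the application appends the
fq itself, from the authenticated session, so the customer cannot leave
it out.

```python
from urllib.parse import urlencode


def build_select_params(customer_id, user_query, extra_filters=()):
    """Build the query string for a /select request with a forced tenant filter.

    customer_id comes from the authenticated session, never from the
    request body. "customer_id" as a field name is an assumption made
    for this sketch.
    """
    # Accept only plain alphanumeric ids so the value cannot smuggle
    # query syntax (spaces, OR, wildcards) into the filter.
    if not customer_id.isalnum():
        raise ValueError("customer_id must be alphanumeric")

    params = [
        ("q", user_query),
        # Always added by the middleware, regardless of what the user sent.
        ("fq", f"customer_id:{customer_id}"),
    ]
    # Optional extra filters are added as additional fq parameters,
    # which Solr intersects with the tenant filter.
    params.extend(("fq", f) for f in extra_filters)
    return urlencode(params)


if __name__ == "__main__":
    print(build_select_params("acme42", "title:solr"))
```

This only covers the case where customers never reach Solr directly; if
they can, the caveats above about proxies still apply.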

Using shards is completely possible without SolrCloud.  But SolrCloud
certainly does make it a lot easier.
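
For reference, distributed search without SolrCloud is driven by the
shards request parameter, which lists the cores to query as a
comma-separated set of host:port/solr/core entries.  A minimal sketch of
building such a request follows; the host and core names are invented
for the example, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_sharded_query(query, shard_urls):
    """Build a query string that fans a search out over several cores.

    shard_urls are entries such as "solr1:8983/solr/bigcust_a"; the
    names used in the example below are placeholders, not real hosts.
    """
    if not shard_urls:
        raise ValueError("at least one shard is required")
    # Solr expects the shard list as one comma-separated parameter value.
    return urlencode({"q": query, "shards": ",".join(shard_urls)})


if __name__ == "__main__":
    print(build_sharded_query(
        "*:*",
        ["solr1:8983/solr/bigcust_a", "solr2:8983/solr/bigcust_b"],
    ))
```

Without SolrCloud you have to maintain that shard list yourself and
route each document to the right core at index time, which is a large
part of what the cloud tooling takes off your hands.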

How many records in your largest customer indexes?  How big are those
indexes on disk?

Thanks,
Shawn

