> why so many collections/replicas: it's what our customer needs; for example, 
> each database table maps to a collection.

I always cringe when I see statements like this. What this means is that your 
customer doesn’t understand search and needs guidance in the proper use of any 
search technology, Solr included.

Solr is _not_ an RDBMS. Simply mapping the DB tables onto collections will 
almost certainly result in a poor experience. Next the customer will want to 
ask Solr to do the same things a DB does, e.g. run a join across 10 tables, 
which will be abysmal. Solr isn’t designed for that. Some brilliant RDBMS 
people have spent many years making DBs do what they do, and do it well. 

That said, RDBMSs have poor search capabilities; they aren’t built to solve 
the search problem.

I suspect the time you spend making Solr load a thousand cores will be wasted. 
Even once you do get them loaded, performance will be horrible. IMO you’d be 
far better off helping the customer define their problem so they can model the 
search part of it properly. The result may well be a hybrid where Solr handles 
the free-text search and the RDBMS uses the results of that search to do the 
relational work, or vice versa.
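
As a rough illustration of the hybrid idea (just a sketch in Python; the 
collection, field, and table names are all made up), the flow can be as simple 
as: run the free-text query in Solr, collect the matching IDs, then let the 
RDBMS do the relational work on those IDs:

    import requests
    import sqlite3

    # 1) Free-text search in Solr: fetch only the matching document IDs.
    resp = requests.get(
        "http://localhost:8983/solr/products/select",  # hypothetical collection
        params={"q": "description:(waterproof hiking boots)",
                "fl": "id", "rows": 100, "wt": "json"},
    )
    ids = [doc["id"] for doc in resp.json()["response"]["docs"]]

    # 2) Hand those IDs to the RDBMS for the relational part (joins, aggregates).
    conn = sqlite3.connect("shop.db")  # hypothetical database
    if ids:
        placeholders = ",".join("?" for _ in ids)
        rows = conn.execute(
            "SELECT p.id, p.price, s.quantity "
            "FROM products p JOIN stock s ON s.product_id = p.id "
            f"WHERE p.id IN ({placeholders})",
            ids,
        ).fetchall()

Each system only does the part it’s good at.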

FWIW
Erick

> On Sep 2, 2019, at 5:55 AM, Hongxu Ma <inte...@outlook.com> wrote:
> 
> Thanks @Jörn and @Erick
> I enlarged my JVM memory; so far it's stable (but it uses a lot of memory).
> And I will check for lower-level errors, per your suggestion, if an error 
> happens.
> 
> About my scenario:
> 
>  *   why so many collections/replicas: it's what our customer needs; for 
> example, each database table maps to a collection.
>  *   this env is just a test cluster: I want to verify the maximum number of 
> collections Solr can support stably.
> 
> 
> ________________________________
> From: Erick Erickson <erickerick...@gmail.com>
> Sent: Friday, August 30, 2019 20:05
> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> Subject: Re: Question: Solr perform well with thousands of replicas?
> 
> “no registered leader” is usually the effect of some other problem, not the 
> root cause. In this case, for instance, you could be running out of file 
> handles and see other errors like “too many open files”. That’s just one 
> example.
> 
> One common problem is that Solr needs a lot of file handles and the system 
> defaults are too low. We usually recommend you start with 65K file handles 
> (ulimit) and bump up the number of processes to 65K too.
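> 
> As a quick sanity check (a minimal sketch, assuming a Linux host with Python 
> 3 available; it just reads the limits, nothing Solr-specific), you can print 
> the limits a process on that box actually sees:
> 
>     import resource
> 
>     # Soft/hard limits on open file descriptors for the current process.
>     soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
>     print(f"open files: soft={soft} hard={hard}")
> 
>     # Soft/hard limits on processes/threads the current user may create.
>     psoft, phard = resource.getrlimit(resource.RLIMIT_NPROC)
>     print(f"processes:  soft={psoft} hard={phard}")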
> 
> So, to throw some numbers out: say you have 1,000 replicas, with 50 segments 
> in the index of each replica. Each segment consists of multiple files (I’m 
> skipping “compound files” here as an advanced topic), so let’s say each 
> segment has 10 files. 1,000 * 50 * 10 would require 500,000 file handles on 
> your system.
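> 
> The same back-of-the-envelope arithmetic as a snippet (the per-replica 
> segment count and per-segment file count are just assumed round numbers):
> 
>     replicas_per_node = 1000
>     segments_per_replica = 50   # assumed
>     files_per_segment = 10      # assumed (non-compound index format)
> 
>     handles = replicas_per_node * segments_per_replica * files_per_segment
>     print(handles)  # 500000 -- far beyond a typical default ulimit of 1024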
> 
> Bottom line: look for other, lower-level errors in the log to try to 
> understand what limit you’re running into.
> 
> All that said, there’ll be a number of “gotchas” when running that many 
> replicas on a particular node; I second Jörn’s question...
> 
> Best,
> Erick
> 
>> On Aug 30, 2019, at 3:18 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>> 
>> What is the reason for this number of replicas? Solr should work fine, but 
>> maybe it is worth consolidating some collections to also avoid 
>> administrative overhead.
>> 
>>> Am 29.08.2019 um 05:27 schrieb Hongxu Ma <inte...@outlook.com>:
>>> 
>>> Hi,
>>> I have a SolrCloud cluster, but it's unstable when the number of 
>>> collections is large: 1000 replicas/cores per Solr node.
>>> 
>>> To solve this issue, I have read the performance guide:
>>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>>> 
>>> I noticed a sentence in the SolrCloud section:
>>> "Recent Solr versions perform well with thousands of replicas."
>>> 
>>> I want to know: does this mean a single Solr node can handle thousands of 
>>> replicas, or that a Solr cluster can (and if so, what is the size of that 
>>> cluster)?
>>> 
>>> My Solr versions are 7.3.1 and 6.6.2 (they look about the same in 
>>> performance).
>>> 
>>> Thanks for your help.
>>> 
> 
