Hi,
It is probably best to merge some (or all) of your collections and add a 
discriminator field that is used to filter each tenant's documents. If you go 
with multiple collections each serving multiple tenants, you would need logic 
on top of Solr to resolve a tenant to its collection. Unfortunately, Solr does 
not have filtered aliases like ES, which would come in handy in such cases.
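
For illustration, a minimal SolrJ sketch of the shared-collection approach 
(the collection name, field name, and tenant value below are hypothetical, 
not from your setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TenantSearch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      SolrQuery query = new SolrQuery("title:report");
      // The filter query restricts results to a single tenant's documents
      // and is cached separately from the main query.
      query.addFilterQuery("tenant_id:tenant42"); // hypothetical field/value
      QueryResponse response = client.query("shared_docs", query);
      System.out.println("Hits: " + response.getResults().getNumFound());
    }
  }
}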
If you stick with multiple collections, you can turn off caches completely, 
monitor query latency, and turn caches back on for a collection when its 
latency reaches some threshold.
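
The caches themselves are configured per collection in solrconfig.xml; there 
is no SolrJ call to toggle them. For the monitoring part, one rough sketch 
(the threshold and collection name are made up) is to watch the QTime that 
Solr reports with each response:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LatencyCheck {
  // Made-up threshold; tune it to your own latency budget.
  private static final int QTIME_THRESHOLD_MS = 200;

  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      QueryResponse response =
          client.query("tenant_collection_1", new SolrQuery("*:*"));
      // QTime is Solr's reported query execution time in milliseconds.
      if (response.getQTime() > QTIME_THRESHOLD_MS) {
        System.out.println("Latency above threshold - consider re-enabling"
            + " caches in solrconfig.xml and reloading the collection.");
      }
    }
  }
}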
Caches are invalidated on commit, so submitting a dummy document and 
committing should invalidate them. An alternative is to reload the collection.
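
A rough SolrJ sketch of both options (the collection name and dummy document 
id are hypothetical):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.common.SolrInputDocument;

public class CacheInvalidation {
  public static void main(String[] args) throws Exception {
    String collection = "tenant_collection_1"; // hypothetical name
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      // Option 1: add and delete a dummy document, then commit. The commit
      // opens a new searcher, which drops the old searcher's caches.
      SolrInputDocument dummy = new SolrInputDocument();
      dummy.addField("id", "cache-buster"); // hypothetical id
      client.add(collection, dummy);
      client.deleteById(collection, "cache-buster");
      client.commit(collection);

      // Option 2: reload the whole collection instead.
      CollectionAdminRequest.reloadCollection(collection).process(client);
    }
  }
}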

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Jun 2018, at 14:46, Shawn Heisey <elyog...@elyograg.org> wrote:
> 
> On 6/27/2018 5:10 AM, Sharif Shahrair wrote:
>> Now the problem is, when we create about 1400 collections (all of them
>> empty, i.e. no documents added yet) the Solr service goes down showing an
>> out of memory exception. We have a few questions here:
>> 
>> 1. When we are creating collections, each collection takes about 8 MB to
>> 12 MB of memory when there are no documents yet. Is there any way to
>> configure SolrCloud so that it takes less memory for each collection
>> initially (like 1 MB per collection), so that we would be able to create
>> 1500 collections using about 3 GB of the machines' RAM?
> 
> Solr doesn't dictate how much memory it allocates for a collection.  It 
> allocates what it needs, and if the heap size is too small for that, then you 
> get OOME.
> 
> You're going to need a lot more than two Solr servers to handle that many 
> collections, and they're going to need more than 12GB of memory.  You should 
> already have at least three servers in your setup, because ZooKeeper requires 
> three servers for redundancy.
> 
> http://zookeeper.apache.org/doc/r3.4.12/zookeeperAdmin.html#sc_zkMulitServerSetup
> 
> Handling a large number of collections is one area where SolrCloud needs 
> improvement.  Work is constantly happening towards this goal, but it's a very 
> complex piece of software, so making design changes is not trivial.
> 
>> 2. Is there any way to clear/flush the cache of SolrCloud, especially for
>> those collections which we don't access for a while (maybe we can take
>> those inactive collections out of memory and load them back when they are
>> needed again)?
> 
> Unfortunately the functionality that allows index cores to be unloaded (which 
> we have colloquially called "LotsOfCores") does not work when Solr is running 
> in SolrCloud mode. SolrCloud functionality would break if its cores get 
> unloaded.  It would take a fair amount of development effort to allow the two 
> features to work together.
> 
>> 3. Is there any way to collect the garbage memory from SolrCloud (maybe
>> created by deleting documents and collections)?
> 
> Java handles garbage collection automatically.  It's possible to explicitly 
> ask the system to collect garbage, but any good programming guide for Java 
> will recommend that programmers should NOT explicitly trigger GC.  While it 
> might be possible for Solr's memory usage to become more efficient through 
> development effort, it's already pretty good.  To our knowledge, Solr does 
> not currently have any memory leak bugs, and if any are found, they are taken 
> seriously and fixed as fast as we can fix them.
> 
>> Our target is without increasing the hardware resources, create maximum
>> number of collections, and keeping the highly accessed collections &
>> documents in memory. We'll appreciate your help.
> 
> That goal will require a fair amount of hardware.  You may have no choice but 
> to increase your hardware resources.
> 
> Thanks,
> Shawn
> 
