Dear all,

Thank you for all your advice.
This comment says:

"SolrCloud starts to have serious problems when you create a lot of collections. We are aware of the scalability issues, but they are not easy to fix."

http://lucene.472066.n3.nabble.com/Fwd-Solr-Cloud-6-0-0-hangs-when-creating-large-amount-of-collections-and-node-fails-to-recover-aftert-tp4276364p4276404.html

So I am unsure whether this will affect us once our system grows beyond thousands of tenants.

One option I see is adding a custom load-balancing mechanism that routes tenants to different Solr clusters. Is there an easy way of dealing with this situation?

Best Regards,
Chamil

On Wed, Aug 31, 2016 at 1:42 PM, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:

> HI Chamil,
>
> One thing to consider is relevancy, especially in case tenants' domains
> are different (e.g. one is tech and other pharmacy). If you go with one
> collection and use same field (e.g. desc) for all tenants, you will get one
> field stats and could skew results ordering if you order by score (e.g.
> word 'cream' might be infrequent in tech tenant but could become frequent
> overall because of large pharmacy tenant).
>
> On the other side having large number of collection could also be
> problematic. You can address that issue with splitting tenants to multiple
> clusters, or having collections for large tenants and grouping smaller
> tenants by domain.
>
> Make sure that you use routing by tenant id in case of multi tenant
> collection.
>
> HTH,
> Emir
>
> On 28.08.2016 07:02, Chamil Jeewantha wrote:
>
>> Thank you everyone for your great support.
>>
>> I will update you with our final approach.
>>
>> Best regards,
>> Chamil
>>
>> On Aug 28, 2016 01:34, "John Bickerstaff" <j...@johnbickerstaff.com>
>> wrote:
>>
>>> In my own work, the risk to the business if every single client cannot
>>> access search is so great, we would never consider putting everything in
>>> one. You should certainly ask that question of the business stakeholders
>>> before you decide.
>>>
>>> For that reason, I might recommend that each of the multiple collections
>>> suggested above by Erick could also be on a separate SolrCloud (or single
>>> Solr instance) so that no single failure can ever take down every
>>> tenant's ability to search -- only those on that particular SolrCloud...
>>>
>>> On Sat, Aug 27, 2016 at 10:36 AM, Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>>
>>>> There's no one right answer here. I've also seen a hybrid approach
>>>> where there are multiple collections each of which has some
>>>> number of tenants resident. Eventually, you need to think of some
>>>> kind of partitioning, my rough number of documents for a single core
>>>> is 50M (NOTE: I've seen between 10M and 300M docs fit in a core).
>>>>
>>>> All that said, you may also be interested in the "transient cores"
>>>> option, see:
>>>> https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
>>>> and the transient and transientCacheSize (this latter in solr.xml). Note
>>>> that this is stand-alone only so you can't move that concept to
>>>> SolrCloud if you eventually go there.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha <kdcha...@gmail.com>
>>>> wrote:
>>>>
>>>>> Dear Solr Members,
>>>>>
>>>>> We are using SolrCloud as the search provider of a multi-tenant cloud
>>>>> based application. We have one schema for all the tenants. The indexes
>>>>> will have large number(millions) of documents.
>>>>>
>>>>> As of our research, we have two options,
>>>>>
>>>>> - One large collection for all the tenants and use Composite-ID routing
>>>>> - Collection per tenant
>>>>>
>>>>> The below mail says,
>>>>>
>>>>> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c5324cd4b.2020...@protulae.com%3E
>>>>>
>>>>> SolrCloud is *more scalable in terms of index size*.
>>>>> Plus you get redundancy which can't be underestimated in a hosted
>>>>> solution.
>>>>>
>>>>> AND
>>>>>
>>>>> The issue is management. 1000s of cores/collections require a level of
>>>>> automation. On the other hand, having a single core/collection means if
>>>>> you make one change to the schema or solrconfig, it affects everyone.
>>>>>
>>>>> Based on the above facts we think One large collection will be the way
>>>>> to go.
>>>>>
>>>>> Questions:
>>>>>
>>>>> 1. Is that the right way to go?
>>>>> 2. Will it be a hassle when we need to do reindexing?
>>>>> 3. What is the chance of entire collection crash? (in that case all
>>>>> tenants will be affected and reindexing will be painful.
>>>>>
>>>>> Thank you in advance for your kind opinion.
>>>>>
>>>>> Best Regards,
>>>>> Chamil
>>>>>
>>>>> --
>>>>> http://kavimalla.blgospot.com
>>>>> http://kdchamil.blogspot.com
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/

--
http://kavimalla.blgospot.com
http://kdchamil.blogspot.com
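
P.S. To make the idea at the top of this mail (routing tenants to different Solr clusters) more concrete, here is a rough sketch of what I have in mind. It is only an illustration, not a tested design: the cluster URLs, the pinned tenant name, and the function names cluster_for and composite_id are all made up for this example. The "tenant!doc" id format follows Solr's compositeId router convention, as suggested for the multi-tenant collection case.

```python
import hashlib

# Hypothetical cluster base URLs (assumptions, not real endpoints).
CLUSTERS = [
    "http://solr-cluster-a.example.com:8983/solr",
    "http://solr-cluster-b.example.com:8983/solr",
    "http://solr-cluster-c.example.com:8983/solr",
]

# Optional explicit assignments, e.g. for very large tenants.
# The tenant name here is hypothetical.
PINNED = {"big-pharma-tenant": CLUSTERS[2]}


def cluster_for(tenant_id: str) -> str:
    """Return the base URL of the cluster assumed to serve this tenant."""
    if tenant_id in PINNED:
        return PINNED[tenant_id]
    # Use a stable hash (not Python's built-in hash(), which is randomized
    # per process) so every application node picks the same cluster.
    digest = hashlib.sha1(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(CLUSTERS)
    return CLUSTERS[index]


def composite_id(tenant_id: str, doc_id: str) -> str:
    """Build a compositeId-style document id of the form "<tenant>!<doc>"."""
    return f"{tenant_id}!{doc_id}"


if __name__ == "__main__":
    for tenant in ("tenant-001", "tenant-002", "big-pharma-tenant"):
        print(tenant, "->", cluster_for(tenant), composite_id(tenant, "42"))
```

One caveat with the modulo-hash fallback: adding or removing a cluster changes the mapping for most tenants, so a stored tenant-to-cluster lookup table is probably the safer choice if the cluster list is expected to change.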