Apple did a preso on massive multi-tenancy. I haven’t watched it yet, but it might help.
https://www.youtube.com/watch?v=_Erkln5WWLw

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Aug 27, 2016, at 10:02 PM, Chamil Jeewantha <kdcha...@gmail.com> wrote:
>
> Thank you everyone for your great support.
>
> I will update you with our final approach.
>
> Best regards,
> Chamil
>
> On Aug 28, 2016 01:34, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
>
>> In my own work, the risk to the business if every single client cannot
>> access search is so great, we would never consider putting everything in
>> one. You should certainly ask that question of the business stakeholders
>> before you decide.
>>
>> For that reason, I might recommend that each of the multiple collections
>> suggested above by Erick could also be on a separate SolrCloud (or single
>> Solr instance) so that no single failure can ever take down every tenant's
>> ability to search -- only those on that particular SolrCloud...
>>
>> On Sat, Aug 27, 2016 at 10:36 AM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>
>>> There's no one right answer here. I've also seen a hybrid approach
>>> where there are multiple collections each of which has some
>>> number of tenants resident. Eventually, you need to think of some
>>> kind of partitioning, my rough number of documents for a single core
>>> is 50M (NOTE: I've seen between 10M and 300M docs fit in a core).
>>>
>>> All that said, you may also be interested in the "transient cores"
>>> option, see:
>>> https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
>>> and the transient and transientCacheSize (this latter in solr.xml). Note
>>> that this is stand-alone only so you can't move that concept to
>>> SolrCloud if you eventually go there.
>>>
>>> Best,
>>> Erick
>>>
>>> On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha <kdcha...@gmail.com>
>>> wrote:
>>>> Dear Solr Members,
>>>>
>>>> We are using SolrCloud as the search provider of a multi-tenant cloud based
>>>> application. We have one schema for all the tenants. The indexes will have
>>>> large number(millions) of documents.
>>>>
>>>> As of our research, we have two options,
>>>>
>>>> - One large collection for all the tenants and use Composite-ID routing
>>>> - Collection per tenant
>>>>
>>>> The below mail says,
>>>>
>>>>
>>>> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c5324cd4b.2020...@protulae.com%3E
>>>>
>>>> SolrCloud is *more scalable in terms of index size*. Plus you get
>>>> redundancy which can't be underestimated in a hosted solution.
>>>>
>>>>
>>>> AND
>>>>
>>>> The issue is management. 1000s of cores/collections require a level of
>>>> automation. On the other hand, having a single core/collection means if
>>>> you make one change to the schema or solrconfig, it affects everyone.
>>>>
>>>>
>>>> Based on the above facts we think One large collection will be the way to
>>>> go.
>>>>
>>>> Questions:
>>>>
>>>> 1. Is that the right way to go?
>>>> 2. Will it be a hassle when we need to do reindexing?
>>>> 3. What is the chance of entire collection crash? (in that case all
>>>> tenants will be affected and reindexing will be painful.
>>>>
>>>> Thank you in advance for your kind opinion.
>>>>
>>>> Best Regards,
>>>> Chamil
>>>>
>>>> --
>>>> http://kavimalla.blgospot.com
>>>> http://kdchamil.blogspot.com
>>>
>>
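
For anyone weighing the "one large collection with Composite-ID routing" option from the original question, here is a rough sketch of what it looks like on the client side. It assumes the collection uses Solr's compositeId router, where a document ID of the form tenant!docid keeps a tenant's documents together and the _route_ request parameter limits a query to that tenant's shard(s). The helper names and the tenant_id field are made up for illustration; nothing here talks to a real Solr instance.

```python
from urllib.parse import urlencode


def composite_id(tenant: str, doc_id: str) -> str:
    """Build a compositeId-style document ID: '<tenant>!<doc_id>'."""
    if not tenant or "!" in tenant:
        # '!' is the separator, so it must not appear in the tenant key.
        raise ValueError("tenant key must be non-empty and must not contain '!'")
    return f"{tenant}!{doc_id}"


def tenant_query_string(tenant: str, user_query: str) -> str:
    """Build a URL-encoded query string scoped to a single tenant.

    '_route_' narrows the request to the shard(s) holding this tenant.
    The 'fq' on the hypothetical 'tenant_id' field is still needed,
    because several tenants can hash to the same shard.
    """
    params = {
        "q": user_query,
        "fq": f"tenant_id:{tenant}",  # assumed field name, not from the thread
        "_route_": f"{tenant}!",
    }
    return urlencode(params)


if __name__ == "__main__":
    print(composite_id("acme", "42"))
    print(tenant_query_string("acme", "title:solr"))
```

The filter query is what actually isolates tenants; routing only cuts down on the number of shards each request touches. Real tenant keys should also be escaped before going into fq.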