Hi,

I am looking into best practices for multi-tenancy support in Solr 4.7. Our use cases require supporting thousands of tenants (say 10,000), and the incoming data rate could exceed 10k documents per second. I did some research and found that people talk about scaling tenants at all four levels:

Solr Cloud
Collection
Shard
Core

I am listing them plus some quoted comments from the links.

1) Solr Cloud and Collection

http://find.searchhub.org/document/c7caa34d807a8a1b#c7caa34d807a8a1b
-----------
Are you trying to do "multi-tenant"? If so, you should be talking "multi-cluster" where you externally manage your "tenants", assigning them to clusters, but keeping tenants per cluster down in the dozens/hundreds, and "archiving" inactive tenants and spinning up (and down) clusters as inactive tenants become active or fall into inactivity. But keeping 1,000 or more tenants active in a single cluster as separate collections is... a no-go.
-----------

2) Shard

http://searchhub.org/2013/06/13/solr-cloud-document-routing/
-----------
Document routing can be used to achieve a more efficient multi-tenant environment. This can be done by making the tenant id the shard key, which would group all documents from the same tenant on the same shard.
-----------

3) Core

http://find.searchhub.org/document/4312991db2dd90e9#4312991db2dd90e9
-----------
Every multitenant situation is going to be different, but at the extreme a single core per tenant is the cleanest and provides the best separation, optimal performance, and supports full tf-idf relevancy of document fields for each tenant.
-----------

http://find.searchhub.org/document/fc5b734fba135e83#fc5b734fba135e83
-----------
Well, we try to use Solr to run a multi-tenant index/search service. We assigns each client a different core with their own config and schema. It would be good for us if we can just let the customer to be able to create cores with their own schema and config.
-----------

I also saw slides about scaling over time at the Collection level using timed collections (slides 50 ~ 58):
http://www.slideshare.net/sematext/solr-for-indexing-and-searching-logs

Based on these, I am thinking about the following approach: in a single Solr Cloud, multi-tenancy is supported at the Core level (one or more cores per tenant), and, for better performance, a new collection is created every day. When a tenant grows too big, it is migrated from this Solr Cloud to a new Solr Cloud.

Is there any potential issue with this approach? Is there a better approach based on your experience?

A few questions related to the proposed approach:

1) When a core is replicated to multiple nodes via multiple shards, a query submitted against a particular core (tenant) should be executed as a distributed query, right?
2) What is the best way to move a core from one Solr Cloud to another?
3) If we create one collection per day and want to keep data for, say, three years, is it OK to have that many collections? If yes, is it cheap to maintain the collection alias for easy querying?

Thanks.
Shushuai
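
As a rough illustration of two of the ideas above (tenant id as the shard key, and one collection per day behind an alias), here is a minimal Python sketch. It only builds strings and sends nothing to Solr. It assumes the compositeId router's "shardKey!docId" id format and the Collections API CREATEALIAS action with its name and collections parameters; the function names, the "logs" prefix, and the base URL are hypothetical and only for illustration.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def routed_id(tenant_id, doc_id):
    """Build a document id whose prefix before '!' acts as the shard key,
    so all documents of one tenant hash to the same shard."""
    return f"{tenant_id}!{doc_id}"


def daily_collection(day, prefix="logs"):
    """Name of the (hypothetical) collection holding one day of data."""
    return f"{prefix}_{day:%Y%m%d}"


def alias_url(base_url, alias, days, today):
    """Build a CREATEALIAS request URL that points `alias` at the
    collections for the last `days` days, newest first."""
    collections = [
        daily_collection(today - timedelta(days=i)) for i in range(days)
    ]
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    })
    return f"{base_url}/admin/collections?{params}"


if __name__ == "__main__":
    print(routed_id("tenant42", "doc1"))
    # Assumed base URL; adjust to the actual cluster.
    print(alias_url("http://localhost:8983/solr", "last_7_days", 7,
                    date.today()))
```

Re-issuing the alias request each day with a fresh list of collections would let queries keep using a stable name such as last_7_days, while old daily collections are dropped or archived separately.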