Hello Jan,

Thank you very much for the answer. Unfortunately, we don't use Amazon, and I doubt we will be able to persuade the customer to switch to it. Moreover, the amount of data will not allow us to store everything on a single master. However, having considered your design, I am starting to see the problem in a new light, so maybe it will still prove helpful ;)

In the meantime, I'm still looking for other solutions...

Best regards,
Sergey Sazonov.

On 05/05/11 15:07, Jan Høydahl wrote:
Hi,

One approach, if you're using Amazon, is to use Elastic Beanstalk:

* Create one master with 12 cores, named "jan", "feb", "mar", etc.
* Every month, clear the index of the current month's core and switch indexing to it.
   You will only have one master, because you only index into one month at a time.
* For each of the 12 months, set up an Amazon Elastic Beanstalk instance with a Solr replica pointing to its master.
   This way, Amazon will spin up replicas as needed.
   NOTE: Your replica can still be located at /solr/select even if it replicates from /solr/may/replication.
* Query only the replicas; the client controls whether to query one or more shards:
   &shards=jan.elasticbeanstalk.com/solr,feb.elasticbeanstalk.com/solr,mar.elasticbeanstalk.com/solr

Once this is set up, there is no configuration left to worry about :)
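A minimal Python sketch of the monthly rotation described above, assuming the twelve cores are named "jan" to "dec" as in the list and live on a master whose address (`http://master:8983/solr`) is hypothetical. It builds the two standard Solr XML update messages that empty the current month's core (a delete-by-query on `*:*`, then a commit) and the `&shards` value for a query spanning the current month plus some earlier ones, reusing the host pattern from the example URL. Actually sending the requests (a POST with `Content-Type: text/xml`) is left to the caller.

```python
from datetime import date

# Core names from the setup above; index 0 is January.
CORES = ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"]

# Hypothetical master address; adjust to the real host and port.
MASTER = "http://master:8983/solr"


def current_core(today: date) -> str:
    """Core that receives this month's documents."""
    return CORES[today.month - 1]


def rotation_requests(today: date) -> list[tuple[str, str]]:
    """(URL, XML body) pairs that empty this month's core before re-indexing.

    The core still holds the data from the same month a year ago, so it is
    cleared with a delete-by-query on *:* and then committed.
    """
    update_url = f"{MASTER}/{current_core(today)}/update"
    return [
        (update_url, "<delete><query>*:*</query></delete>"),
        (update_url, "<commit/>"),
    ]


def shards_param(months_back: int, today: date) -> str:
    """Value for &shards= covering this month and `months_back` earlier ones."""
    if not 0 <= months_back < len(CORES):
        raise ValueError("months_back must be between 0 and 11")
    names = [CORES[(today.month - 1 - i) % 12] for i in range(months_back + 1)]
    # Host pattern taken from the example above (illustrative hostnames).
    return ",".join(f"{name}.elasticbeanstalk.com/solr" for name in names)
```

For example, `shards_param(2, date(2011, 3, 1))` yields the mar, feb and jan hosts, the same three shards as in the example parameter, and on 5 May 2011 `rotation_requests` targets the "may" core.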

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5 May 2011, at 14:03, Sergey Sazonov wrote:

Dear Solr Experts,

First of all, I would like to thank you for your patience when answering questions from those who are less experienced.

And now to the main topic: I would like to learn whether it is possible to 
restructure a Solr cloud programmatically.

Let me describe the system we are designing to make the requirements clear. The 
indexed documents are certain log entries. We are planning to shard them by 
month, and only keep the last 12 months in the index. We are going to replicate 
each shard across several servers.
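A small sketch of the month-based sharding described above, assuming shards are named after the year and month (e.g. "2011-05"); the naming scheme and the helper names are hypothetical. It routes a log entry to its shard by timestamp and works out which shards fall outside the 12-month window and can be dropped.

```python
from datetime import date, datetime


def month_shard(ts: datetime) -> str:
    """Shard that a log entry with timestamp `ts` belongs to, e.g. "2011-05"."""
    return f"{ts.year:04d}-{ts.month:02d}"


def retained_months(today: date, keep: int = 12) -> list[str]:
    """The `keep` most recent month shards, newest first."""
    year, month = today.year, today.month
    shards = []
    for _ in range(keep):
        shards.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:  # step back into December of the previous year
            year, month = year - 1, 12
    return shards


def expired(shard: str, today: date, keep: int = 12) -> bool:
    """True if `shard` has fallen out of the retention window and can be dropped."""
    return shard not in retained_months(today, keep)
```

On 5 May 2011, `retained_months` runs from "2011-05" back to "2010-06", so the "2010-05" shard is the one that has expired.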

Now, the user is always required to search within a single month (= shard). Most importantly, we expect the vast majority of requests to query the current month, with only a light load on previous months. To use the cluster as efficiently as possible, we would like most of the servers to hold replicas of the current month's data, with only one or two servers per older month. To this end, we are planning to have a set of slaves that "migrate" from master to master, depending on which master holds the data for the current month. When a new month starts, those slaves have to be reconfigured to hold the new shard and to replicate from the new master (their old master now holding the data for the previous month).
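The monthly slave migration could be planned along the lines of the following Python sketch. The policy (the outgoing current month keeps `keep` of its slaves, while the rest, plus any slaves of the month that has just expired, move to the new month), the slave and master URLs, and all helper names are assumptions for illustration. The switch uses the Solr ReplicationHandler's `disablepoll` command and a one-off `fetchindex` with an explicit `masterUrl`. Because that `fetchindex` does not change the master the slave is configured to poll, a permanent move would also need the slave's configured `masterUrl` updated (for example in solrconfig.xml, followed by a core reload), which this sketch does not cover.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlencode


def rollover(assignment: dict[str, str], old_month: str, new_month: str,
             expired_month: Optional[str] = None, keep: int = 1) -> dict[str, str]:
    """Return the new slave -> month mapping for the start of `new_month`.

    Hypothetical policy: `keep` slaves stay on the outgoing month (now an older
    month); its other slaves, and any slaves of the expired month, move to
    `new_month`. All other slaves keep their month.
    """
    new: dict[str, str] = {}
    kept = 0
    for slave in sorted(assignment):  # sorted for a deterministic plan
        month = assignment[slave]
        if month == old_month and kept < keep:
            new[slave] = old_month
            kept += 1
        elif month in (old_month, expired_month):
            new[slave] = new_month
        else:
            new[slave] = month
    return new


def has_majority(assignment: dict[str, str], month: str) -> bool:
    """True if more than half of the slaves serve `month`."""
    return 2 * Counter(assignment.values())[month] > len(assignment)


def switch_commands(slave: str, master: str) -> list[str]:
    """ReplicationHandler URLs that stop polling and pull once from `master`."""
    fetch = urlencode({"command": "fetchindex", "masterUrl": f"{master}/replication"})
    return [f"{slave}/replication?command=disablepoll",
            f"{slave}/replication?{fetch}"]


def plan_commands(old: dict[str, str], new: dict[str, str],
                  masters: dict[str, str]) -> list[str]:
    """Commands for every slave whose month changed; `masters` maps month -> master URL."""
    cmds: list[str] = []
    for slave in sorted(new):
        if new[slave] != old.get(slave):
            cmds.extend(switch_commands(slave, masters[new[slave]]))
    return cmds
```

For example, with five slaves of which three served May 2011, rolling over to June leaves one on May and moves the other two, plus the slave of the expired June 2010 shard, to June, so three of the five serve the current month and `has_majority` holds.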

Since this operation has to be done every month, we naturally want to automate it. So my question is whether anyone has faced a similar problem before, and what the best way to solve it is. We are not committed to any particular solution, or even architecture, so feel free to propose alternatives. The only requirement is that a majority of the servers should be able to serve requests for the current month at any given moment.

Thank you in advance for your answers.

Best regards,
Sergey Sazonov.
