On Wed, Jan 13, 2010 at 05:38:33PM -0500, Paul Rosen wrote:
> Hi all,
>
> The way the indexing works on our system is as follows:
>
> We have a separate "staging" server with a copy of our web app. The clients 
> will index a number of documents in a batch on the staging server (this 
> happens about once a week), then they play with the results on the staging 
> server for a day until satisfied. Only then do they give the ok to deploy.
>
> What I've been doing is, when they want to deploy, I do the following:
>
> 1) merge and optimize the index on the staging server,
>
> 2) copy it to the production server,
>
> 3) stop solr on production,
>
> 4) copy the new index on top of the old one,
>
> 5) start solr on production.
>
> This works, but has the following disadvantages:
>
> 1) The index is getting bigger, so it takes longer to zip it and transfer 
> it.

If you optimize every time before pushing to production, you will need to
transfer the entire index each time anyway, because an optimize rewrites the
index into new segment files.  To transfer only part of the index you would
need to NOT optimize and then use one of the replication strategies (rsync or
Java) to replicate only the deltas.
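
To illustrate why the optimize defeats delta replication, here is a minimal
Python sketch.  It is not Solr code: the file names and sizes are made up, and
it assumes replication simply skips files the destination already has with the
same name and size.

```python
def files_to_transfer(staging, production):
    """Return names of index files that production lacks or holds with a different size."""
    return sorted(name for name, size in staging.items()
                  if production.get(name) != size)


# Hypothetical index directory listings (file name -> size); not real Solr output.
production = {"_1.cfs": 1200, "_2.cfs": 300, "segments_3": 1}

# Adding a few documents without optimizing only adds a small new segment.
after_add = {"_1.cfs": 1200, "_2.cfs": 300, "_3.cfs": 40, "segments_4": 1}

# Optimizing merges everything into one new segment, so no old file is reused.
after_optimize = {"_4.cfs": 1540, "segments_5": 1}

print(files_to_transfer(after_add, production))       # ['_3.cfs', 'segments_4']
print(files_to_transfer(after_optimize, production))  # ['_4.cfs', 'segments_5']
```

In the unoptimized case only the small new segment moves; after the optimize
the one file to move is the whole index.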

> 2) The user has only added a few records, yet we copy over all of them. If a 
> bug happens that causes an unrelated document to get deleted or replaced on 
> staging, we wouldn't notice, and we'd propagate the problem to the server. 
> I'd sleep better if I were only moving the records that were new or changed 
> and leaving the records that already work in place.
>
> 3) solr is down on production for about 5 minutes, so users during that 
> time are getting errors.
>
> I was looking for some kind of replication strategy where I can run a task 
> on the production server to tell it to merge a core from the staging 
> server. Is that possible?
>
> I can open up port 8983 on the staging server only to the production 
> server, but then what do I do on production to get the core?

Have you considered using the MultiCore approach and some of the commands 
from CoreAdmin[1] and SolrReplication[2]?

Start out with multicore enabled on the production server, and have the
production core running with the name 'prod' or something like that.  

The staging server can run in multicore mode or not; either should work.

Then your deployment procedure would be:

1) On the production server use the CREATE admin command to create a new
   core 'deploy_YYYYMMDD' with configuration from the 'prod' core.   The
   configuration of this core should have replication enabled but with no poll
   interval so replication only happens on demand.

2) Trigger a replication from 'staging' server to the 'deploy_YYYYMMDD' core
   using the replication handler.

3) Use the ALIAS core command to add the name 'staging' to the 
   'deploy_YYYYMMDD' core.

4) Use the SWAP core command to swap the 'staging' and 'prod' cores and make
   sure it all works.  If it doesn't work, use SWAP to swap them back.
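
The four steps above could be scripted roughly as follows.  This is an untested
sketch that only builds the request URLs; the host names are placeholders, the
new core's instanceDir is assumed to already hold a copy of the 'prod'
configuration, and ALIAS is marked experimental on the CoreAdmin page, so check
that your Solr version supports it.

```python
from urllib.parse import urlencode


def deployment_urls(prod_base, staging_core_url, date_stamp):
    """Build the four HTTP requests for the deployment, in order.

    prod_base        -- Solr root on production (hypothetical host name below)
    staging_core_url -- URL of the core on the staging server to pull from
    date_stamp       -- the YYYYMMDD suffix for the new physical core
    """
    core = "deploy_" + date_stamp
    admin = prod_base + "/admin/cores?"
    return [
        # 1) Create the new core; instanceDir is assumed to contain prod's config.
        admin + urlencode({"action": "CREATE", "name": core, "instanceDir": core}),
        # 2) Pull the index from staging on demand via the replication handler.
        prod_base + "/" + core + "/replication?" + urlencode(
            {"command": "fetchindex", "masterUrl": staging_core_url + "/replication"}),
        # 3) Add the logical name 'staging' to the new physical core.
        admin + urlencode({"action": "ALIAS", "core": core, "other": "staging"}),
        # 4) Swap 'staging' and 'prod'; sending the same request again swaps back.
        admin + urlencode({"action": "SWAP", "core": "staging", "other": "prod"}),
    ]


urls = deployment_urls("http://prod.example.com:8983/solr",
                       "http://staging.example.com:8983/solr",
                       "20100113")
for url in urls:
    print(url)
```

Each URL can then be fetched in turn with curl or urllib.request.urlopen,
checking the response of one step before moving on to the next.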

In the end, you have physical cores with names like 'deploy_YYYYMMDD', or
whatever else is appropriate for your environment, and those names would be
the instanceDirs and such on disk.  On top of those you have the logical core
names 'staging' and 'prod', which work sort of like symlinks on the file
system.

I have not done a deployment like this yet, just thought about it a few times.
And I have not tested this out to see what, if any, complications there are.

enjoy,

-jeremy

[1] - http://wiki.apache.org/solr/CoreAdmin
[2] - http://wiki.apache.org/solr/SolrReplication

-- 
========================================================================
 Jeremy Hinegardner                              jer...@hinegardner.org 
