On Wed, Jan 13, 2010 at 05:38:33PM -0500, Paul Rosen wrote:
> Hi all,
>
> The way the indexing works on our system is as follows:
>
> We have a separate "staging" server with a copy of our web app. The clients
> will index a number of documents in a batch on the staging server (this
> happens about once a week), then they play with the results on the staging
> server for a day until satisfied. Only then do they give the ok to deploy.
>
> What I've been doing is, when they want to deploy, I do the following:
>
> 1) merge and optimize the index on the staging server,
>
> 2) copy it to the production server,
>
> 3) stop solr on production,
>
> 4) copy the new index on top of the old one,
>
> 5) start solr on production.
>
> This works, but has the following disadvantages:
>
> 1) The index is getting bigger, so it takes longer to zip it and transfer
> it.

If you optimize every time before deploying to production, you will need to
transfer the entire index each time anyway, because an optimize rewrites the
whole index into new files. To transfer only part of it, you would need to
NOT optimize and then use one of the replication strategies (rsync or Java)
to replicate only the deltas.

> 2) The user has only added a few records, yet we copy over all of them. If a
> bug happens that causes an unrelated document to get deleted or replaced on
> staging, we wouldn't notice, and we'd propagate the problem to the server.
> I'd sleep better if I were only moving the records that were new or changed
> and leaving the records that already work in place.
>
> 3) solr is down on production for about 5 minutes, so users during that
> time are getting errors.
>
> I was looking for some kind of replication strategy where I can run a task
> on the production server to tell it to merge a core from the staging
> server. Is that possible?
>
> I can open up port 8983 on the staging server only to the production
> server, but then what do I do on production to get the core?

Have you considered using the MultiCore approach and some of the commands
from CoreAdmin[1] and SolrReplication[2]?

Start out with multicore enabled on the production server, and have the
production core running with the name 'prod' or something like that. The
staging server can run multicore or not.

Then your deployment procedure would be:

1) On the production server, use the CREATE admin command to create a new
   core 'deploy_YYYYMMDD' with configuration from the 'prod' core. The
   configuration of this core should have replication enabled but with no
   poll interval, so replication only happens on demand.

2) Trigger a replication from the 'staging' server to the 'deploy_YYYYMMDD'
   core using the replication handler.

3) Use the ALIAS core command to add the name 'staging' to the
   'deploy_YYYYMMDD' core.

4) Use the SWAP core command to swap the 'staging' and 'prod' cores, and
   make sure it all works.
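
For illustration, the four steps above might map onto HTTP calls roughly as
in the Python sketch below. It is untested against a real Solr and rests on
several assumptions: the host names are made up, the parameter names are the
ones I understand the CoreAdmin[1] and SolrReplication[2] pages to describe,
the instanceDir for the new core is assumed to already contain a copy of the
'prod' conf directory, and ALIAS may not be available in every Solr version.

```python
from datetime import date
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URLs; substitute the real hosts.
PROD = "http://localhost:8983/solr"
STAGING = "http://staging.example.com:8983/solr"


def deployment_urls(day, prod=PROD, staging=STAGING):
    """Build the admin URLs for deployment steps 1-4, in order."""
    core = "deploy_" + day.strftime("%Y%m%d")
    admin = prod + "/admin/cores?"
    return [
        # 1) CREATE the new core. Assumes the instanceDir already exists
        #    on disk with a copy of the 'prod' configuration in it.
        admin + urlencode({"action": "CREATE", "name": core,
                           "instanceDir": core}),
        # 2) Ask the new core to pull the index from staging on demand.
        prod + "/" + core + "/replication?" + urlencode(
            {"command": "fetchindex",
             "masterUrl": staging + "/replication"}),
        # 3) ALIAS: give the new core the additional name 'staging'.
        admin + urlencode({"action": "ALIAS", "core": core,
                           "other": "staging"}),
        # 4) SWAP the 'staging' and 'prod' names.
        admin + urlencode({"action": "SWAP", "core": "staging",
                           "other": "prod"}),
    ]


def run(urls):
    """Issue each request in order, raising on any HTTP error.

    fetchindex returns before the copy has finished, so a real script
    would need to wait (for example by polling command=details on the
    new core) before issuing steps 3 and 4.
    """
    for url in urls:
        with urlopen(url) as response:
            response.read()


if __name__ == "__main__":
    # Print the URLs only; nothing is sent to any server here.
    for url in deployment_urls(date.today()):
        print(url)
```

Building the URLs separately from sending them makes it easy to review
exactly what would be requested before pointing run() at production.
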

If the swap doesn't work out, use SWAP again to swap them back.

In the end, you have physical cores with the names 'deploy_YYYYMMDD', or
something else appropriate for your environment, and those would be the
instanceDirs and such on disk. Then you have logical core aliases of
'staging' and 'prod' etc., sort of like symlinks on the file system.

I have not done a deployment like this yet, just thought about it a few
times, and I have not tested it to see what, if any, complications there
are.

enjoy,

-jeremy

[1] - http://wiki.apache.org/solr/CoreAdmin
[2] - http://wiki.apache.org/solr/SolrReplication

--
========================================================================
 Jeremy Hinegardner                              jer...@hinegardner.org