On 12/3/2012 9:47 AM, Andy D'Arcy Jewell wrote:
However, wouldn't re-creating the index on a large dataset take an inordinate amount of time? The system I will be backing up is likely to undergo rapid development and thus schema changes, so I need some kind of insurance against corruption if we need to roll back after a change.

How should I go about creating multiple backup versions I can put aside (e.g. on tape) to hedge against the downtime that would be required to regenerate the indexes from scratch?

Serious production Solr installs require at least two copies of your index. Failures *will* happen, and sometimes they'll be the kind of failures that take down an entire machine. You can plan for some failures -- redundant power supplies and RAID are important for this. Some failures will cause downtime, though: multiple disk failures, motherboard, CPU, memory, software problems wiping out your index, user error, etc. If you have at least one other copy of your index, you'll be able to keep the system operational while you fix the down machine.

Replication is a very good way to keep two or more copies of your index. I would expect that most production Solr installations use either plain replication or SolrCloud. I do my redundancy a different way that gives me a lot more flexibility, but replication is a VERY solid way to go.
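If you do use replication, the same ReplicationHandler can also write backup snapshots on demand via its HTTP API. Here's a rough Python sketch -- the host, core name, and backup location are examples, so adjust them for your install:

    import urllib.request

    # Hypothetical host, core, and backup location. numberToKeep
    # limits how many snapshots Solr retains on disk.
    url = ("http://localhost:8983/solr/collection1/replication"
           "?command=backup&location=/var/solr/backups&numberToKeep=3")
    with urllib.request.urlopen(url) as response:
        print(response.read().decode())

The backup runs in the background on the server, so the call returns quickly and Solr stays up the whole time.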

If you are running on a UNIX/Linux platform (just about anything *other* than Windows), and backups via replication are not enough for you, you can use the OS's hardlink capability to avoid taking Solr down while you make backups. Here's the basic sequence, with a Python sketch after the list:

1) Pause indexing, wait for all commits and merges to complete.
2) Create a target directory on the same filesystem as your Solr index.
3) Make hardlinks of all files in your Solr index in the target directory.
4) Resume indexing.
5) Copy the target directory to your backup location at your leisure.
6) Delete the hardlink copies from the target directory.
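
To make steps 2 and 3 concrete, here's a minimal Python sketch. The paths are hypothetical -- adjust them for your install -- and the snapshot directory must be on the same filesystem as the index:

    import os

    def hardlink_snapshot(index_dir, snapshot_dir):
        # Step 2: create the target directory on the same filesystem.
        os.makedirs(snapshot_dir, exist_ok=True)
        # Step 3: hardlink every index file into the target directory.
        for name in os.listdir(index_dir):
            src = os.path.join(index_dir, name)
            if os.path.isfile(src):  # Lucene index directories are flat
                os.link(src, os.path.join(snapshot_dir, name))

    # Hypothetical paths -- assumes indexing is already paused (step 1):
    hardlink_snapshot("/var/solr/data/index", "/var/solr/backups/snap-20121203")

For steps 5 and 6, copy the snapshot directory to tape however you like, then delete it (rm -rf, or shutil.rmtree in Python); the live index is unaffected because a file's data isn't freed until its last hardlink is gone.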

Making hardlinks is a near-instantaneous operation. Because Lucene index files are write-once -- existing segment files are never modified, only new files are written and obsolete ones deleted -- your hardlink copy is guaranteed to remain a valid index snapshot no matter what happens to the live index. If you can make the backup and get the hardlinks deleted before your index undergoes a merge, the hardlinks will use very little extra disk space.

If you leave the hardlink copies around, your live index will eventually diverge to the point where the copy holds different files and therefore takes up real disk space. If you have a *LOT* of extra disk space on the Solr server, you can keep multiple hardlink copies around as snapshots.
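
If you go that route, pruning old snapshots is simple. A small sketch, assuming the snapshot directories live under a hypothetical /var/solr/backups and are named snap-YYYYMMDD so they sort chronologically:

    import os
    import shutil

    BACKUP_ROOT = "/var/solr/backups"  # hypothetical location
    KEEP = 3  # how many snapshots to retain

    # Date-stamped names sort lexicographically, oldest first.
    snaps = sorted(d for d in os.listdir(BACKUP_ROOT) if d.startswith("snap-"))
    for old in snaps[:-KEEP]:
        shutil.rmtree(os.path.join(BACKUP_ROOT, old))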

Recent versions of Windows do support hardlinks on NTFS (mklink /H, for example), so there may in fact be a way to do this on Windows. I will leave that for someone else to pursue.

Thanks,
Shawn
