On 12/3/2012 9:47 AM, Andy D'Arcy Jewell wrote:
However, wouldn't re-creating the index on a large dataset take an inordinate amount of time? The system I will be backing up is likely to undergo rapid development and thus schema changes, so I need some kind of insurance against corruption if we need to roll back after a change.

How should I go about creating multiple backup versions I can put aside (e.g. on tape) to hedge against the downtime that would be required to regenerate the indexes from scratch?

Serious production Solr installs require at least two copies of your index. Failures *will* happen, and sometimes they'll be the kind of failures that take down an entire machine. You can plan for some failures -- redundant power supplies and RAID are important for this. Some failures will cause downtime, though: multiple disk failures, motherboard, CPU, memory, software problems wiping out your index, user error, etc. If you have at least one other copy of your index, you'll be able to keep the system operational while you fix the down machine.

Replication is a very good way to keep two or more copies of your index. I would expect that most production Solr installations use either plain replication or SolrCloud. I do my redundancy a different way that gives me a lot more flexibility, but replication is a VERY solid way to go.
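If you do use replication, the same ReplicationHandler can also write backup snapshots on demand via its HTTP API. Here's a rough Python sketch -- the host, core name, and backup location are examples, so adjust them for your install:

    import urllib.request

    # Hypothetical host, core, and backup location. numberToKeep
    # limits how many snapshots Solr retains on disk.
    url = ("http://localhost:8983/solr/collection1/replication"
           "?command=backup&location=/var/solr/backups&numberToKeep=3")
    with urllib.request.urlopen(url) as response:
        print(response.read().decode())

The backup runs in the background on the server, so the call returns quickly and Solr stays up the whole time.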

If you are running on a UNIX/Linux platform (just about anything *other* than Windows), and backups via replication are not enough for you, you can use the OS's hardlink capability to avoid taking Solr down while you make backups. Here's the basic sequence, with a Python sketch after the list:

1) Pause indexing, wait for all commits and merges to complete.
2) Create a target directory on the same filesystem as your Solr index.
3) Make hardlinks of all files in your Solr index in the target directory.
4) Resume indexing.
5) Copy the target directory to your backup location at your leisure.
6) Delete the hardlink copies from the target directory.
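
To make steps 2 and 3 concrete, here's a minimal Python sketch. The paths are hypothetical -- adjust them for your install -- and the snapshot directory must be on the same filesystem as the index:

    import os

    def hardlink_snapshot(index_dir, snapshot_dir):
        # Step 2: create the target directory on the same filesystem.
        os.makedirs(snapshot_dir, exist_ok=True)
        # Step 3: hardlink every index file into the target directory.
        for name in os.listdir(index_dir):
            src = os.path.join(index_dir, name)
            if os.path.isfile(src):  # Lucene index directories are flat
                os.link(src, os.path.join(snapshot_dir, name))

    # Hypothetical paths -- assumes indexing is already paused (step 1):
    hardlink_snapshot("/var/solr/data/index", "/var/solr/backups/snap-20121203")

For steps 5 and 6, copy the snapshot directory to tape however you like, then delete it (rm -rf, or shutil.rmtree in Python); the live index is unaffected because a file's data isn't freed until its last hardlink is gone.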

Making hardlinks is a near-instantaneous operation. Because Lucene index files are write-once -- existing segment files are never modified, only new files are written and obsolete ones deleted -- your hardlink copy is guaranteed to remain a valid index snapshot no matter what happens to the live index. If you can make the backup and get the hardlinks deleted before your index undergoes a merge, the hardlinks will use very little extra disk space.

If you leave the hardlink copies around, your live index will eventually diverge to the point where the copy holds different files and therefore takes up real disk space. If you have a *LOT* of extra disk space on the Solr server, you can keep multiple hardlink copies around as snapshots.
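
If you go that route, pruning old snapshots is simple. A small sketch, assuming the snapshot directories live under a hypothetical /var/solr/backups and are named snap-YYYYMMDD so they sort chronologically:

    import os
    import shutil

    BACKUP_ROOT = "/var/solr/backups"  # hypothetical location
    KEEP = 3  # how many snapshots to retain

    # Date-stamped names sort lexicographically, oldest first.
    snaps = sorted(d for d in os.listdir(BACKUP_ROOT) if d.startswith("snap-"))
    for old in snaps[:-KEEP]:
        shutil.rmtree(os.path.join(BACKUP_ROOT, old))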

Recent versions of Windows do support hardlinks on NTFS (mklink /H, for example), so there may in fact be a way to do this on Windows. I will leave that for someone else to pursue.

Thanks,
Shawn
