On 03/12/12 18:04, Shawn Heisey wrote:

Serious production Solr installs require at least two copies of your index. Failures *will* happen, and sometimes they'll be the kind of failures that take down an entire machine. You can plan for some failures -- redundant power supplies and RAID are important for this. Some failures will cause downtime, though -- multiple disk failures, motherboard, CPU, memory, software problems wiping out your index, user error, etc. If you have at least one other copy of your index, you'll be able to keep the system operational while you fix the down machine.

Replication is a very good way to maintain two or more copies of your index. I would expect that most production Solr installations use either plain replication or SolrCloud. I do my redundancy a different way that gives me a lot more flexibility, but replication is a VERY solid way to go.

If you are running on a UNIX/Linux platform (just about anything *other* than Windows), and backups via replication are not enough for you, you can use the hardlink capability in the OS to avoid taking Solr down while you make backups. Here's the basic sequence:

1) Pause indexing, wait for all commits and merges to complete.
2) Create a target directory on the same filesystem as your Solr index.
3) Make hardlinks of all files in your Solr index in the target directory.
4) Resume indexing.
5) Copy the target directory to your backup location at your leisure.
6) Delete the hardlink copies from the target directory.
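
Steps 2-6 above can be sketched as a short shell script. The paths here (INDEX_DIR, STAGING_DIR, BACKUP_DEST) are hypothetical placeholders -- substitute your own. The index and staging directories must be on the same filesystem, since hardlinks cannot cross filesystems (cp -l will fail with a cross-device link error otherwise):

```shell
#!/bin/sh
# Hardlink snapshot of a Solr index -- paths are examples only.
INDEX_DIR=/var/solr/data/index
STAGING_DIR=/var/solr/backup-staging
BACKUP_DEST=/mnt/backups/solr-$(date +%Y%m%d)

# 2) Create the staging directory on the same filesystem as the index.
mkdir -p "$STAGING_DIR"

# 3) Hardlink every index file into the staging directory.
#    -l creates hardlinks instead of copying data, so this is near-instant.
cp -l "$INDEX_DIR"/* "$STAGING_DIR"/

# 4) Resume indexing here -- the hardlinked snapshot stays consistent
#    because Lucene never rewrites existing segment files in place.

# 5) Copy the snapshot to the real backup location at your leisure.
rsync -a "$STAGING_DIR"/ "$BACKUP_DEST"/

# 6) Remove the hardlinks; the live index files are untouched,
#    since the data is only freed when its last link is removed.
rm -rf "$STAGING_DIR"
```

The rsync in step 5 is just one option; scp, tar over ssh, or anything else that reads the staging directory works equally well, since the snapshot files no longer change.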

Making hardlinks is a near-instantaneous operation. Because Lucene writes segment files once and never modifies them in place, your hardlink copy is guaranteed to remain a valid index snapshot no matter what happens to the live index. If you can make the backup and get the hardlinks deleted before your index undergoes a merge, the hardlinks will use very little extra disk space.

If you leave the hardlink copies around, the live index will eventually diverge from them; once a copy holds files the live index has replaced, those files consume real disk space. If you have a *LOT* of extra disk space on the Solr server, you can keep multiple hardlink copies around as snapshots.

Recent versions of Windows do have features similar to UNIX links, so there may in fact be a way to do this on Windows. I will leave that for someone else to pursue.

Thanks,
Shawn

Thanks Shawn, that's very informative. I get twitchy with anything where you "can't" back it up (memcached excepted). As an administrator, it's my job to recover from failures, and backups are kind of my comfort blanket.

I'm running on Linux (Debian Squeeze) in a fully virtual environment. Initially, I think I'll have to just schedule the backup for the early hours (local time), but as we grow, I can see I'll have to use replication to do it seamlessly. The system is necessarily small right now, as we haven't yet gone live, but we are anticipating rapid growth, so replication has always been on the cards.

Is there an easy way to tell (say from a shell script) when "all commits and merges [are] complete"?

If I keep a replica solely for backup purposes, I assume I can "do what I like with it" - presumably replication will resume and catch up when I restart it. (I admit I have a bit of reading to do w.r.t. replication - I just skimmed it because it wasn't in my initial brief.)

I'm assuming that because you're using hardlinks, Solr writes a "new" file when it updates (sort of copy-on-write style)? So we are relying on the principle that as long as at least one reference to the data remains, it isn't deleted...
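
That principle is plain POSIX hardlink semantics, and it's easy to demonstrate at the shell -- the file's data survives as long as at least one link to its inode remains:

```shell
# Demonstrating the reference-counting principle described above.
tmp=$(mktemp -d)
echo "segment data" > "$tmp/live"
ln "$tmp/live" "$tmp/snapshot"   # second name for the same inode
rm "$tmp/live"                   # "delete" the live file
cat "$tmp/snapshot"              # prints: segment data
rm -rf "$tmp"
```

The rm only removes one directory entry; the underlying data blocks are freed only when the link count drops to zero, which is why the snapshot stays intact while the live index churns.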

Thanks once again!

-Andy



--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk
