On 03/12/12 18:04, Shawn Heisey wrote:

Serious production Solr installs require at least two copies of your index. Failures *will* happen, and sometimes they'll be the kind of failures that take down an entire machine. You can plan for some failures -- redundant power supplies and RAID are important for this. Some failures will cause downtime, though -- multiple disk failures, motherboard, CPU, memory, software problems wiping out your index, user error, etc. If you have at least one other copy of your index, you'll be able to keep the system operational while you fix the down machine.

Replication is a very good way to maintain two or more copies of your index. I would expect that most production Solr installations use either plain replication or SolrCloud. I do my redundancy a different way that gives me a lot more flexibility, but replication is a VERY solid way to go.

If you are running on a UNIX/Linux platform (just about anything *other* than Windows), and backups via replication are not enough for you, you can use the hardlink capability in the OS to avoid taking Solr down while you make backups. Here's the basic sequence:

1) Pause indexing, wait for all commits and merges to complete.
2) Create a target directory on the same filesystem as your Solr index.
3) Make hardlinks of all files in your Solr index in the target directory.
4) Resume indexing.
5) Copy the target directory to your backup location at your leisure.
6) Delete the hardlink copies from the target directory.
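
Steps 2-6 above can be sketched as a short shell script. The paths here (INDEX_DIR, STAGING_DIR, BACKUP_DEST) are hypothetical placeholders -- substitute your own. The index and staging directories must be on the same filesystem, since hardlinks cannot cross filesystems (cp -l will fail with a cross-device link error otherwise):

```shell
#!/bin/sh
# Hardlink snapshot of a Solr index -- paths are examples only.
INDEX_DIR=/var/solr/data/index
STAGING_DIR=/var/solr/backup-staging
BACKUP_DEST=/mnt/backups/solr-$(date +%Y%m%d)

# 2) Create the staging directory on the same filesystem as the index.
mkdir -p "$STAGING_DIR"

# 3) Hardlink every index file into the staging directory.
#    -l creates hardlinks instead of copying data, so this is near-instant.
cp -l "$INDEX_DIR"/* "$STAGING_DIR"/

# 4) Resume indexing here -- the hardlinked snapshot stays consistent
#    because Lucene never rewrites existing segment files in place.

# 5) Copy the snapshot to the real backup location at your leisure.
rsync -a "$STAGING_DIR"/ "$BACKUP_DEST"/

# 6) Remove the hardlinks; the live index files are untouched,
#    since the data is only freed when its last link is removed.
rm -rf "$STAGING_DIR"
```

The rsync in step 5 is just one option; scp, tar over ssh, or anything else that reads the staging directory works equally well, since the snapshot files no longer change.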

Making hardlinks is a near-instantaneous operation. Because Lucene writes segment files once and never modifies them in place, your hardlink copy is guaranteed to remain a valid index snapshot no matter what happens to the live index. If you can make the backup and get the hardlinks deleted before your index undergoes a merge, the hardlinks will use very little extra disk space.

If you leave the hardlink copies around, the live index will eventually diverge from them; once a copy holds files the live index has replaced, those files consume real disk space. If you have a *LOT* of extra disk space on the Solr server, you can keep multiple hardlink copies around as snapshots.

Recent versions of Windows do have features similar to UNIX links, so there may in fact be a way to do this on Windows. I will leave that for someone else to pursue.

Thanks,
Shawn

Thanks Shawn, that's very informative. I get twitchy with anything where you "can't" back it up (memcached excepted). As an administrator, it's my job to recover from failures, and backups are kind of my comfort blanket.

I'm running on Linux (Debian Squeeze) in a fully virtual environment. Initially, I think I'll have to just schedule the backup for the early hours (local time), but as we grow, I can see I'll have to use replication to do it seamlessly. The system is necessarily small right now, as we haven't yet gone live, but we are anticipating rapid growth, so replication has always been on the cards.

Is there an easy way to tell (say from a shell script) when "all commits and merges [are] complete"?

If I keep a replica solely for backup purposes, I assume I can "do what I like with it" - presumably replication will resume and catch up when I restart it. (I admit I have a bit of reading to do w.r.t. replication - I just skimmed it because it wasn't in my initial brief.)

I'm assuming that because you're using hardlinks, Solr writes a "new" file when it updates (sort of copy-on-write style)? So we are relying on the principle that as long as at least one reference to the data remains, it isn't deleted...
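
That principle is plain POSIX hardlink semantics, and it's easy to demonstrate at the shell -- the file's data survives as long as at least one link to its inode remains:

```shell
# Demonstrating the reference-counting principle described above.
tmp=$(mktemp -d)
echo "segment data" > "$tmp/live"
ln "$tmp/live" "$tmp/snapshot"   # second name for the same inode
rm "$tmp/live"                   # "delete" the live file
cat "$tmp/snapshot"              # prints: segment data
rm -rf "$tmp"
```

The rm only removes one directory entry; the underlying data blocks are freed only when the link count drops to zero, which is why the snapshot stays intact while the live index churns.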

Thanks once again!

-Andy



--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk
