Nick, I’m working with many small databases and I haven’t noticed any problems with the default rsync behavior (I use -avHS: archive mode, verbose, preserve hard links, and handle sparse files efficiently). A couple of large databases are backed up the same way, and I haven’t noticed any performance problems with those either.
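For concreteness, the backup step boils down to something like the following Python wrapper around rsync (a minimal sketch; the paths and target host are illustrative, not my actual setup):

    import subprocess

    # -a archive mode, -v verbose, -H preserve hard links,
    # -S handle sparse files efficiently
    SRC = "/var/lib/couchdb/"                 # assumed CouchDB data directory
    DST = "backup@offsite:/backups/couchdb/"  # illustrative off-site target

    subprocess.run(["rsync", "-avHS", SRC, DST], check=True)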
Whatever option you choose, I think it’s really important to have a simple/primitive non-CouchDB backup solution in place, in case your specialized use of CouchDB has unforeseen side effects. Using a hosted service that is isolated from your current hosting solution is also important, for the same reasons. Be conservative when it comes to safeguarding data.

--
Paul Okstad
http://pokstad.com

> On Mar 10, 2016, at 2:23 PM, Nick Wood <[email protected]> wrote:
>
> Thanks for the suggestions.
>
> @Paul, I did some basic testing with rsync. It seems like the checksumming
> isn't super efficient if I'm trying to maintain up-to-the-minute backups of
> large databases (>1 GB), because the whole file has to be read to checksum
> it. I could just append without checksumming, but that doesn't feel safe.
> Can I ask whether you have rsync do checksumming/verification or just
> append, and whether you've ever had data corruption issues?
>
> @Jan, I'm giving that a shot. So far so good. The backups aren't quite as
> current as they would be if continuous replications worked, but this seems
> more efficient than rsync. Might have a winner if the crashes stay away.
>
> Nick
>
> On Thu, Mar 10, 2016 at 1:18 AM, Jan Lehnardt <[email protected]> wrote:
>
>>> On 09 Mar 2016, at 21:29, Nick Wood <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> I'm looking to back up a CouchDB server with multiple databases.
>>> Currently 1,400, but the count fluctuates up and down throughout the day
>>> as new databases are added and old ones deleted. ~10% of the databases
>>> are written to within any 5-minute period.
>>>
>>> Goals:
>>> - Maintain a continual off-site snapshot of all databases, preferably no
>>>   older than a few seconds (or minutes)
>>> - Be efficient with bandwidth (i.e. not copy the whole database file on
>>>   every backup run)
>>>
>>> My current solution watches the global _changes feed and fires up a
>>> continuous replication to an off-site server whenever it sees a change.
>>> If it doesn't see a change from a database for 10 minutes, it kills that
>>> replication. This means I only have ~150 active replications running on
>>> average at any given time.
>>
>> How about, instead of using continuous replications and killing them,
>> using non-continuous replications based on _db_updates? They end
>> automatically and should use fewer resources.
>>
>> Best
>> Jan
>> --
>>
>>> I thought this was a pretty clever approach, but I can't stabilize it.
>>> Replications hang frequently with crashes in the log file. I haven't yet
>>> tracked down the source of the crashes. I'm running the official 1.6.1
>>> Docker image as of yesterday, so I don't think it would be an Erlang
>>> issue.
>>>
>>> Rather than keep banging my head against these stability issues, I
>>> thought I'd ask to see if anyone else has come up with a clever backup
>>> solution that meets the above goals?
>>>
>>> Nick
>>
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
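P.S. For anyone who wants to try Jan's _db_updates suggestion, here is a
minimal sketch of the idea in Python (this assumes the requests library is
installed; the server URLs are illustrative, and a real version would want
error handling and authentication):

    import json
    from concurrent.futures import ThreadPoolExecutor

    import requests  # assumption: python-requests is available

    COUCH = "http://127.0.0.1:5984"   # local server (illustrative)
    TARGET = "http://offsite:5984"    # off-site server (illustrative)

    def replicate(db):
        # A one-shot (non-continuous) replication: POST /_replicate blocks
        # until the replication finishes, hence the worker pool below.
        requests.post(COUCH + "/_replicate",
                      json={"source": db,
                            "target": "%s/%s" % (TARGET, db),
                            "create_target": True})

    pool = ThreadPoolExecutor(max_workers=10)

    # Stream the global _db_updates feed and fire a one-shot replication
    # for each database that reports a change.
    feed = requests.get(COUCH + "/_db_updates",
                        params={"feed": "continuous"}, stream=True)
    for raw in feed.iter_lines():
        if not raw:
            continue  # heartbeats arrive as empty lines
        event = json.loads(raw)
        if event.get("type") in ("created", "updated"):
            pool.submit(replicate, event["db_name"])

Each replication ends on its own, so there is nothing to track or kill,
which is what makes this lighter than the continuous-replication approach.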
