If you lose every node that holds a replica of some data (RF nodes for a given token range), that data is gone, so it is a good idea to have a recent backup then. Another situation is when you deploy a bug in your software and start writing bad data to Cassandra. Replication does not help there, because the bad writes are replicated like any others, and depending on the situation you need to restore from the backup.
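On the original question about snapshots being taken one by one: a rough sketch of starting them on all nodes at about the same time could look like the following. It assumes passwordless ssh to each node and `nodetool` on the remote PATH; the host names and the snapshot tag are made up for illustration, and `snapshot_command` / `snapshot_all` are hypothetical helper names. Starting the snapshots in parallel only narrows the time window between nodes, it does not make the backup consistent across the cluster. The upload to S3 is left out here.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def snapshot_command(host, tag):
    """Build the ssh command that runs `nodetool snapshot -t <tag>` on one node."""
    # The remote command is passed to ssh as a single string; the tag is quoted
    # so that an unusual tag cannot break the remote shell command.
    return ["ssh", host, "nodetool snapshot -t " + shlex.quote(tag)]


def snapshot_all(hosts, tag, run=subprocess.run):
    """Start a snapshot on every host at roughly the same time.

    Returns a dict mapping each host to the exit code of its ssh command.
    `run` is injectable so the function can be tried without real servers.
    """
    def one(host):
        return host, run(snapshot_command(host, tag)).returncode

    # One worker per host, so no node waits for another to finish.
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return dict(pool.map(one, hosts))


if __name__ == "__main__":
    # Hypothetical host names and tag, replace with your own.
    results = snapshot_all(["cass1", "cass2", "cass3"], "nightly-backup")
    for host, code in results.items():
        print(host, "ok" if code == 0 else "failed with exit code %d" % code)
```

Using the same tag on every node makes it easier to find the matching snapshot directories later when you copy them off the servers.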

2013/12/7 Jason Wee <peich...@gmail.com>

> Hmm... Cassandra's fundamental key features are things like fault tolerance,
> durability and replication. Just out of curiosity, why would you want to do
> a backup?
>
> /Jason
>
>
> On Sat, Dec 7, 2013 at 3:31 AM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Fri, Dec 6, 2013 at 6:41 AM, Amalrik Maia <amal...@s1mbi0se.com.br> wrote:
>>
>>> hey guys, I'm trying to take backups of a multi-node Cassandra and save
>>> them on S3.
>>> My idea is simply to ssh to each server and use nodetool to create
>>> the snapshots, then push them to S3.
>>>
>>
>> https://github.com/synack/tablesnap
>>
>>> So is this approach recommended? My concerns are about the inconsistencies
>>> that this approach can lead to, since the snapshots are taken one by one
>>> and not in parallel.
>>> Should I worry about it, or does Cassandra find a way to deal with
>>> inconsistencies when doing a restore?
>>>
>>
>> The backup is as consistent as your cluster is at any given moment, which
>> is "not necessarily". Manual repair brings you closer to consistency, but
>> only on data present when the repair started.
>>
>> =Rob
>>