I would say repairing only when there is a known problem has a couple of 
logical issues, off the top of my head:

1. You're assuming hints are successfully delivered within their time window. 
I've never found any real indication of that myself (a rough check is 
sketched below the list).
2. Unless you're writing at CL ALL, you really have no indication whether the 
replicas that weren't needed to meet your consistency level actually 
succeeded the write on the initial attempt.
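
A minimal sketch of the rough check I mean (assumes nodetool is on the node; 
this only surfaces hint delivery activity, it does not prove hints landed 
before max_hint_window_in_ms expired):

    # Pending/blocked tasks in the HintedHandoff pool are a warning sign,
    # but a clean output still doesn't prove timely delivery.
    nodetool tpstats | grep -i hint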

Now if you're using CL LOCAL_QUORUM you'll have reasonable consistency, and 
chances are pretty good that you'll eventually hit your RF anyway via 
read repair, so I get the thought process behind what you're saying, Daemeon.
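
To spell out the arithmetic behind that: with RF = 3 in the local DC, 
LOCAL_QUORUM is floor(3/2) + 1 = 2 replicas, so a quorum write plus a quorum 
read touch 2 + 2 = 4 > 3 replicas, which guarantees every quorum read 
overlaps at least one replica that acknowledged the write.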

Likewise, I've seen well-sized clusters with steady, well-behaved workloads 
generally do fine and not need to stream a lot of data during repair. But 
because of 1 and 2, even with good monitoring that's a bit "running with 
scissors" for my taste, as I'm not confident there is enough monitoring 
coverage to ever guarantee whether you're "mostly meeting RF" or not.

Running repair within gc_grace_seconds should be something your cluster can 
handle anyway given your workload, or you're not sized correctly (else what 
happens when you need to run repair after a major event?), so why not just 
keep it running on a schedule.
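
A minimal sketch of what "keep it running" can look like (hypothetical 
schedule and log path; assumes the default gc_grace_seconds of 864000, i.e. 
10 days, nodetool on the PATH, and that you stagger the day/hour per node):

    # Weekly primary-range repair on this node, comfortably inside the
    # 10-day default gc_grace_seconds.
    # m h dom mon dow   command
    0 2 * * 0   nodetool repair -pr >> /var/log/cassandra/repair-cron.log 2>&1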

YMMV, and if someone has kept their cluster up and running and knows all the 
stuff to look for, kudos. I still view it as a cheap cost to CYA, and even 
after working with Cassandra for three years now in a wide variety of pretty 
crazy situations, I'm not confident I could keep a cluster healthy without 
running repair consistently.

regards,

Ryan Svihla

On Jul 20, 2016, 10:32 AM -0500, daemeon reiydelle <daeme...@gmail.com> wrote:
> I don't know if my perspective on this will assist, so YMMV:
>
> Summary
> Nodetool repair is required when a node has issues and can't get its resync 
> (e.g. hinted handoff) done. Culprit: usually network, sometimes 
> container/VM, rarely disk.
> Scripts to do partition-range repairs are a pain to maintain, and you have 
> to be CONSTANTLY checking for new keyspaces, parsing them, etc. GitHub 
> project, anyone?
> Monitor/monitor/monitor: if you do a best-practices job of actually 
> monitoring the FULL stack, you only need to do repairs when the world goes 
> south.
> Are you alerted when errors show up in the logs, the network goes wacky, 
> etc.? No? Then you have to CYA by throwing Hail Mary passes with periodic 
> nodetool repairs (a crude sketch of that kind of log check follows this 
> summary).
> Nodetool repair is a CYA for a cluster whose status is not well monitored.
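>
> As a crude stand-in for that kind of log alerting (hypothetical log path; a 
> real deployment would ship logs into a proper alerting system):
>
>     # Count ERROR lines in the Cassandra system log (default package
>     # install path; yours may differ) and alert when the count grows.
>     grep -c 'ERROR' /var/log/cassandra/system.log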
> Daemeon's thoughts:
>
> Nodetool repair is not required for a cluster that is, and "always has 
> been," in a known good state. Monitoring of the relevant 
> logs/network/disk/etc. is the only way that I know of to assure this state. 
> Because nodes can disappear (e.g. on AWS, and on EVERY ONE of my clients' 
> infrastructures: screwed-up networks), the cluster *can* get overloaded 
> (network traffic), causing hinted handoff to hit all of the worst-case 
> corner cases you could never hope to see.
>
> So, if you have good monitoring in place to assure known good cluster 
> behaviour (network, disk, etc.), repairs are not required until you are 
> alerted that a cluster health problem has occurred. Partition-range repair 
> is a pain in various parts of the anatomy because one has to CONSTANTLY 
> update the scripts that generate the commands (I have not seen a GitHub 
> project around this; would love to see responses that point one out!). A 
> sketch of what such a script ends up looking like follows.
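>
> A hypothetical sketch (not a tested tool) of why these scripts rot: the 
> keyspace list has to be rediscovered on every run, and anything fancier 
> using subrange repair (-st/-et) also has to track the token ring:
>
>     # Rediscover keyspaces each run, because they come and go; skip the
>     # system keyspaces for this sketch.
>     for ks in $(cqlsh -e 'DESCRIBE KEYSPACES' | tr -s ' ' '\n' \
>                 | grep -v '^$' | grep -v '^system'); do
>         nodetool repair -pr "$ks"
>     done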
>
>
>
> .......
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Wed, Jul 20, 2016 at 4:33 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
> > Hi Satoshi,
> >
> > > Q1:
> > > According to the DataStax document, it's recommended to run full repair 
> > > weekly or monthly. Is it needed even if repair with the partitioner range 
> > > option ("nodetool repair -pr", in C* v2.2+) is set to run periodically 
> > > for every node in the cluster?
> >
> >
> > More accurately, you need to run a repair for each node and each table 
> > within the gc_grace_seconds value defined at the table level to ensure no 
> > deleted data comes back. Running this on a regular basis also ensures 
> > consistently low entropy in your cluster, allowing better consistency (if 
> > you are not using strong consistency, i.e. CL = QUORUM for both reads and 
> > writes).
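> >
> > A quick way to check that per-table value (a minimal sketch; in your 2.2.x 
> > version the schema lives in system.schema_columnfamilies, it moved to 
> > system_schema.tables in 3.0):
> >
> >     cqlsh -e "SELECT keyspace_name, columnfamily_name, gc_grace_seconds FROM system.schema_columnfamilies;"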
> >
> > A full repair means every piece of data has been repaired. On a 3-node 
> > cluster with RF=3, running 'nodetool repair -pr' on all 3 nodes or 
> > 'nodetool repair' on one node are equivalent "full repairs". Indeed, the 
> > best approach is often to run repair with '-pr' on all the nodes; this is 
> > a full repair.
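> >
> > Concretely, on this 3-node RF=3 cluster (a minimal illustration):
> >
> >     # Option A: primary-range repair, run once on EACH of the 3 nodes:
> >     nodetool repair -pr
> >
> >     # Option B: full-range repair, run on any ONE node:
> >     nodetool repair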
> >
> > > Is it a good practice to repair a node without using non-repaired 
> > snapshots when I want to restore a node, because the repair process is too 
> > slow?
> >
> > I am sorry, this is unclear to me. But from "actually 1GB data is 
> > updated because the snapshot is already repaired" I understand that you 
> > are using incremental repairs (or that you think Cassandra repair uses 
> > them by default, which is not the case in your version). 
> > http://www.datastax.com/dev/blog/more-efficient-repairs
> >
> > Also, be aware that repair is a PITA for all operators using Cassandra, 
> > which has led to many attempts to improve things:
> >
> > Range repair: https://github.com/BrianGallew/cassandra_range_repair
> > Reaper: https://github.com/spotify/cassandra-reaper
> > Ticket to automatically schedule / handle repairs in Cassandra: 
> > https://issues.apache.org/jira/browse/CASSANDRA-10070
> > Ticket to switch to Mutation Based Repairs (MBR): 
> > https://issues.apache.org/jira/browse/CASSANDRA-8911
> >
> > And probably many more... There is a lot to read and try; repair is an 
> > important yet non-trivial topic for any Cassandra operator.
> >
> > C*heers,
> > -----------------------
> > Alain Rodriguez - al...@thelastpickle.com
> > France
> >
> > The Last Pickle - Apache Cassandra Consulting
> > http://www.thelastpickle.com
> >
> >
> >
> >
> >
> > 2016-07-14 9:41 GMT+02:00 Satoshi Hikida <sahik...@gmail.com>:
> > > Hi,
> > >
> > > I have two questions about anti-entropy repair.
> > >
> > > Q1:
> > > According to the DataStax document, it's recommended to run full repair 
> > > weekly or monthly. Is it needed even if repair with the partitioner range 
> > > option ("nodetool repair -pr", in C* v2.2+) is set to run periodically 
> > > for every node in the cluster?
> > >
> > > References:
> > > - DataStax, "When to run anti-entropy repair", 
> > > http://docs.datastax.com/en/cassandra/2.2/cassandra/operations/opsRepairNodesWhen.html
> > >
> > >
> > > Q2:
> > > Is it a good practice to repair a node without using non-repaired 
> > > snapshots when I want to restore a node, because the repair process is 
> > > too slow?
> > >
> > > I've done some simple verification of anti-entropy repair and found 
> > > that the repair process takes much more time than simply transferring 
> > > the replica data from the existing nodes to the restoring node.
> > >
> > > My verification settings are as follows:
> > >
> > > - 3 node cluster (N1, N2, N3)
> > > - 2 CPUs, 8GB memory, 500GB HDD for each node
> > > - Replication Factor is 3
> > > - C* version is 2.2.6
> > > - Compaction strategy is LCS (LeveledCompactionStrategy)
> > >
> > > And I prepared test data as follows:
> > >
> > > - a snapshot (10GB, fully repaired) for N1, N2, N3.
> > > - 1GB SSTables (by using incremental backup) for N1, N2, N3.
> > > - another 1GB SSTables for N1, N2
> > >
> > > I've measured repair time for two cases.
> > >
> > > - Case 1: repair N3 with the snapshot and 1GB SSTables
> > > - Case 2: repair N3 with the snapshot only
> > >
> > > In case 1, N3 needed to repair 12GB (actually only 1GB of data is 
> > > updated, because the snapshot is already repaired) and received 1GB of 
> > > data from N1 or N2. Whereas in case 2, N3 needed to repair 12GB 
> > > (actually just comparing Merkle trees for 10GB) and received 2GB of 
> > > data from N1 or N2.
> > >
> > > The result showed that case 2 was faster than case 1 (case 1: 6889 sec, 
> > > case 2: 4535 sec). I guess the repair process is very slow and it would 
> > > be better to repair a node without (non-repaired) backed-up files 
> > > (snapshot or incremental backup) if the other replica nodes exist.
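> > >
> > > (Rough arithmetic on those numbers: case 1 streamed ~1GB in 6889 sec 
> > > while case 2 streamed ~2GB in 4535 sec, so the streamed volume clearly 
> > > isn't the bottleneck; the Merkle-tree validation over the existing data 
> > > dominates the repair time.)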
> > >
> > > So... I guess if I just have non-repaired backups, what's the point of 
> > > using them? Looks like there's no merit... Am I missing something?
> > >
> > > Regards,
> > > Satoshi
> > >
> >
> >
> >
>
