Thanks Jan, although I'm a bit unsure of the details. It looks like when you run a repair this actually occurs over several "sessions", e.g. in your example above there are 2 different "repair session [...] finished" lines. So does it make sense that I would want to measure from when I first see the "Starting repair command..." line until the *last* "repair session [...] finished" line? If so, how do I know when I have seen the last session finish? Is there a way to know how many sessions there will be (perhaps 1 per range)? And how do I correlate session logs to the repair, since the session logs identify the repair with something like "#22f77ad0-cad0-11e4-8f34-77e1731d15ff", whereas the "starting repair" log identifies it with a much smaller number (e.g. "repair command #2")?
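In case it helps clarify what I'm after, here's a rough sketch of what I had in mind (assuming the timestamps in nodetool's output look exactly like your example, and ignoring the milliseconds; the script name measure_repair.py is just a placeholder):

    #!/usr/bin/env python
    # Rough sketch: read captured `nodetool repair` output on stdin and report
    # the time from the first "Starting repair command" line to the last
    # "Repair session ... finished" line. Assumes timestamps look like
    # [2014-07-24 21:59:55,326]; milliseconds are dropped for simplicity.
    import re
    import sys
    from datetime import datetime

    TIMESTAMP = re.compile(r'^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]')

    start = end = None
    for line in sys.stdin:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), '%Y-%m-%d %H:%M:%S')
        if start is None and 'Starting repair command' in line:
            start = ts   # first "Starting repair command" line seen
        if 'Repair session' in line and 'finished' in line:
            end = ts     # keep updating, so the last "finished" session wins

    if start is not None and end is not None:
        print('repair ran for %s' % (end - start))
    else:
        print('could not find matching start/finish lines')

which I'd run as something like: nodetool repair -dc DC1 > repair.log; python measure_repair.py < repair.log. Of course this only tells me the duration after nodetool exits, which is part of why I'm asking how to know when the last session has finished.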
- Ian

On Thu, Mar 19, 2015 at 4:03 PM, Jan <cne...@yahoo.com> wrote:

> Ian;
>
> To respond to your specific question: you could pipe the output of your
> repair into a file and subsequently determine the time taken. Example:
>
> nodetool repair -dc DC1
> [2014-07-24 21:59:55,326] Nothing to repair for keyspace 'system'
> [2014-07-24 21:59:55,617] Starting repair command #2, repairing 490 ranges for keyspace system_traces (seq=true, full=true)
> [2014-07-24 22:23:14,299] Repair session 323b9490-137e-11e4-88e3-c972e09793ca for range (820981369067266915,822627736366088177] finished
> [2014-07-24 22:23:14,320] Repair session 38496a61-137e-11e4-88e3-c972e09793ca for range (2506042417712465541,2515941262699962473] finished
>
> What to look for:
>
> a) Look for the specific name of the keyspace and the words 'Starting repair'.
> b) Look for the word 'finished'.
> c) Compute the average time per keyspace, and you will have a rough idea of
>    how long your repairs will take on a regular basis. This applies only to
>    continual operational repair, not to the first time it's done.
>
> Hope this helps,
>
> Jan
>
> On Thursday, March 19, 2015 12:55 PM, Paulo Motta <pauloricard...@gmail.com> wrote:
>
> From: http://www.datastax.com/dev/blog/modern-hinted-handoff
>
> Repair and the fine print
>
> At first glance, it may appear that Hinted Handoff lets you safely get away
> without needing repair. This is only true if you never have hardware failure.
> Hardware failure means that
>
> 1. We lose “historical” data for which the write has already finished, so
>    there is nothing to tell the rest of the cluster exactly what data has
>    gone missing.
> 2. We can also lose hints-not-yet-replayed from requests the failed node
>    coordinated.
>
> With sufficient dedication, you can get by with “only run repair after
> hardware failure and rely on hinted handoff the rest of the time,” but as
> your clusters grow (and hardware failure becomes more common) performing
> repair as a one-off special case will become increasingly difficult to do
> perfectly. Thus, we continue to recommend running a full repair weekly.
>
> 2015-03-19 16:42 GMT-03:00 Robert Coli <rc...@eventbrite.com>:
>
> On Thu, Mar 19, 2015 at 12:13 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> Cassandra doesn't guarantee eventual consistency?
>
> If you run regularly scheduled repair, it does. If you do not run repair,
> it does not.
>
> Hinted handoff, for example, is considered an optimization for repair, and
> does not assert that it provides a consistency guarantee.
>
> =Rob
> http://twitter.com/rcolidba
>
> --
> Paulo Ricardo
> European Master in Distributed Computing
> Royal Institute of Technology - KTH
> Instituto Superior Técnico - IST
> http://paulormg.com