Thanks Jan, although I'm a bit unsure of the details. It looks like when you run a repair this actually occurs over several "sessions", e.g. in your example above there are 2 different "repair session [...] finished" lines. So does it make sense that I would want to measure from when I first see the "Starting repair command..." line until the *last* "repair session [...] finished" line? If so, how do I know when I have seen the last session finish? Is there a way to know how many sessions there will be (perhaps 1 per range)? And how do I correlate session logs to the repair, since the session logs identify the repair with something like "#22f77ad0-cad0-11e4-8f34-77e1731d15ff", whereas the "starting repair" log identifies it with a much smaller number (e.g. "repair command #2")?
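In case it helps clarify what I'm after, here's a rough sketch of what I had in mind (assuming the timestamps in nodetool's output look exactly like your example, and ignoring the milliseconds; the script name measure_repair.py is just a placeholder):

    #!/usr/bin/env python
    # Rough sketch: read captured `nodetool repair` output on stdin and report
    # the time from the first "Starting repair command" line to the last
    # "Repair session ... finished" line. Assumes timestamps look like
    # [2014-07-24 21:59:55,326]; milliseconds are dropped for simplicity.
    import re
    import sys
    from datetime import datetime

    TIMESTAMP = re.compile(r'^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]')

    start = end = None
    for line in sys.stdin:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), '%Y-%m-%d %H:%M:%S')
        if start is None and 'Starting repair command' in line:
            start = ts   # first "Starting repair command" line seen
        if 'Repair session' in line and 'finished' in line:
            end = ts     # keep updating, so the last "finished" session wins

    if start is not None and end is not None:
        print('repair ran for %s' % (end - start))
    else:
        print('could not find matching start/finish lines')

which I'd run as something like: nodetool repair -dc DC1 > repair.log; python measure_repair.py < repair.log. Of course this only tells me the duration after nodetool exits, which is part of why I'm asking how to know when the last session has finished.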
- Ian

On Thu, Mar 19, 2015 at 4:03 PM, Jan <cne...@yahoo.com> wrote:

> Ian;
>
> To respond to your specific question: you could pipe the output of your
> repair into a file and subsequently determine the time taken. Example:
>
> nodetool repair -dc DC1
> [2014-07-24 21:59:55,326] Nothing to repair for keyspace 'system'
> [2014-07-24 21:59:55,617] Starting repair command #2, repairing 490 ranges for keyspace system_traces (seq=true, full=true)
> [2014-07-24 22:23:14,299] Repair session 323b9490-137e-11e4-88e3-c972e09793ca for range (820981369067266915,822627736366088177] finished
> [2014-07-24 22:23:14,320] Repair session 38496a61-137e-11e4-88e3-c972e09793ca for range (2506042417712465541,2515941262699962473] finished
>
> What to look for:
>
> a) Look for the specific name of the keyspace and the words 'Starting repair'.
> b) Look for the word 'finished'.
> c) Compute the average time per keyspace, and you will have a rough idea of
>    how long your repairs will take on a regular basis. This applies only to
>    continual operational repair, not to the first time it's done.
>
> Hope this helps,
>
> Jan
>
> On Thursday, March 19, 2015 12:55 PM, Paulo Motta <pauloricard...@gmail.com> wrote:
>
> From: http://www.datastax.com/dev/blog/modern-hinted-handoff
>
> Repair and the fine print
>
> At first glance, it may appear that Hinted Handoff lets you safely get away
> without needing repair. This is only true if you never have hardware failure.
> Hardware failure means that
>
> 1. We lose “historical” data for which the write has already finished, so
>    there is nothing to tell the rest of the cluster exactly what data has
>    gone missing.
> 2. We can also lose hints-not-yet-replayed from requests the failed node
>    coordinated.
>
> With sufficient dedication, you can get by with “only run repair after
> hardware failure and rely on hinted handoff the rest of the time,” but as
> your clusters grow (and hardware failure becomes more common) performing
> repair as a one-off special case will become increasingly difficult to do
> perfectly. Thus, we continue to recommend running a full repair weekly.
>
> 2015-03-19 16:42 GMT-03:00 Robert Coli <rc...@eventbrite.com>:
>
> On Thu, Mar 19, 2015 at 12:13 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
> Cassandra doesn't guarantee eventual consistency?
>
> If you run regularly scheduled repair, it does. If you do not run repair,
> it does not.
>
> Hinted handoff, for example, is considered an optimization for repair, and
> does not assert that it provides a consistency guarantee.
>
> =Rob
> http://twitter.com/rcolidba
>
> --
> Paulo Ricardo
> European Master in Distributed Computing
> Royal Institute of Technology - KTH
> Instituto Superior Técnico - IST
> http://paulormg.com