On Wed, Oct 15, 2014 at 4:54 PM, Sean Bridges <sean.brid...@gmail.com>
wrote:

> We upgraded a cassandra cluster from 1.2.18 to 2.0.10, and it looks like
> repair is significantly more expensive now.  Is this expected?
>

It depends on what you mean by "expected." Operators usually don't expect a
default with such a dramatic impact to change without understanding why, but
there is a reason for it.

In 2.0 the default for repair was changed from parallel to sequential. To get
the old behavior, you need to pass -par to nodetool repair.

The context-free note in NEWS.txt for version 2.0.2, whose significance you
missed, says:

- Nodetool defaults to Sequential mode for repair operations

What this doesn't say is that this is almost certainly an unreasonable
default: it makes repair predictably slower in direct proportion to your
replication factor, while the default for gc_grace_seconds (the time box
within which one must complete a repair) did not change at the same time.
The ticket where the change was made [1] does not state a rationale, so your
guess is as good as mine as to the reasoning that found the change not only
necessary but reasonable.
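To make the scaling argument concrete, here is a back-of-envelope sketch in
Python. It is a deliberately crude model, not how Cassandra actually
schedules repair. It assumes that sequential mode validates each replica one
after another (so time grows linearly with RF), that parallel mode validates
all replicas at once, that the cluster is repaired one node at a time, and
that the per-replica time is a number you have measured yourself. The
function names are hypothetical; the only real figure is the default
gc_grace_seconds of 864000 (10 days).

```python
# Hypothetical back-of-envelope model, NOT Cassandra's real repair scheduler.
# Assumptions: sequential mode validates replicas one after another, so time
# scales with RF; parallel mode validates all replicas concurrently; nodes are
# repaired one at a time; per_replica_seconds is your own measured estimate.

GC_GRACE_SECONDS_DEFAULT = 864000  # default gc_grace_seconds: 10 days


def estimated_repair_seconds(per_replica_seconds: float, rf: int,
                             sequential: bool) -> float:
    """Rough time to repair one node's ranges under the assumptions above."""
    if rf < 1:
        raise ValueError("replication factor must be >= 1")
    return per_replica_seconds * rf if sequential else per_replica_seconds


def cluster_repair_fits(per_replica_seconds: float, rf: int, nodes: int,
                        sequential: bool,
                        gc_grace: int = GC_GRACE_SECONDS_DEFAULT) -> bool:
    """True if repairing every node, one at a time, fits inside gc_grace."""
    per_node = estimated_repair_seconds(per_replica_seconds, rf, sequential)
    return nodes * per_node <= gc_grace


if __name__ == "__main__":
    # 12 nodes, 6 hours (21600 s) of validation work per replica.
    for rf in (3, 5):
        print(rf,
              cluster_repair_fits(21600, rf, 12, sequential=True),
              cluster_repair_fits(21600, rf, 12, sequential=False))
    # 3 True True
    # 5 False True
```

Under these assumptions, RF 3 squeaks inside the default window in
sequential mode, while RF 5 blows through it, even though the parallel case
is unaffected by RF.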

Leaving aside the problem you've encountered ("upgraders notice that their
repairs (which already took forever) are suddenly WAY SLOWER"), this default
is also quite pathological for anyone running with an RF over 3, which is a
valid, if very uncommon, configuration.

In summary, if, as an operator, you disagree that making repair slower by
default, in proportion to replication factor, is reasonable, I suggest filing
a JIRA and letting the project know. At least then there is a chance they
might explain the rationale for so blithely making a change that inevitably
impacts operators...?

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-5950
http://twitter.com/rcolidba
