On Wed, Oct 15, 2014 at 4:54 PM, Sean Bridges <sean.brid...@gmail.com> wrote:
> We upgraded a cassandra cluster from 1.2.18 to 2.0.10, and it looks like
> repair is significantly more expensive now. Is this expected?

It depends on what you mean by "expected." Operators usually don't expect defaults with such dramatic impacts to change without them understanding why, but there is a reason for it.

In 2.0 the default for repair was changed to be non-parallel. To get the old behavior, you need to supply -par as an argument.

The context-free note you missed the significance of in NEWS.txt for version 2.0.2 says:

    - Nodetool defaults to Sequential mode for repair operations

What this doesn't say is how almost certainly unreasonable this is as a default, because this means that repair is predictably slower in direct relationship to your replication factor, and the default for gc_grace_seconds (the time box in which one must complete a repair) did not change at the same time.

The ticket where the change happens [1] does not specify a rationale, so your guess is as good as mine as to the reasoning which felt the change was not only necessary but reasonable.

Leaving aside the problem you've encountered ("upgraders notice that their repairs (which already took forever) are suddenly WAY SLOWER"), this default is also quite pathological for anyone operating with an RF over 3, which is a valid, if very uncommon, configuration.

In summary, if, as an operator, you disagree that making repair slower by default as a factor of replication factor is reasonable, I suggest filing a JIRA and letting the project know. At least in that case there is a chance they might explain the rationale for so blithely making a change that has inevitable impact on operators... ?

=Rob

[1] https://issues.apache.org/jira/browse/CASSANDRA-5950

http://twitter.com/rcolidba
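
To make the replication-factor point above concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not a measurement: it assumes every replica takes the same time to validate a range, that sequential repair validates replicas one after another, and that parallel (-par) repair validates them all at once. The function names and the three-day figure are hypothetical; the only real value is the gc_grace_seconds default of 864000 seconds (10 days).

```python
# Toy model only: the per-replica time is an illustrative assumption,
# not something measured on a real cluster.
GC_GRACE_SECONDS_DEFAULT = 864000  # Cassandra's default gc_grace_seconds (10 days)


def estimated_validation_seconds(per_replica_seconds, replication_factor, parallel):
    """Rough estimate of the validation time for a repair.

    Assumes each replica needs the same time to validate its data.
    Sequential mode is modeled as one replica after another, so the
    cost scales with the replication factor; parallel mode is modeled
    as all replicas validating at the same time.
    """
    if parallel:
        return per_replica_seconds
    return per_replica_seconds * replication_factor


def fits_in_gc_grace(total_seconds, gc_grace_seconds=GC_GRACE_SECONDS_DEFAULT):
    """True if the estimated repair finishes inside the gc_grace window."""
    return total_seconds < gc_grace_seconds


if __name__ == "__main__":
    three_days = 3 * 24 * 3600  # hypothetical per-replica validation time
    for rf in (3, 5):
        for parallel in (True, False):
            total = estimated_validation_seconds(three_days, rf, parallel)
            mode = "parallel" if parallel else "sequential"
            print(rf, mode, total, fits_in_gc_grace(total))
```

Under these assumptions, a repair that fits in the default window when run with -par can overrun it in sequential mode once the replication factor is high enough, which is the concern with the new default.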