Re: Why data tripled in size after repair?

Rob Coli Wed, 26 Sep 2012 11:07:31 -0700

On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh <ailin...@gmail.com> wrote:
> [ repair ballooned my data size ]
> 1. Why repair almost triples data size?


You didn't mention what version of Cassandra you're running. In some
old versions of Cassandra (prior to 1.0), repair often creates even
more extraneous data than the design itself implies.

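However, even by design, repair repairs differing ranges based on Merkle
trees. Merkle trees are an optimization; what you trade for the
optimization is over-repair: when a leaf of the tree differs, the whole
token range that leaf covers is streamed, not just the rows that actually
differ. When you have multiple replicas, each one over-repairs. If you are
running repair on your whole cluster, this is why you should use
repair -pr, as it repairs only each node's primary range and so reduces
the per-replica over-repair.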
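To make the over-repair trade-off concrete, here is a toy Python model. It is a sketch under stated assumptions, not Cassandra's implementation: a replica is a plain dict of {token: value}, only the leaf level of the Merkle tree is modeled (inner nodes just let replicas find mismatched leaves quickly), SHA-256 is an arbitrary hash choice, and the helper names `leaf_hashes`, `rows_streamed` and `repairs_per_range` are hypothetical. It shows that one differing row causes a whole leaf's range to be streamed, and that running repair on every node without -pr repairs each range once per replica.

```python
import hashlib

# Toy model (assumption): a replica is a dict {token: value}, tokens in [0, TOKEN_SPACE).
TOKEN_SPACE = 2**16


def leaf_hashes(replica, num_leaves):
    """Hash a replica's rows into num_leaves equal-width token ranges (the tree's leaves)."""
    assert TOKEN_SPACE % num_leaves == 0, "leaves must evenly divide the token space"
    width = TOKEN_SPACE // num_leaves
    digests = [hashlib.sha256() for _ in range(num_leaves)]
    for token in sorted(replica):
        digests[token // width].update(f"{token}={replica[token]};".encode())
    return [d.hexdigest() for d in digests]


def rows_streamed(local, remote, num_leaves):
    """Count rows streamed from remote to local: every row in any mismatched leaf."""
    width = TOKEN_SPACE // num_leaves
    pairs = zip(leaf_hashes(local, num_leaves), leaf_hashes(remote, num_leaves))
    mismatched = {i for i, (a, b) in enumerate(pairs) if a != b}
    return sum(1 for token in remote if token // width in mismatched)


def repairs_per_range(num_nodes, rf, primary_only):
    """How many times each range is repaired if every node runs repair once.

    Assumes a SimpleStrategy-like ring where range i lives on nodes i .. i+rf-1.
    """
    counts = [0] * num_nodes
    for node in range(num_nodes):
        # Ranges this node replicates; replicated[0] is its primary range.
        replicated = [(node - k) % num_nodes for k in range(rf)]
        for r in (replicated[:1] if primary_only else replicated):
            counts[r] += 1
    return counts


if __name__ == "__main__":
    local = {t: "v1" for t in range(0, TOKEN_SPACE, 64)}  # 1024 rows
    remote = dict(local)
    remote[0] = "v2"  # exactly one row differs
    print(rows_streamed(local, remote, 16))    # coarse leaves: 64 rows streamed
    print(rows_streamed(local, remote, 1024))  # fine leaves: 1 row streamed
    print(repairs_per_range(6, 3, False))      # [3, 3, 3, 3, 3, 3]
    print(repairs_per_range(6, 3, True))       # [1, 1, 1, 1, 1, 1]
```

Finer leaves mean less over-repair but a bigger tree to build and exchange, which is the trade the optimization makes.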

> 2. How to compact my data back to 100G?

1) Do a major compaction (nodetool compact <keyspace> <cf>), one CF at a
time. If you only have one CF, you're out of luck, because a major
compaction can temporarily need about as much free space as the CF
itself, and you don't have that much headroom.
2) Then convince someone to write "sstablesplit" so you can turn your
100G sstable into [n] smaller sstables, and/or learn to live with your
giant sstable.
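The headroom reasoning in step 1 can be sketched as a small planner. This is a hypothetical helper, not a Cassandra tool, and it assumes the worst case: a major compaction needs free space equal to the CF's current size, because the input sstables cannot be removed until the output has been written. `kept_pct` is a made-up guess at how much of each CF survives compaction. Compacting smallest-first frees space that may let the next CF fit; a single large CF gets no such help.

```python
def compaction_plan(cf_sizes_gb, free_gb, kept_pct):
    """Pick CFs to major-compact one at a time, smallest first.

    Assumptions: each compaction needs free space equal to the CF's size, and
    afterwards kept_pct percent of that CF remains on disk (hypothetical).
    """
    plan = []
    for name, size in sorted(cf_sizes_gb.items(), key=lambda kv: kv[1]):
        if size > free_gb:
            break  # every remaining CF is at least this large, so none fit
        plan.append(name)
        free_gb += size * (100 - kept_pct) / 100  # old sstables removed, smaller output kept
    return plan, free_gb


if __name__ == "__main__":
    # Several CFs: compacting "a" frees enough room for "b", but not for "c".
    print(compaction_plan({"a": 20, "b": 40, "c": 200}, free_gb=30, kept_pct=50))
    # One big CF with too little free space: nothing can be compacted.
    print(compaction_plan({"data": 300}, free_gb=100, kept_pct=33))
```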

Or add a new data directory with more space in it, to allow you to
compact. I mention this option in case it is trivial to attach
additional storage in your env.

The other alternative is to wait. Most space will be reclaimed over
time by minor compaction.
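Why waiting works can be shown with a toy merge. This is a minimal sketch assuming an sstable is a dict of {row_key: (timestamp, value)} and that over-repair streamed in full duplicate copies of rows the node already had; the `compact` function is illustrative, not Cassandra's code. Merging keeps only the newest version of each row, so duplicates disappear once their copies land in the same compaction. Minor compactions only merge a few sstables at a time, which is roughly why the space comes back gradually rather than all at once.

```python
def compact(sstables):
    """Merge sstables into one, keeping only the newest version of each row.

    Toy model (assumption): an sstable is a dict {row_key: (timestamp, value)}.
    """
    merged = {}
    for sstable in sstables:
        for key, (ts, value) in sstable.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged


if __name__ == "__main__":
    original = {k: (1, "v") for k in range(100)}
    # Hypothetical over-repair: the same rows streamed in again from two other replicas.
    sstables = [original, dict(original), dict(original)]
    print(sum(len(s) for s in sstables))  # 300 rows on disk before compaction
    print(len(compact(sstables)))         # 100 rows after compaction
```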

=Rob

-- 
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
