On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh <ailin...@gmail.com> wrote:
> [ repair ballooned my data size ]
> 1. Why repair almost triples data size?

You didn't mention what version of Cassandra you're running. In some old versions of Cassandra (prior to 1.0), repair often creates even more extraneous data than it should by design.

However, even by design, repair works on differing ranges identified via Merkle trees. Merkle trees are an optimization; what you trade for that optimization is over-repair. A range is the smallest unit the tree can flag, so a single differing row causes its whole range to be streamed. When you have multiple replicas, each one over-repairs. If you are running repair on your whole cluster, this is why you should use repair -pr (primary range only), as it reduces the per-replica over-repair.

> 2. How to compact my data back to 100G?

1) Do a major compaction, one CF at a time. If you only have one CF, you're out of luck because you don't have enough headroom.
2) Then convince someone to write "sstablesplit" so you can turn your 100G sstable into [n] smaller sstables, and/or learn to live with your giant sstable.

Or add a new data directory with more space in it, to allow you to compact. I mention the latter in case it is trivial to attach additional storage in your env.

The other alternative is to wait. Most space will be reclaimed over time by minor compaction.

=Rob

--
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
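
To make the over-repair trade-off above concrete, here is a toy sketch in Python. It is not Cassandra's implementation: it hashes fixed-width key ranges and compares only those leaf-level hashes (no tree is built above them), and every name in it (`leaf_index`, `leaf_hashes`, `rows_streamed`, the integer key space) is hypothetical. It only shows that when one row differs, every row sharing its hashed range gets re-sent, and that coarser ranges mean more over-repair.

```python
import hashlib


def leaf_index(key, num_leaves, key_space):
    """Map an integer key to the fixed-width range (leaf) that covers it."""
    width = key_space // num_leaves
    return min(key // width, num_leaves - 1)


def leaf_hashes(rows, num_leaves, key_space):
    """Return one digest per leaf range, computed over the rows in that range."""
    digests = [hashlib.sha256() for _ in range(num_leaves)]
    for key in sorted(rows):
        leaf = leaf_index(key, num_leaves, key_space)
        digests[leaf].update(f"{key}={rows[key]};".encode())
    return [d.hexdigest() for d in digests]


def rows_streamed(replica_a, replica_b, num_leaves, key_space):
    """Count rows replica_a would send if every mismatched leaf is sent whole."""
    hashes_a = leaf_hashes(replica_a, num_leaves, key_space)
    hashes_b = leaf_hashes(replica_b, num_leaves, key_space)
    mismatched = {i for i in range(num_leaves) if hashes_a[i] != hashes_b[i]}
    return sum(
        1 for key in replica_a
        if leaf_index(key, num_leaves, key_space) in mismatched
    )


# Hypothetical data: 1000 rows, and the two replicas disagree on exactly one.
KEY_SPACE = 1000
replica_a = {key: "v1" for key in range(KEY_SPACE)}
replica_b = dict(replica_a)
replica_b[500] = "stale"

coarse = rows_streamed(replica_a, replica_b, 10, KEY_SPACE)   # 100 rows per leaf
fine = rows_streamed(replica_a, replica_b, 100, KEY_SPACE)    # 10 rows per leaf
print(coarse, fine)  # prints: 100 10
```

In this toy setup, one out-of-sync row costs 100 streamed rows with 10 ranges and 10 streamed rows with 100 ranges. The extra copies are redundant data that sits on disk until compaction merges it away.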