Re: Why data tripled in size after repair?

2012-10-02 Thread Andrey Ilinykh
On Tue, Oct 2, 2012 at 12:05 AM, Sylvain Lebresne wrote:
>> It's in the 1.1 branch; I don't remember if it went into a release
>> yet. If not, it'll be in the next 1.1.x release.
>
> As the ticket says, this has been in since 1.1.1. I don't pretend this is
> well documented, but it's in.
>
Nope. It is i

Re: Why data tripled in size after repair?

2012-10-02 Thread Sylvain Lebresne
> It's in the 1.1 branch; I don't remember if it went into a release
> yet. If not, it'll be in the next 1.1.x release.

As the ticket says, this has been in since 1.1.1. I don't pretend this is well documented, but it's in.

--
Sylvain

Re: Why data tripled in size after repair?

2012-10-01 Thread Peter Schuller
> It looks like what I need. A couple of questions.
> Does it work with RandomPartitioner only? I use ByteOrderedPartitioner.

I believe it should work with BOP, based on a cursory re-examination of the patch. I could be wrong.

> I don't see it as part of any release. Am I supposed to build my own
> versi

Re: Why data tripled in size after repair?

2012-09-27 Thread Andrey Ilinykh
On Wed, Sep 26, 2012 at 12:36 PM, Peter Schuller wrote:
>> What is strange is that every time I run repair, data takes almost 3 times
>> more space (270G), then I run compaction and get 100G back.
>
> https://issues.apache.org/jira/browse/CASSANDRA-2699 outlines the
> main issues with repair. In short - in your

Re: Why data tripled in size after repair?

2012-09-27 Thread Sylvain Lebresne
> I see. It explains why I get 85G + 85G instead of 90G. But after the next
> repair I have six extra files of 75G each,
> how is it possible?

Maybe you've run repair on other nodes? Basically, repair is a fairly blind process. If it considers that a given range (and by range I mean here the ones that repa

Re: Why data tripled in size after repair?

2012-09-27 Thread Andrey Ilinykh
On Thu, Sep 27, 2012 at 9:52 AM, Sylvain Lebresne wrote:
>> I don't understand why it copied data twice. In the worst-case scenario it
>> should copy everything (~90G)
>
> Sadly no, repair is currently peer-to-peer based (there is a ticket to
> fix it: https://issues.apache.org/jira/browse/CASSANDRA-3

Re: Why data tripled in size after repair?

2012-09-27 Thread Sylvain Lebresne
> I don't understand why it copied data twice. In the worst-case scenario it
> should copy everything (~90G)

Sadly no, repair is currently peer-to-peer based (there is a ticket to fix it: https://issues.apache.org/jira/browse/CASSANDRA-3200, but that's not trivial). This means that you can end up with
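
Sylvain's point that peer-to-peer repair can copy the same range more than once can be put into rough numbers. The sketch below is a toy model, not Cassandra's actual repair code: it assumes each of the other RF - 1 replicas streams the mismatched part of the range to the repairing node as new SSTables, and that nothing is merged until compaction runs. The function names are hypothetical.

```python
def size_after_repair_gb(data_gb, rf, streamed_per_peer_gb):
    """Toy model (assumption): on-disk size of one node right after repair,
    before compaction, if each of the other rf - 1 replicas streams
    streamed_per_peer_gb of overlapping data to it."""
    peers = rf - 1  # every other replica of the range acts as a repair partner
    return data_gb + peers * streamed_per_peer_gb


def upper_bound_gb(data_gb, rf):
    """Toy-model ceiling for one repair of one node: every peer streams
    the node's entire range."""
    return size_after_repair_gb(data_gb, rf, data_gb)


if __name__ == "__main__":
    # Figures from this thread: 100G per node, RF 3, about 85G streamed per peer.
    print(size_after_repair_gb(100, 3, 85))  # 270
    print(upper_bound_gb(100, 3))  # 300
```

With Andrey's numbers (100G per node, replication factor 3, roughly 85G from each of the two peers), the model lands on the ~270G he reported, and compaction then collapses the duplicates back to ~100G.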

Re: Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
On Wed, Sep 26, 2012 at 11:07 AM, Rob Coli wrote:
> On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh wrote:
>> [ repair ballooned my data size ]
>> 1. Why does repair almost triple the data size?
>
> You didn't mention what version of Cassandra you're running. In some
> old versions of Cassandra (prior to

Re: Why data tripled in size after repair?

2012-09-26 Thread Peter Schuller
> What is strange is that every time I run repair, data takes almost 3 times
> more space (270G), then I run compaction and get 100G back.

https://issues.apache.org/jira/browse/CASSANDRA-2699 outlines the main issues with repair. In short - in your case the limited granularity of Merkle trees is causing too muc
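
Peter's point about Merkle tree granularity can be illustrated with a toy model. It assumes the token range is split into a fixed number of equal leaves, data is spread evenly across them, and any leaf whose hash differs is streamed in full, so a handful of differing rows can drag whole leaves along. The leaf count of 2**15 used below is an illustrative assumption and the function names are hypothetical; this is not Cassandra's implementation.

```python
def overstreamed_gb(data_gb, num_leaves, mismatched_tokens):
    """Toy model: tokens are normalized to [0, 1). Every leaf containing at
    least one mismatched token is streamed in full."""
    dirty_leaves = {min(int(t * num_leaves), num_leaves - 1) for t in mismatched_tokens}
    return data_gb * len(dirty_leaves) / num_leaves


def expected_fraction_streamed(num_leaves, num_mismatched_rows):
    """Expected share of leaves that differ when mismatched rows are
    scattered uniformly at random over the range."""
    return 1.0 - (1.0 - 1.0 / num_leaves) ** num_mismatched_rows


if __name__ == "__main__":
    # Two differing rows in the same leaf cost one leaf; in different leaves, two.
    print(overstreamed_gb(100, 4, [0.10, 0.15]))  # 25.0
    print(overstreamed_gb(100, 4, [0.10, 0.60]))  # 50.0
    # Illustrative: 100G range, 2**15 leaves, 100,000 scattered differing rows.
    print(100 * expected_fraction_streamed(2**15, 100_000))
```

In the last case only 100,000 rows differ, yet the model streams roughly 95G of the 100G range, because nearly every leaf contains at least one mismatch.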

Re: Why data tripled in size after repair?

2012-09-26 Thread Rob Coli
On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh wrote:
> [ repair ballooned my data size ]
> 1. Why does repair almost triple the data size?

You didn't mention what version of Cassandra you're running. In some old versions of Cassandra (prior to 1.0), repair often creates even more extraneous data than i

Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
Hello everybody! I have a 3-node cluster with a replication factor of 3. Each node has an 800G disk and used to hold 100G of data. What is strange is that every time I run repair, data takes almost 3 times more space (270G), then I run compaction and get 100G back. Unfortunately, yesterday I forgot to compact and ran