This thread is really informative; thanks for the good feedback. My question is: is there a way to force tombstones to be cleared with LCS? Does scrub help in any case? Or would the only solution be to create a new CF and migrate all the data if you intend to do a large CF cleanup?
Cheers,

On Fri, Apr 11, 2014 at 2:02 PM, Mark Reddy <mark.re...@boxever.com> wrote:

> That's great Will, if you could update the thread with the actions you
> decide to take and the results that would be great.
>
> Mark
>
> On Fri, Apr 11, 2014 at 5:53 PM, William Oberman <ober...@civicscience.com>
> wrote:
>
>> I've learned a *lot* from this thread. My thanks to all of the
>> contributors!
>>
>> Paulo: Good luck with LCS. I wish I could help there, but all of my CF's
>> are SizeTiered (mostly as I'm on the same schema/same settings since 0.7...)
>>
>> will
>>
>> On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib <mina.nag...@adgear.com> wrote:
>>
>>> Levelled Compaction is a wholly different beast when it comes to
>>> tombstones.
>>>
>>> The tombstones are inserted, like any other write really, at the lower
>>> levels in the leveldb hierarchy.
>>>
>>> They are only removed after they have had the chance to "naturally"
>>> migrate upwards in the leveldb hierarchy to the highest level in your data
>>> store. How long that takes depends on:
>>> 1. The amount of data in your store and the number of levels your LCS
>>> strategy has
>>> 2. The amount of new writes entering the bottom funnel of your leveldb,
>>> forcing upwards compaction and combining
>>>
>>> To give you an idea, I had a similar scenario and ran a (slow,
>>> throttled) delete job on my cluster around December-January. Here's a
>>> graph of the disk space usage on one node. Notice the still-declining
>>> usage long after the cleanup job has finished (sometime in January).
>>> I tend to think of tombstones in LCS as little bombs that get to explode
>>> much later in time:
>>>
>>> http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg
>>>
>>> On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <
>>> paulo.mo...@chaordicsystems.com> wrote:
>>>
>>> I have a similar problem here, I deleted about 30% of a very large CF
>>> using LCS (about 80GB per node), but still my data hasn't shrunk, even if
>>> I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool
>>> scrub force a minor compaction?
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>> On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy <mark.re...@boxever.com> wrote:
>>>
>>>> Yes, running nodetool compact (major compaction) creates one large
>>>> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
>>>> this the compaction strategy you are using?) leading to multiple 'small'
>>>> SSTables alongside the single large SSTable, which results in increased
>>>> read latency. You will incur the operational overhead of having to manage
>>>> compactions if you wish to compact these smaller SSTables. For all these
>>>> reasons it is generally advised to stay away from running compactions
>>>> manually.
>>>>
>>>> Assuming that this is a production environment and you want to keep
>>>> everything running as smoothly as possible I would reduce the gc_grace on
>>>> the CF, allow automatic minor compactions to kick in and then increase the
>>>> gc_grace once again after the tombstones have been removed.
>>>>
>>>> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
>>>> ober...@civicscience.com> wrote:
>>>>
>>>>> So, if I was impatient and just "wanted to make this happen now", I
>>>>> could:
>>>>>
>>>>> 1.) Change GCGraceSeconds of the CF to 0
>>>>> 2.) run nodetool compact (*)
>>>>> 3.) Change GCGraceSeconds of the CF back to 10 days
>>>>>
>>>>> Since I have ~900M tombstones, even if I miss a few due to impatience,
>>>>> I don't care *that* much as I could re-run my clean up tool against the
>>>>> now much smaller CF.
>>>>>
>>>>> (*) A long long time ago I seem to recall reading advice about "don't
>>>>> ever run nodetool compact", but I can't remember why. Is there any bad
>>>>> long term consequence? Short term there are several:
>>>>> -a heavy operation
>>>>> -temporary 2x disk space
>>>>> -one big SSTable afterwards
>>>>> But moving forward, everything is ok right?
>>>>> CommitLog/MemTable->SStables, minor compactions that merge SSTables,
>>>>> etc... The only flaw I can think of is it will take forever until the
>>>>> SSTable minor compactions build up enough to consider including the big
>>>>> SSTable in a compaction, making it likely I'll have to self manage
>>>>> compactions.
>>>>>
>>>>> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy
>>>>> <mark.re...@boxever.com> wrote:
>>>>>
>>>>>> Correct, a tombstone will only be removed after gc_grace period has
>>>>>> elapsed. The default value is set to 10 days which allows a great deal of
>>>>>> time for consistency to be achieved prior to deletion. If you are
>>>>>> operationally confident that you can achieve consistency via anti-entropy
>>>>>> repairs within a shorter period you can always reduce that 10 day
>>>>>> interval.
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
>>>>>> ober...@civicscience.com> wrote:
>>>>>>
>>>>>>> I'm seeing a lot of articles about a dependency between removing
>>>>>>> tombstones and GCGraceSeconds, which might be my problem (I just
>>>>>>> checked, and this CF has GCGraceSeconds of 10 days).
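To make the gc_grace dependency quoted above concrete, here is a minimal Python sketch of only the time-based part of the rule. It is not Cassandra code, and the function names are hypothetical, chosen for illustration. It also assumes the simplification that a tombstone becomes droppable strictly after deletion time plus gc_grace_seconds; in practice a compaction still has to pick up the SSTable holding the tombstone before any space is reclaimed.

```python
from datetime import datetime, timedelta, timezone

# 10 days, the default gc_grace mentioned in the thread (864000 seconds).
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60


def purgeable_at(deleted_at, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return the moment after which a compaction may drop the tombstone.

    Hypothetical helper: it models only the gc_grace wait, nothing else.
    """
    return deleted_at + timedelta(seconds=gc_grace_seconds)


def is_purgeable(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True once the gc_grace window has fully elapsed (assumed strict)."""
    return now > purgeable_at(deleted_at, gc_grace_seconds)


if __name__ == "__main__":
    deleted = datetime(2014, 4, 1, tzinfo=timezone.utc)
    check = datetime(2014, 4, 5, tzinfo=timezone.utc)
    # With the 10-day default the tombstone is still protected on April 5.
    print(is_purgeable(deleted, check))
    # With gc_grace lowered to 1 day it is already eligible.
    print(is_purgeable(deleted, check, gc_grace_seconds=86400))
```

This is why lowering gc_grace only helps once a compaction actually rewrites the SSTables in question: eligibility by time is necessary, but on its own it frees no disk space.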
>>>>>>>
>>>>>>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <
>>>>>>> tbarbu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> compaction should take care of it; for me it never worked so I run
>>>>>>>> nodetool compact on every node; that does it.
>>>>>>>>
>>>>>>>> 2014-04-11 16:05 GMT+02:00 William Oberman <
>>>>>>>> ober...@civicscience.com>:
>>>>>>>>
>>>>>>>>> I'm wondering what will clear tombstoned rows? nodetool cleanup,
>>>>>>>>> nodetool repair, or time (as in just wait)?
>>>>>>>>>
>>>>>>>>> I had a CF that was more or less storing session information.
>>>>>>>>> After some time, we decided that one piece of this information was
>>>>>>>>> pointless to track (and was 90%+ of the columns, and in 99% of those
>>>>>>>>> cases was ALL columns for a row). I wrote a process to remove all of
>>>>>>>>> those columns (which again in a vast majority of cases had the effect
>>>>>>>>> of removing the whole row).
>>>>>>>>>
>>>>>>>>> This CF had ~1 billion rows, so I expect to be left with ~100m
>>>>>>>>> rows. After I did this mass delete, everything was the same size on
>>>>>>>>> disk (which I expected, knowing how tombstoning works). It wasn't 100%
>>>>>>>>> clear to me what to poke to cause compactions to clear the tombstones.
>>>>>>>>> First I tried nodetool cleanup on a candidate node. But, afterwards
>>>>>>>>> the disk usage was the same. Then I tried nodetool repair on that same
>>>>>>>>> node. But again, disk usage is still the same. The CF has no
>>>>>>>>> snapshots.
>>>>>>>>>
>>>>>>>>> So, am I misunderstanding something? Is there another operation
>>>>>>>>> to try? Do I have to "just wait"? I've only done cleanup/repair on
>>>>>>>>> one node. Do I have to run one or the other over all nodes to clear
>>>>>>>>> tombstones?
>>>>>>>>>
>>>>>>>>> Cassandra 1.2.15 if it matters,
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> will
>>>
>>> --
>>> *Paulo Motta*
>>>
>>> Chaordic | *Platform*
>>> *www.chaordic.com.br <http://www.chaordic.com.br/>*
>>> +55 48 3232.3200

--
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200