I've learned a *lot* from this thread. My thanks to all of the contributors!
Paulo: Good luck with LCS. I wish I could help there, but all of my CFs are SizeTiered (mostly because I've been on the same schema/same settings since 0.7...).

will

On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib <mina.nag...@adgear.com> wrote:

> Levelled Compaction is a wholly different beast when it comes to tombstones.
>
> The tombstones are inserted, like any other write really, at the lower levels in the leveldb hierarchy.
>
> They are only removed after they have had the chance to "naturally" migrate upwards in the leveldb hierarchy to the highest level in your data store. How long that takes depends on:
> 1. The amount of data in your store and the number of levels your LCS strategy has
> 2. The amount of new writes entering the bottom funnel of your leveldb, forcing upwards compaction and combining
>
> To give you an idea, I had a similar scenario and ran a (slow, throttled) delete job on my cluster around December-January. Here's a graph of the disk space usage on one node. Notice the still-declining usage long after the cleanup job finished (sometime in January). I tend to think of tombstones in LCS as little bombs that get to explode much later in time:
>
> http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg
>
>
> On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <paulo.mo...@chaordicsystems.com> wrote:
>
> I have a similar problem here. I deleted about 30% of a very large CF using LCS (about 80GB per node), but my data still hasn't shrunk, even though I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool scrub force a minor compaction?
>
> Cheers,
>
> Paulo
>
>
> On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy <mark.re...@boxever.com> wrote:
>
>> Yes, running nodetool compact (major compaction) creates one large SSTable. This will mess up the heuristics of the SizeTiered strategy (is this the compaction strategy you are using?), leading to multiple 'small' SSTables alongside the single large SSTable, which results in increased read latency. You will incur the operational overhead of having to manage compactions yourself if you wish to compact these smaller SSTables. For all these reasons it is generally advised to stay away from running compactions manually.
>>
>> Assuming that this is a production environment and you want to keep everything running as smoothly as possible, I would reduce the gc_grace on the CF, allow automatic minor compactions to kick in, and then increase the gc_grace once again after the tombstones have been removed.
>>
>>
>> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <ober...@civicscience.com> wrote:
>>
>>> So, if I was impatient and just "wanted to make this happen now", I could:
>>>
>>> 1.) Change GCGraceSeconds of the CF to 0
>>> 2.) Run nodetool compact (*)
>>> 3.) Change GCGraceSeconds of the CF back to 10 days
>>>
>>> Since I have ~900M tombstones, even if I miss a few due to impatience, I don't care *that* much, as I could re-run my cleanup tool against the now much smaller CF.
>>>
>>> (*) A long, long time ago I seem to recall reading advice along the lines of "don't ever run nodetool compact", but I can't remember why. Is there any bad long-term consequence? Short term there are several:
>>> - a heavy operation
>>> - temporary 2x disk space
>>> - one big SSTable afterwards
>>> But moving forward, everything is OK, right? CommitLog/MemTable -> SSTables, minor compactions that merge SSTables, etc... The only flaw I can think of is that it will take forever until enough minor compactions build up to consider including the big SSTable in a compaction, making it likely I'll have to self-manage compactions.
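For reference, the three-step sequence quoted above maps to roughly the following commands. This is a sketch only: "my_ks" and "session_cf" are hypothetical placeholder names, and gc_grace_seconds is set here via CQL3 as shipped with Cassandra 1.2.

    # 1) Make tombstones immediately eligible for purging (run in cqlsh)
    ALTER TABLE my_ks.session_cf WITH gc_grace_seconds = 0;

    # 2) Force a major compaction (run on each node)
    nodetool compact my_ks session_cf

    # 3) Restore the 10-day default (864000 seconds)
    ALTER TABLE my_ks.session_cf WITH gc_grace_seconds = 864000;

As Mark notes above, any delete that has not reached all replicas when gc_grace drops to 0 can be resurrected, so this trades safety for speed.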
>>>
>>>
>>> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy <mark.re...@boxever.com> wrote:
>>>
>>>> Correct, a tombstone will only be removed after the gc_grace period has elapsed. The default value is set to 10 days, which allows a great deal of time for consistency to be achieved prior to deletion. If you are operationally confident that you can achieve consistency via anti-entropy repairs within a shorter period, you can always reduce that 10-day interval.
>>>>
>>>> Mark
>>>>
>>>>
>>>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <ober...@civicscience.com> wrote:
>>>>
>>>>> I'm seeing a lot of articles about a dependency between removing tombstones and GCGraceSeconds, which might be my problem (I just checked, and this CF has a GCGraceSeconds of 10 days).
>>>>>
>>>>>
>>>>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <tbarbu...@gmail.com> wrote:
>>>>>
>>>>>> Compaction should take care of it; for me it never worked, so I run nodetool compact on every node; that does it.
>>>>>>
>>>>>>
>>>>>> 2014-04-11 16:05 GMT+02:00 William Oberman <ober...@civicscience.com>:
>>>>>>
>>>>>>> I'm wondering what will clear tombstoned rows? nodetool cleanup, nodetool repair, or time (as in just waiting)?
>>>>>>>
>>>>>>> I had a CF that was more or less storing session information. After some time, we decided that one piece of this information was pointless to track (and it was 90%+ of the columns, and in 99% of those cases was ALL of the columns for a row). I wrote a process to remove all of those columns (which, again, in the vast majority of cases had the effect of removing the whole row).
>>>>>>>
>>>>>>> This CF had ~1 billion rows, so I expect to be left with ~100M rows. After I did this mass delete, everything was the same size on disk (which I expected, knowing how tombstoning works). It wasn't 100% clear to me what to poke to cause compactions to clear the tombstones. First I tried nodetool cleanup on a candidate node, but afterwards the disk usage was the same. Then I tried nodetool repair on that same node, but again, disk usage was still the same. The CF has no snapshots.
>>>>>>>
>>>>>>> So, am I misunderstanding something? Is there another operation to try? Do I have to "just wait"? I've only done cleanup/repair on one node. Do I have to run one or the other over all nodes to clear tombstones?
>>>>>>>
>>>>>>> Cassandra 1.2.15 if it matters,
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> will
>
> --
> Paulo Motta
>
> Chaordic | Platform
> www.chaordic.com.br
> +55 48 3232.3200
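To summarize the operations discussed in the thread (a sketch; "my_ks" and "session_cf" are hypothetical placeholder names, and the behavior notes reflect the 1.2-era experience reported above):

    # Removes data the node no longer owns (e.g. after ring changes);
    # per this thread, it left the tombstoned disk usage unchanged.
    nodetool cleanup my_ks session_cf

    # Anti-entropy repair; also left disk usage unchanged here.
    nodetool repair my_ks session_cf

    # Major compaction: purges tombstones older than gc_grace_seconds, at the
    # cost of one big SSTable under SizeTiered. Under LCS, space is only
    # reclaimed once tombstones migrate to the top level.
    nodetool compact my_ks session_cf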