So, if I was impatient and just "wanted to make this happen now", I could:
1.) Change GCGraceSeconds of the CF to 0
2.) run nodetool compact (*)
3.) Change GCGraceSeconds of the CF back to 10 days

Since I have ~900M tombstones, even if I miss a few due to impatience, I don't care *that* much as I could re-run my clean up tool against the now much smaller CF.

(*) A long long time ago I seem to recall reading advice about "don't ever run nodetool compact", but I can't remember why. Is there any bad long term consequence? Short term there are several:
-a heavy operation
-temporary 2x disk space
-one big SSTable afterwards
But moving forward, everything is ok right? CommitLog/MemTable->SStables, minor compactions that merge SSTables, etc... The only flaw I can think of is it will take forever until the SSTable minor compactions build up enough to consider including the big SSTable in a compaction, making it likely I'll have to self manage compactions.


On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy <mark.re...@boxever.com> wrote:

> Correct, a tombstone will only be removed after gc_grace period has
> elapsed. The default value is set to 10 days which allows a great deal of
> time for consistency to be achieved prior to deletion. If you are
> operationally confident that you can achieve consistency via anti-entropy
> repairs within a shorter period you can always reduce that 10 day interval.
>
>
> Mark
>
>
> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <ober...@civicscience.com> wrote:
>
>> I'm seeing a lot of articles about a dependency between removing
>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>> and this CF has GCGraceSeconds of 10 days).
>>
>>
>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <tbarbu...@gmail.com> wrote:
>>
>>> compaction should take care of it; for me it never worked so I run
>>> nodetool compaction on every node; that does it.
>>>
>>>
>>> 2014-04-11 16:05 GMT+02:00 William Oberman <ober...@civicscience.com>:
>>>
>>>> I'm wondering what will clear tombstoned rows? nodetool cleanup,
>>>> nodetool repair, or time (as in just wait)?
>>>>
>>>> I had a CF that was more or less storing session information. After
>>>> some time, we decided that one piece of this information was pointless to
>>>> track (and was 90%+ of the columns, and in 99% of those cases was ALL
>>>> columns for a row). I wrote a process to remove all of those columns
>>>> (which again in a vast majority of cases had the effect of removing the
>>>> whole row).
>>>>
>>>> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>>>> After I did this mass delete, everything was the same size on disk (which
>>>> I expected, knowing how tombstoning works). It wasn't 100% clear to me
>>>> what to poke to cause compactions to clear the tombstones. First I tried
>>>> nodetool cleanup on a candidate node. But, afterwards the disk usage was
>>>> the same. Then I tried nodetool repair on that same node. But again, disk
>>>> usage is still the same. The CF has no snapshots.
>>>>
>>>> So, am I misunderstanding something? Is there another operation to
>>>> try? Do I have to "just wait"? I've only done cleanup/repair on one node.
>>>> Do I have to run one or the other over all nodes to clear tombstones?
>>>>
>>>> Cassandra 1.2.15 if it matters,
>>>>
>>>> Thanks!
>>>>
>>>> will
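
For reference, the three steps at the top of this message (drop GCGraceSeconds to 0, run a major compaction, restore the 10 days) could be sketched roughly as below. This is only a sketch under assumptions: the helper name `tombstone_purge_plan` and the keyspace/CF names are hypothetical, it assumes the CF is reachable through CQL3 so that `ALTER TABLE ... WITH gc_grace_seconds` applies, and it only builds the command strings rather than running anything against a cluster.

```python
# Hypothetical helper: builds the command strings for the three steps,
# it does NOT execute them against a cluster.

# 10 days expressed in seconds (the GCGraceSeconds value mentioned in the thread).
GC_GRACE_DEFAULT = 10 * 24 * 60 * 60


def tombstone_purge_plan(keyspace, cf, restore_seconds=GC_GRACE_DEFAULT):
    """Return the three steps, in order, as (tool, command) pairs."""
    table = f"{keyspace}.{cf}"
    return [
        # Step 1: make tombstones immediately eligible for removal.
        ("cqlsh", f"ALTER TABLE {table} WITH gc_grace_seconds = 0;"),
        # Step 2: major compaction of this one CF (run per node).
        ("nodetool", f"nodetool compact {keyspace} {cf}"),
        # Step 3: put the grace period back.
        ("cqlsh", f"ALTER TABLE {table} WITH gc_grace_seconds = {restore_seconds};"),
    ]


if __name__ == "__main__":
    # "my_keyspace" and "sessions" are placeholder names, not from the thread.
    for tool, command in tombstone_purge_plan("my_keyspace", "sessions"):
        print(f"[{tool}] {command}")
```

Keeping the plan as data makes it easy to review the exact statements before anyone pastes them into cqlsh or a shell on each node.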