> periodically trimming the row by deleting the oldest columns, the deleted 
> columns won't get cleaned up until all fragments of the row exist in a single 
> sstable and that sstable undergoes a compaction?
Nope. 
They are purged when all of the fragments of the row exist in the set of SSTables 
(plural) being compacted together. 

Say you create a row and write to it for a while; it may be spread across 2 or 3 
new SSTables. When there are 4 SSTables of similar size (the default 
min_compaction_threshold) they are compacted into one, which will be bigger than 
each of the original 4. When there are 4 at the next size bucket they are 
compacted, and so on. 
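
As a rough illustration (the path and names below are just examples for a 
default 1.0.x install), you can see the size tiers by listing the data files 
for a column family and comparing the Data.db sizes:

    # keyspace "MyKeyspace" and column family "MyCF" are placeholder names
    ls -lhS /var/lib/cassandra/data/MyKeyspace/MyCF-*-Data.db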

If your row exists in only one size bucket, its deleted data will be purged 
(once gc_grace_seconds has passed) when that bucket is next compacted. 

If you have a row you have been writing to for a long time, it may be spread out 
across many buckets. That's not normally a big problem, but if you also do lots of 
deletes the tombstones will not get purged. 

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/12/2012, at 4:45 PM, Mike Smith <m...@mailchannels.com> wrote:

> Thanks for the great explanation.
> 
> I'd just like some clarification on the last point. Is it the case that if I 
> constantly add new columns to a row, while periodically trimming the row by 
> deleting the oldest columns, the deleted columns won't get cleaned up 
> until all fragments of the row exist in a single sstable and that sstable 
> undergoes a compaction?
> 
> If my understanding is correct, do you know if 1.2 will enable cleanup of 
> columns in rows that have scattered fragments? Or, should I take a different 
> approach?
> 
> 
> 
> On Thu, Dec 13, 2012 at 5:52 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>  Is it possible to use scrub to accelerate the cleanup of expired/deleted 
>> data?
> No.
> Scrub and upgradesstables are used to re-write each file on disk. Scrub may 
> remove some rows from a file because of corruption; upgradesstables will not. 
> 
> If you have long-lived rows and a mixed workload of writes and deletes, there 
> are a couple of options. 
> 
> You can try levelled compaction 
> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
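> 
> For example, on 1.0.x this can be done from cassandra-cli (the column family 
> name below is a placeholder, and the change should be tested before rolling 
> it out):
> 
>     update column family MyCF with compaction_strategy = 'LeveledCompactionStrategy';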
> 
> You can tune the default size-tiered compaction by increasing 
> min_compaction_threshold. This will increase the number of files that must 
> exist in each size tier before it is compacted. As a result the speed at 
> which rows move into the higher tiers will slow down. 
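> 
> For example, from cassandra-cli (placeholder column family name; the default 
> threshold is 4):
> 
>     update column family MyCF with min_compaction_threshold = 8;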
> 
> Note that having lots of files may have a negative impact on read 
> performance. You can measure this by looking at the SSTables per read metric 
> in the cfhistograms output. 
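> 
> Something like the following (keyspace and column family names are 
> placeholders); the SSTables column of the output shows how many SSTables 
> were touched per read:
> 
>     nodetool -h localhost cfhistograms MyKeyspace MyCF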
> 
> Lastly, you can run a user defined or major compaction. User defined 
> compaction is available via JMX and allows you to compact any file you want. 
> Manual / major compaction is available via nodetool. We usually discourage 
> its use as it will create one big file that will not get compacted for a 
> while. 
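> 
> A major compaction is run with nodetool (keyspace and column family names 
> are placeholders):
> 
>     nodetool -h localhost compact MyKeyspace MyCF
> 
> User defined compaction is exposed on the CompactionManager MBean 
> (org.apache.cassandra.db:type=CompactionManager), e.g. via jconsole; check 
> the operation signature for your version before using it.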
> 
> 
> For background, the tombstones / expired columns for a row are only purged 
> from the database when all fragments of the row are in the files being 
> compacted. So if you have an old row that is spread out over many files it 
> may not get purged. 
> 
> Hope that helps. 
> 
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 14/12/2012, at 3:01 AM, Mike Smith <m...@mailchannels.com> wrote:
> 
>> I'm using 1.0.12 and I find that large sstables tend to get compacted 
>> infrequently. I've got data that gets deleted or expired frequently. Is it 
>> possible to use scrub to accelerate the cleanup of expired/deleted data?
>> 
>> -- 
>> Mike Smith
>> Director Development, MailChannels
>> 
> 
> 
> 
> 
> -- 
> Mike Smith
> Director Development, MailChannels
> 
