Minor compaction (with Size Tiered) will only purge tombstones if all fragments of a row are contained in the SSTables being compacted. So if you have a long-lived row that is present in many size tiers, the columns will not be purged.
Tombstones have to get to disk, even if you set gc_grace_seconds to 0. If they don't, they never get a chance to delete the previous versions of the column which already exist on disk. So when the compaction ran, your ExpiringColumn was turned into a DeletedColumn and placed on disk. I would expect the next round of compaction to remove these columns.

There is a new feature in 1.2 that may help you here. It will do a special compaction of individual SSTables when they have a certain proportion of dead columns: https://issues.apache.org/jira/browse/CASSANDRA-3442

Also interested to know if LCS helps.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/01/2013, at 2:55 PM, Bryan Talbot <btal...@aeriagames.com> wrote:

> According to the timestamps (see original post) the SSTable was written (and
> thus compacted) 3 days after all columns for that row had expired and 6 days
> after the row was created; yet all columns are still showing up in the
> SSTable. Note that a "get" for that key correctly returns no rows, so reads
> are working, but the data is lugged around far longer than it should be --
> maybe forever.
>
> -Bryan
>
> On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
> To get a column removed you have to meet two requirements:
> 1. the column should be expired
> 2. after that, the CF gets compacted
>
> I guess your expired columns are propagated to a high tier, which gets
> compacted rarely. So you have to wait until that high tier gets compacted.
>
> Andrey
>
> On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
> On cassandra 1.1.5 with a write-heavy workload, we're having problems getting
> rows to be compacted away (removed) even though all columns have expired TTLs.
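The single-SSTable tombstone compaction from CASSANDRA-3442 is exposed in 1.2 as compaction subproperties. A minimal sketch of enabling it via CQL3, assuming the table name matches the column family above (the threshold value 0.2 here is just the documented default, shown explicitly for illustration):

```cql
-- Sketch only: ask LCS to compact a single SSTable on its own once an
-- estimated 20% of its columns are droppable tombstones.
ALTER TABLE request_summary
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'tombstone_threshold': 0.2};
```

Note this only helps once the cluster is on 1.2; on 1.1.5 the only options are waiting for the relevant tiers to compact or forcing a major compaction.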
> We've tried size tiered and now leveled and are seeing the same symptom: the
> data stays around essentially forever.
>
> Currently we write all columns with a TTL of 72 hours (259200 seconds) and
> expect to add 10 GB of data to this CF per day per node. Each node currently
> has 73 GB for the affected CF and shows no indications that old rows will be
> removed on their own.
>
> Why aren't rows being removed? Below is some data from a sample row which
> should have been removed several days ago but is still around even though it
> has been involved in numerous compactions since being expired.
>
> $> ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4
> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>
> $> ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
> -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>
> $> ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db \
>      -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
> {
> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
> [["app_name","50f21d3d",1357785277207001,"d"],
> ["client_ip","50f21d3d",1357785277207001,"d"],
> ["client_req_id","50f21d3d",1357785277207001,"d"],
> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
> ["req_duration_us","50f21d3d",1357785277207001,"d"],
> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
> ["req_method","50f21d3d",1357785277207001,"d"],
> ["req_service","50f21d3d",1357785277207001,"d"],
> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
> ["success","50f21d3d",1357785277207001,"d"]]
> }
>
> Decoding the column timestamps shows that the columns were written at
> "Thu, 10 Jan 2013 02:34:37 GMT" and that their TTL expired at "Sun, 13 Jan
> 2013 02:34:37 GMT". The date of the SSTable shows that it was generated on
> Jan 16, which is 3 days after all columns had TTL-ed out.
>
> The schema shows that gc_grace is set to 0 since this data is write-once,
> read-seldom, and is never updated or deleted.
>
> create column family request_summary
>   with column_type = 'Standard'
>   and comparator = 'UTF8Type'
>   and default_validation_class = 'UTF8Type'
>   and key_validation_class = 'UTF8Type'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 0
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>   and caching = 'NONE'
>   and bloom_filter_fp_chance = 1.0
>   and compression_options = {'chunk_length_kb' : '64',
>                              'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
>
> Thanks in advance for help in understanding why rows such as this are not
> removed!
>
> -Bryan
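The timestamp decoding Bryan describes can be reproduced with a short script. This is a sketch assuming (as the sstable2json output above suggests) that the third field of each column is the write timestamp in microseconds since the epoch and the second (hex) field of an expired column is its local expiration time in seconds:

```python
from datetime import datetime, timezone

# One column entry from the sstable2json dump above:
# ["app_name", "50f21d3d", 1357785277207001, "d"]
write_ts_us = 1357785277207001   # write timestamp: microseconds since epoch
expiry_hex = "50f21d3d"          # local expiration time: seconds since epoch, hex

written = datetime.fromtimestamp(write_ts_us / 1_000_000, tz=timezone.utc)
expired = datetime.fromtimestamp(int(expiry_hex, 16), tz=timezone.utc)

print(written.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Thu, 10 Jan 2013 02:34:37 GMT
print(expired.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Sun, 13 Jan 2013 02:34:37 GMT
# The gap is just under 259200 s, matching the 72-hour TTL:
print((expired - written).total_seconds())
```

This confirms the dates quoted in the thread: written Jan 10, expired Jan 13, yet still present in an SSTable written Jan 16.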