To get column removed you have to meet two requirements
1. column should be expired
2. after that CF gets compacted

I guess your expired columns are propagated to high tier CF, which gets
compacted rarely.
So, you have to wait when high tier CF gets compacted.

Andrey



On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot <btal...@aeriagames.com>wrote:

> On cassandra 1.1.5 with a write heavy workload, we're having problems
> getting rows to be compacted away (removed) even though all columns have
> expired TTL.  We've tried size tiered and now leveled and are seeing the
> same symptom: the data stays around essentially forever.
>
> Currently we write all columns with a TTL of 72 hours (259200 seconds) and
> expect to add 10 GB of data to this CF per day per node.  Each node
> currently has 73 GB for the affected CF and shows no indications that old
> rows will be removed on their own.
>
> Why aren't rows being removed?  Below is some data from a sample row which
> should have been removed several days ago but is still around even though
> it has been involved in numerous compactions since being expired.
>
> $> ./bin/nodetool -h localhost getsstables metrics request_summary
> 459fb460-5ace-11e2-9b92-11d67b6163b4
>
> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>
> $> ls -alF
> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
> -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42
> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>
> $> ./bin/sstable2json
> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
> -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 "%x"')
> {
> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
> [["app_name","50f21d3d",1357785277207001,"d"],
> ["client_ip","50f21d3d",1357785277207001,"d"],
> ["client_req_id","50f21d3d",1357785277207001,"d"],
> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
> ["req_duration_us","50f21d3d",1357785277207001,"d"],
> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
> ["req_method","50f21d3d",1357785277207001,"d"],
> ["req_service","50f21d3d",1357785277207001,"d"],
> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
> ["success","50f21d3d",1357785277207001,"d"]]
> }
>
>
> Decoding the column timestamps to shows that the columns were written at
> "Thu, 10 Jan 2013 02:34:37 GMT" and that their TTL expired at "Sun, 13 Jan
> 2013 02:34:37 GMT".  The date of the SSTable shows that it was generated on
> Jan 16 which is 3 days after all columns have TTL-ed out.
>
>
> The schema shows that gc_grace is set to 0 since this data is write-once,
> read-seldom and is never updated or deleted.
>
> create column family request_summary
>   with column_type = 'Standard'
>   and comparator = 'UTF8Type'
>   and default_validation_class = 'UTF8Type'
>   and key_validation_class = 'UTF8Type'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 0
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy =
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>   and caching = 'NONE'
>   and bloom_filter_fp_chance = 1.0
>   and compression_options = {'chunk_length_kb' : '64',
> 'sstable_compression' :
> 'org.apache.cassandra.io.compress.SnappyCompressor'};
>
>
> Thanks in advance for help in understanding why rows such as this are not
> removed!
>
> -Bryan
>
>

Reply via email to