When you ran this test, is that the exact schema you used? I'm not seeing
where you are setting gc_grace to 0 (although I could just be blind, it

On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot <btal...@aeriagames.com> wrote:

> I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7,
> 1.1.8, a trivial schema, and a simple script that just inserts rows.  If
> the TTL is small enough that all LCS data fits in level 0, then the
> rows seem to be removed when the TTL expires, as desired.  However, if the
> insertion rate is high enough or the TTL long enough, then the data keeps
> accumulating for far longer than expected.
> Using a 120-second TTL and a single-threaded PHP insertion script, my MBP
> with SSD retained almost all of the data.  120 seconds should accumulate
> 5-10 MB of data.  I would expect the TTL'd rows to be removed eventually and
> the Cassandra load to level off at some reasonable value near 10 MB.
> After running for 2 hours, with a Cassandra load of ~550 MB, I stopped
> the test.
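A rough check of those numbers, as a sketch that assumes a constant insert rate (so the 5-10 MB per 120-second window is effectively the write rate): if expired rows were never purged, a 2-hour run leaves about 60 windows' worth of data on disk.

```python
# Back-of-envelope check for the 120-second TTL test.
# Assumption: the insert rate is constant, so "5-10 MB per 120 s" is the write rate.

TTL_SECONDS = 120                 # TTL used by the PHP insert script
RUN_SECONDS = 2 * 60 * 60         # "After running for 2 hours"
OBSERVED_LOAD_MB = 550.0          # "a Cassandra load of ~550 MB"

def windows_elapsed(run_s: int, ttl_s: int) -> float:
    """Number of TTL-sized windows of data written during the run."""
    return run_s / ttl_s

def load_if_never_purged(mb_per_window: float) -> float:
    """Load expected if no expired data were ever removed."""
    return mb_per_window * windows_elapsed(RUN_SECONDS, TTL_SECONDS)

def implied_mb_per_window(observed_mb: float) -> float:
    """Per-window volume implied if the observed load is everything ever written."""
    return observed_mb / windows_elapsed(RUN_SECONDS, TTL_SECONDS)

if __name__ == "__main__":
    for mb in (5.0, 10.0):
        print(f"{mb:.0f} MB per window -> plateau ~{mb:.0f} MB if purged, "
              f"~{load_if_never_purged(mb):.0f} MB if never purged")
    print(f"observed {OBSERVED_LOAD_MB:.0f} MB implies "
          f"~{implied_mb_per_window(OBSERVED_LOAD_MB):.1f} MB per window retained")
```

At 5-10 MB per window that is roughly 300-600 MB, so ~550 MB is consistent with almost nothing being removed.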
> The schema is
> create keyspace test
>   with placement_strategy = 'SimpleStrategy'
>   and strategy_options = {replication_factor : 1}
>   and durable_writes = true;
> use test;
> create column family test
>   with column_type = 'Standard'
>   and comparator = 'UTF8Type'
>   and default_validation_class = 'UTF8Type'
>   and key_validation_class = 'TimeUUIDType'
>   and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>   and caching = 'NONE'
>   and bloom_filter_fp_chance = 1.0
>   and column_metadata = [
>     {column_name : 'a',
>     validation_class : LongType}];
> and the insert script is
> <?php
> require_once('phpcassa/1.0.a.5/autoload.php');
> use phpcassa\Connection\ConnectionPool;
> use phpcassa\ColumnFamily;
> use phpcassa\SystemManager;
> use phpcassa\UUID;
> // SystemManager is for schema operations; the loop below does not use it
> $sys = new SystemManager('');
> // Start a connection pool and create our ColumnFamily instance
> $pool = new ConnectionPool('test', array(''));
> $testCf = new ColumnFamily($pool, 'test');
> // Insert rows forever: a new TimeUUID key, one column "a" => 1,
> // the default timestamp (null), and a TTL of 120 seconds
> while( 1 ) {
>   $testCf->insert(UUID::uuid1(), array("a" => 1), null, 120);
> }
> // Never reached while the loop runs; the script is stopped externally
> $pool->close();
> $sys->close();
> ?>
> -Bryan
> On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
>> We are using LCS and the particular row I've referenced has been involved
>> in several compactions after all columns have TTL expired.  The most recent
>> one was again this morning and the row is still there -- TTL expired for
>> several days now with gc_grace=0 and several compactions later ...
>> $> ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4
>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>> $> ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>> -rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>> $> ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
>>  {
>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>> [["app_name","50f21d3d",1357785277207001,"d"],
>> ["client_ip","50f21d3d",1357785277207001,"d"],
>> ["client_req_id","50f21d3d",1357785277207001,"d"],
>> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>> ["req_duration_us","50f21d3d",1357785277207001,"d"],
>> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>> ["req_method","50f21d3d",1357785277207001,"d"],
>> ["req_service","50f21d3d",1357785277207001,"d"],
>> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>> ["success","50f21d3d",1357785277207001,"d"]]
>> }
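For reference, the `-k` argument above is just the row key's bytes in hex. A small Python equivalent of the `echo -n ... | hexdump` pipeline, assuming a UTF8Type key as in the request_summary schema (`sstable2json_key` is a hypothetical helper name):

```python
# Build the hex row key passed to sstable2json -k.
# Assumption: key_validation_class is UTF8Type, so the raw key bytes are the
# UTF-8 encoding of the string. hexdump's "%x" does not zero-pad, which is
# harmless here because every byte of this key is >= 0x10.

def sstable2json_key(row_key: str) -> str:
    """Hex-encode a UTF8Type row key, two lowercase hex digits per byte."""
    return row_key.encode("utf-8").hex()

if __name__ == "__main__":
    print(sstable2json_key("459fb460-5ace-11e2-9b92-11d67b6163b4"))
```

The printed string is the same as the row key shown in the sstable2json output.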
>> My experience with TTL columns so far has been pretty similar to Viktor's
>> in that the only way to keep the row count under control is to force major
>> compactions.  In real-world use, STCS and LCS both leave TTL-expired rows
>> around forever as far as I can tell.  When testing with minimal data,
>> removal of TTL-expired rows seems to work as expected, but in this case there
>> seems to be some divergence between real-world workloads and test samples.
>> -Bryan
>> On Thu, Jan 17, 2013 at 1:47 AM, Viktor Jevdokimov <
>> viktor.jevdoki...@adform.com> wrote:
>>> @Bryan,
>>>
>>> To keep data size as low as possible with TTL columns we still use STCS
>>> and nightly major compactions.
>>>
>>> Experience with LCS was not successful in our case: data size stays too
>>> high, along with the number of compactions.
>>>
>>> IMO, before 1.2, LCS was good for CFs without TTL or a high delete rate. I
>>> have not tested 1.2 LCS behavior; we're still on 1.0.x.
>>>
>>> Best regards / Pagarbiai
>>> Viktor Jevdokimov
>>> Senior Developer
>>>
>>> *From:* aaron morton [mailto:aa...@thelastpickle.com]
>>> *Sent:* Thursday, January 17, 2013 06:24
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: LCS not removing rows with all TTL expired columns
>>>
>>> Minor compaction (with Size Tiered) will only purge tombstones if all
>>> fragments of a row are contained in the SSTables being compacted. So if you
>>> have a long-lived row that is present in many size tiers, the columns will
>>> not be purged.
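A toy model of that rule, assuming SSTables can be represented as plain sets of row keys and treating the gc_grace check as a single flag. It is a simplified illustration, not Cassandra's code; `SSTable` and `can_purge_row` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class SSTable:
    # Hypothetical stand-in: only the set of row keys each SSTable holds.
    name: str
    row_keys: set = field(default_factory=set)

def can_purge_row(key, compacting, all_sstables, gc_grace_elapsed):
    """Simplified rule: drop a row's tombstones only if gc_grace has passed
    and no SSTable outside this compaction still holds a fragment of the row."""
    if not gc_grace_elapsed:
        return False
    compacting_names = {s.name for s in compacting}
    others = [s for s in all_sstables if s.name not in compacting_names]
    # If an older fragment survives elsewhere, dropping the tombstone here
    # would let that older data become visible again.
    return all(key not in s.row_keys for s in others)

if __name__ == "__main__":
    small = SSTable("tier0-a", {"row1"})
    big = SSTable("tier2-a", {"row1", "row2"})
    tables = [small, big]
    print(can_purge_row("row1", [small], tables, True))
    print(can_purge_row("row1", [small, big], tables, True))
```

Compacting only the small SSTable keeps the tombstone because the big one still holds part of the row; including the big one in the same compaction lets it go.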
>>>
>>>> (thus compacted) 3 days after all columns for that row had
>>>> expired
>>> Tombstones have to get on disk, even if you set gc_grace_seconds to
>>> 0. Otherwise they do not get a chance to delete previous versions of the
>>> column that already exist on disk. So when the compaction ran, your
>>> ExpiringColumn was turned into a DeletedColumn and placed on disk.
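A simplified sketch of that conversion, assuming second-resolution clocks and a single `purge_allowed` flag standing in for the "no other SSTable holds this row" check. The class names follow Aaron's wording, but this is not Cassandra's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExpiringColumn:
    name: str
    timestamp_us: int        # client write timestamp, microseconds
    expires_at_s: int        # local expiration time, seconds since epoch

@dataclass(frozen=True)
class DeletedColumn:
    name: str
    timestamp_us: int
    local_deletion_time_s: int

def compact_column(col, now_s, gc_grace_s, purge_allowed):
    """What a simplified compaction writes for one column (None = dropped)."""
    if isinstance(col, ExpiringColumn):
        if now_s < col.expires_at_s:
            return col  # still live: copied through unchanged
        # Expired: from here on it is a tombstone whose deletion time is the
        # expiration time.
        col = DeletedColumn(col.name, col.timestamp_us, col.expires_at_s)
    if purge_allowed and col.local_deletion_time_s + gc_grace_s < now_s:
        return None  # gc_grace has passed and purging is allowed: gone
    return col  # otherwise the tombstone is written back to disk

if __name__ == "__main__":
    col = ExpiringColumn("app_name", 1357785277207001, 1358044477)
    three_days_later = 1358044477 + 3 * 86400
    print(compact_column(col, three_days_later, 0, purge_allowed=False))
    print(compact_column(col, three_days_later, 0, purge_allowed=True))
```

In this model, with gc_grace = 0 the only thing keeping the tombstone around is the purge check, which matches what the `"d"` entries in the sstable2json output appear to show.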
>>>
>>> I would expect the next round of compaction to remove these columns.
>>>
>>> There is a new feature in 1.2 that may help you here. It will do a
>>> special compaction of individual SSTables when they have a certain
>>> proportion of dead columns:
>>> https://issues.apache.org/jira/browse/CASSANDRA-3442
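A rough sketch of that trigger: if the fraction of droppable tombstones in an SSTable exceeds a threshold, compact it on its own. Assumptions: the 0.2 default is meant to mirror the 1.2 `tombstone_threshold` compaction option as I understand it (check the docs for your version), and the counts are passed in directly, whereas Cassandra estimates them from per-SSTable metadata.

```python
# Assumption: 0.2 mirrors the 1.2 tombstone_threshold default; verify for your version.
DEFAULT_TOMBSTONE_THRESHOLD = 0.2

def droppable_ratio(droppable_tombstones: int, total_columns: int) -> float:
    """Fraction of an SSTable's columns that are tombstones past gc_grace."""
    if total_columns <= 0:
        return 0.0
    return droppable_tombstones / total_columns

def wants_tombstone_compaction(droppable_tombstones: int, total_columns: int,
                               threshold: float = DEFAULT_TOMBSTONE_THRESHOLD) -> bool:
    """True if this single SSTable looks worth compacting on its own
    (ratio strictly greater than the threshold)."""
    return droppable_ratio(droppable_tombstones, total_columns) > threshold

if __name__ == "__main__":
    # An SSTable whose 13 columns are all expired, with gc_grace = 0.
    print(wants_tombstone_compaction(13, 13))
    # An SSTable where only 5% of the columns are droppable.
    print(wants_tombstone_compaction(5, 100))
```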
>>>
>>> Also interested to know if LCS helps.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 17/01/2013, at 2:55 PM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>>
>>> According to the timestamps (see original post), the SSTable was written
>>> (thus compacted) 3 days after all columns for that row had
>>> expired and 6 days after the row was created; yet all columns are still
>>> showing up in the SSTable.  Note that the column family shows no rows when a
>>> "get" for that key is run, so that's working correctly, but the data is
>>> lugged around far longer than it should be -- maybe forever.
>>>
>>> -Bryan
>>>
>>> On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh <ailin...@gmail.com>
>>> wrote:
>>> For a column to be removed, two requirements have to be met:
>>> 1. the column has expired
>>> 2. after that, the CF gets compacted
>>>
>>> I guess your expired columns have propagated to a high-tier SSTable, which
>>> gets compacted rarely.
>>> So you have to wait until the high-tier SSTable gets compacted.
>>>
>>> Andrey
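Andrey's point about high tiers can be quantified with a toy simulation. Assumptions: equal-sized flushes, min_compaction_threshold = 4 (the value in the request_summary schema), and compacting tier k always yields exactly one SSTable in tier k+1. Real STCS buckets by size, so this only shows how quickly the wait grows.

```python
from collections import defaultdict

def flushes_between_compactions(tier: int, min_threshold: int = 4) -> int:
    """Flushes that pass between successive compactions of one tier (toy model)."""
    return min_threshold ** (tier + 1)

def simulate(num_flushes: int, min_threshold: int = 4) -> dict:
    """Count compactions per tier after num_flushes equal-sized flushes."""
    sstables = defaultdict(int)     # tier -> SSTables currently in that tier
    compactions = defaultdict(int)  # tier -> compactions run so far
    for _ in range(num_flushes):
        sstables[0] += 1            # each flush adds one tier-0 SSTable
        tier = 0
        # Cascade: a full tier merges into one SSTable in the next tier up.
        while sstables[tier] >= min_threshold:
            sstables[tier] -= min_threshold
            sstables[tier + 1] += 1
            compactions[tier] += 1
            tier += 1
    return dict(compactions)

if __name__ == "__main__":
    for tier, count in sorted(simulate(256).items()):
        print(f"tier {tier}: {count} compactions, "
              f"one every {flushes_between_compactions(tier)} flushes")
```

Under these assumptions tier k is compacted only once every 4^(k+1) flushes, so expired data sitting in tier 3 is revisited roughly once per 256 flushes.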
>>>
>>> On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot <btal...@aeriagames.com>
>>> wrote:
>>> On Cassandra 1.1.5 with a write-heavy workload, we're having problems
>>> getting rows to be compacted away (removed) even though all columns have
>>> expired TTLs.  We've tried size tiered and now leveled and are seeing the
>>> same symptom: the data stays around essentially forever.
>>>
>>> Currently we write all columns with a TTL of 72 hours (259200 seconds)
>>> and expect to add 10 GB of data to this CF per day per node.  Each node
>>> currently has 73 GB for the affected CF and shows no indication that old
>>> rows will be removed on their own.
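Back-of-envelope numbers for this CF, assuming a constant 10 GB/day write rate per node and ignoring compaction overhead: with a 72-hour TTL the live data should settle around 30 GB per node, while 73 GB corresponds to about 7.3 days of writes.

```python
# Assumptions: constant write rate, no compaction or compression overhead.
TTL_SECONDS = 259200          # 72 hours, from the post
GB_PER_DAY_PER_NODE = 10.0    # expected write volume for this CF
OBSERVED_GB_PER_NODE = 73.0

def expected_live_gb(gb_per_day: float, ttl_seconds: int) -> float:
    """Data younger than the TTL, i.e. what should remain if expired rows are purged."""
    return gb_per_day * ttl_seconds / 86400

def days_of_writes_on_disk(observed_gb: float, gb_per_day: float) -> float:
    """How many days of writes the observed size corresponds to."""
    return observed_gb / gb_per_day

if __name__ == "__main__":
    print(f"expected ~{expected_live_gb(GB_PER_DAY_PER_NODE, TTL_SECONDS):.0f} GB, "
          f"observed {OBSERVED_GB_PER_NODE:.0f} GB "
          f"(~{days_of_writes_on_disk(OBSERVED_GB_PER_NODE, GB_PER_DAY_PER_NODE):.1f} days of writes)")
```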
>>>
>>> Why aren't rows being removed?  Below is some data from a sample row
>>> that should have been removed several days ago but is still around even
>>> though it has been involved in numerous compactions since expiring.
>>>
>>> $> ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4
>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>
>>> $> ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>> -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>
>>> $> ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
>>>
>>> {
>>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>>> [["app_name","50f21d3d",1357785277207001,"d"],
>>> ["client_ip","50f21d3d",1357785277207001,"d"],
>>> ["client_req_id","50f21d3d",1357785277207001,"d"],
>>> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>>> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>>> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>>> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>>> ["req_duration_us","50f21d3d",1357785277207001,"d"],
>>> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>>> ["req_method","50f21d3d",1357785277207001,"d"],
>>> ["req_service","50f21d3d",1357785277207001,"d"],
>>> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>>> ["success","50f21d3d",1357785277207001,"d"]]
>>> }
>>>
>>> Decoding the column timestamps shows that the columns were written at
>>> "Thu, 10 Jan 2013 02:34:37 GMT" and that their TTL expired at "Sun, 13 Jan
>>> 2013 02:34:37 GMT".  The date of the SSTable shows that it was generated on
>>> Jan 16, which is 3 days after all columns had TTL-ed out.
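The decoding can be reproduced with a few lines of Python. One assumption beyond the post: for columns flagged `"d"`, the value field (`50f21d3d`) is read as the local deletion time in seconds, hex-encoded. It decodes to exactly the expiry time (write time plus 259200 s), which supports that reading.

```python
from datetime import datetime, timezone

TTL_SECONDS = 259200  # 72 hours, from the post

def from_micros(ts_us: int) -> datetime:
    """Column timestamps here are microseconds since the epoch (sub-second part dropped)."""
    return datetime.fromtimestamp(ts_us // 1_000_000, tz=timezone.utc)

def decode_deleted_column(entry):
    """Decode one [name, value, timestamp, flag] entry from the sstable2json output.

    Assumption: for entries flagged "d", the value is the local deletion time
    in seconds since the epoch, written as hex.
    """
    name, value_hex, ts_us, flag = entry
    if flag != "d":
        raise ValueError(f"{name}: expected a deleted ('d') column, got {flag!r}")
    written = from_micros(ts_us)
    deleted = datetime.fromtimestamp(int(value_hex, 16), tz=timezone.utc)
    return name, written, deleted

if __name__ == "__main__":
    name, written, deleted = decode_deleted_column(
        ["app_name", "50f21d3d", 1357785277207001, "d"])
    print(name, written.isoformat(), deleted.isoformat())
    print("seconds between write and deletion:", (deleted - written).total_seconds())
```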
>>>
>>> The schema shows that gc_grace is set to 0 since this data is
>>> write-once, read-seldom, and never updated or deleted.
>>>
>>> create column family request_summary
>>>   with column_type = 'Standard'
>>>   and comparator = 'UTF8Type'
>>>   and default_validation_class = 'UTF8Type'
>>>   and key_validation_class = 'UTF8Type'
>>>   and read_repair_chance = 0.1
>>>   and dclocal_read_repair_chance = 0.0
>>>   and gc_grace = 0
>>>   and min_compaction_threshold = 4
>>>   and max_compaction_threshold = 32
>>>   and replicate_on_write = true
>>>   and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>>>   and caching = 'NONE'
>>>   and bloom_filter_fp_chance = 1.0
>>>   and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
>>>
>>> Thanks in advance for help in understanding why rows such as this are
>>> not removed!
>>>
>>> -Bryan

Derek Williams


