Thanks for letting us know. I also have some tables with a lot of activity and very short TTLs, and while I haven't experienced this problem, it's good to know just in case.
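To convince myself of the mechanism Bryan describes below, I sketched the purge decision in a few lines of Python. This is a toy model only, not Cassandra's actual compaction code; all class and function names here are made up for illustration:

```python
# Toy model of the tombstone-purge decision described in the thread:
# during compaction, a row's tombstones may be dropped only if no
# SSTable *outside* the compaction set claims it might contain the row.
# The check goes through each SSTable's bloom filter.

class SSTable:
    def __init__(self, keys, bloom_filter_fp_chance):
        self.keys = set(keys)
        self.fp_chance = bloom_filter_fp_chance

    def bloom_might_contain(self, key):
        # A real bloom filter answers "maybe" with probability ~fp_chance
        # for absent keys. With fp_chance = 1.0 it always answers "maybe";
        # here we idealize lower fp_chance as an exact membership test.
        if self.fp_chance >= 1.0:
            return True
        return key in self.keys

def can_purge(key, compacting, all_sstables):
    # Tombstones for `key` may be dropped only if no other SSTable
    # might still hold a fragment of the row.
    others = [s for s in all_sstables if s not in compacting]
    return not any(s.bloom_might_contain(key) for s in others)

a = SSTable({"row1"}, 1.0)
b = SSTable({"row2"}, 1.0)
# With fp_chance = 1.0, row1 "appears" to live in b as well, so its
# tombstones are never purged when compacting a alone.
print(can_purge("row1", [a], [a, b]))   # False

a2 = SSTable({"row1"}, 0.1)
b2 = SSTable({"row2"}, 0.1)
print(can_purge("row1", [a2], [a2, b2]))  # True (idealized filter)
```

With fp_chance = 1.0, the filter can never rule a row out of any SSTable, so the purge condition is never satisfied -- which matches the "data never shrinks" symptom regardless of compaction strategy.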
On Tue, Jan 22, 2013 at 7:35 PM, Bryan Talbot <btal...@aeriagames.com> wrote:

> It turns out that having gc_grace=0 isn't required to produce the problem.
> My colleague did a lot of digging into the compaction code and we think
> he's found the issue. It's detailed in
> https://issues.apache.org/jira/browse/CASSANDRA-5182
>
> Basically, tombstones for a row will not be removed from an SSTable during
> compaction if the row appears in other SSTables; however, the compaction
> code checks the bloom filters to make this determination. Since this data
> is rarely read, we had bloom_filter_fp_chance set to 1.0, which makes rows
> appear to be in every SSTable as far as compaction is concerned.
>
> This caused our data to essentially never be removed when using either
> STCS or LCS, and will probably affect anyone else running 1.1 with high
> bloom filter fp ratios.
>
> Setting our fp ratio to 0.1, running upgradesstables, and running the
> application as it was before seems to have stabilized the load as desired,
> at the expense of additional JVM memory.
>
> -Bryan
>
> On Thu, Jan 17, 2013 at 6:50 PM, Bryan Talbot <btal...@aeriagames.com> wrote:
>
>> Bleh, I rushed out the email before some meetings and I messed something
>> up. Working on reproducing now with better notes this time.
>>
>> -Bryan
>>
>> On Thu, Jan 17, 2013 at 4:45 PM, Derek Williams <de...@fyrie.net> wrote:
>>
>>> When you ran this test, is that the exact schema you used? I'm not
>>> seeing where you are setting gc_grace to 0 (although I could just be blind,
>>> it happens).
>>>
>>> On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>>
>>>> I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7,
>>>> 1.1.8, a trivial schema, and a simple script that just inserts rows. If
>>>> the TTL is small enough that all LCS data fits in generation 0, then the
>>>> rows seem to be removed when the TTL expires, as desired. However, if the
>>>> insertion rate is high enough or the TTL long enough, then the data keeps
>>>> accumulating for far longer than expected.
>>>>
>>>> Using a 120-second TTL and a single-threaded PHP insertion script, my MBP
>>>> with SSD retained almost all of the data. 120 seconds should accumulate
>>>> 5-10 MB of data. I would expect the TTL rows to be removed eventually and
>>>> the cassandra load to level off at some reasonable value near 10 MB.
>>>> After running for 2 hours and with a cassandra load of ~550 MB, I stopped
>>>> the test.
>>>>
>>>> The schema is
>>>>
>>>> create keyspace test
>>>>   with placement_strategy = 'SimpleStrategy'
>>>>   and strategy_options = {replication_factor : 1}
>>>>   and durable_writes = true;
>>>>
>>>> use test;
>>>>
>>>> create column family test
>>>>   with column_type = 'Standard'
>>>>   and comparator = 'UTF8Type'
>>>>   and default_validation_class = 'UTF8Type'
>>>>   and key_validation_class = 'TimeUUIDType'
>>>>   and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>>>>   and caching = 'NONE'
>>>>   and bloom_filter_fp_chance = 1.0
>>>>   and column_metadata = [
>>>>     {column_name : 'a',
>>>>      validation_class : LongType}];
>>>>
>>>> and the insert script is
>>>>
>>>> <?php
>>>>
>>>> require_once('phpcassa/1.0.a.5/autoload.php');
>>>>
>>>> use phpcassa\Connection\ConnectionPool;
>>>> use phpcassa\ColumnFamily;
>>>> use phpcassa\SystemManager;
>>>> use phpcassa\UUID;
>>>>
>>>> // Connect to test keyspace and column family
>>>> $sys = new SystemManager('127.0.0.1');
>>>>
>>>> // Start a connection pool, create our ColumnFamily instance
>>>> $pool = new ConnectionPool('test', array('127.0.0.1'));
>>>> $testCf = new ColumnFamily($pool, 'test');
>>>>
>>>> // Insert records with a 120-second TTL
>>>> while( 1 ) {
>>>>     $testCf->insert(UUID::uuid1(), array("a" => 1), null, 120);
>>>> }
>>>>
>>>> // Close our connections (unreachable while the loop above runs)
>>>> $pool->close();
>>>> $sys->close();
>>>>
>>>> ?>
>>>>
>>>> -Bryan
>>>>
>>>> On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>>>
>>>>> We are using LCS, and the particular row I've referenced has been
>>>>> involved in several compactions after all columns have TTL expired. The
>>>>> most recent one was again this morning and the row is still there -- TTL
>>>>> expired for several days now with gc_grace=0 and several compactions
>>>>> later...
>>>>>
>>>>> $> ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4
>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>>
>>>>> $> ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>> -rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>>
>>>>> $> ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
>>>>> {
>>>>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>>>>> [["app_name","50f21d3d",1357785277207001,"d"],
>>>>> ["client_ip","50f21d3d",1357785277207001,"d"],
>>>>> ["client_req_id","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["req_duration_us","50f21d3d",1357785277207001,"d"],
>>>>> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>>>>> ["req_method","50f21d3d",1357785277207001,"d"],
>>>>> ["req_service","50f21d3d",1357785277207001,"d"],
>>>>> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>>>>> ["success","50f21d3d",1357785277207001,"d"]]
>>>>> }
>>>>>
>>>>> My experience with TTL columns so far has been pretty similar to
>>>>> Viktor's, in that the only way to keep the row count under control is to
>>>>> force major compactions. In real-world use, STCS and LCS both leave
>>>>> TTL-expired rows around forever as far as I can tell. When testing with
>>>>> minimal data, removal of TTL-expired rows seems to work as expected, but
>>>>> there seems to be some divergence between real-life workloads and test
>>>>> samples.
>>>>>
>>>>> -Bryan
>>>>>
>>>>> On Thu, Jan 17, 2013 at 1:47 AM, Viktor Jevdokimov <viktor.jevdoki...@adform.com> wrote:
>>>>>
>>>>>> @Bryan,
>>>>>>
>>>>>> To keep data size as low as possible with TTL columns, we still use
>>>>>> STCS and nightly major compactions.
>>>>>>
>>>>>> Experience with LCS was not successful in our case; data size stays
>>>>>> too high, along with the amount of compaction.
>>>>>>
>>>>>> IMO, before 1.2, LCS was good for CFs without TTL or a high delete
>>>>>> rate. I have not tested 1.2 LCS behavior; we're still on 1.0.x
>>>>>>
>>>>>> Best regards / Pagarbiai
>>>>>> Viktor Jevdokimov
>>>>>> Senior Developer
>>>>>> Email: viktor.jevdoki...@adform.com
>>>>>> Phone: +370 5 212 3063, Fax +370 5 261 0453
>>>>>> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>>>>>>
>>>>>> From: aaron morton [mailto:aa...@thelastpickle.com]
>>>>>> Sent: Thursday, January 17, 2013 06:24
>>>>>> To: user@cassandra.apache.org
>>>>>> Subject: Re: LCS not removing rows with all TTL expired columns
>>>>>>
>>>>>> Minor compaction (with Size Tiered) will only purge tombstones if all
>>>>>> fragments of a row are contained in the SSTables being compacted. So if
>>>>>> you have a long-lived row that is present in many size tiers, the columns
>>>>>> will not be purged.
>>>>>>
>>>>>> > (thus compacted) 3 days after all columns for that row had expired
>>>>>>
>>>>>> Tombstones have to get on disk, even if you set gc_grace_seconds to 0.
>>>>>> Otherwise they do not get a chance to delete previous versions of the
>>>>>> column which already exist on disk. So when the compaction ran, your
>>>>>> ExpiringColumn was turned into a DeletedColumn and placed on disk.
>>>>>>
>>>>>> I would expect the next round of compaction to remove these columns.
>>>>>>
>>>>>> There is a new feature in 1.2 that may help you here. It will do a
>>>>>> special compaction of individual sstables when they have a certain
>>>>>> proportion of dead columns:
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-3442
>>>>>>
>>>>>> Also interested to know if LCS helps.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Cassandra Developer
>>>>>> New Zealand
>>>>>>
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>>
>>>>>> On 17/01/2013, at 2:55 PM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>>>>>
>>>>>> According to the timestamps (see original post), the SSTable was
>>>>>> written (thus compacted) 3 days after all columns for that row had
>>>>>> expired and 6 days after the row was created; yet all columns are still
>>>>>> showing up in the SSTable. Note that a "get" for that key shows no rows,
>>>>>> so that's working correctly, but the data is lugged around far longer
>>>>>> than it should be -- maybe forever.
>>>>>>
>>>>>> -Bryan
>>>>>>
>>>>>> On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
>>>>>>
>>>>>> To get a column removed you have to meet two requirements:
>>>>>>
>>>>>> 1. the column should be expired
>>>>>> 2. after that, the CF gets compacted
>>>>>>
>>>>>> I guess your expired columns are propagated to a high tier, which gets
>>>>>> compacted rarely. So, you have to wait until the high tier gets
>>>>>> compacted.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>> On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>>>>>
>>>>>> On cassandra 1.1.5 with a write-heavy workload, we're having problems
>>>>>> getting rows to be compacted away (removed) even though all columns have
>>>>>> expired TTLs. We've tried size-tiered and now leveled and are seeing the
>>>>>> same symptom: the data stays around essentially forever.
>>>>>>
>>>>>> Currently we write all columns with a TTL of 72 hours (259200 seconds)
>>>>>> and expect to add 10 GB of data to this CF per day per node. Each node
>>>>>> currently has 73 GB for the affected CF and shows no indication that old
>>>>>> rows will be removed on their own.
>>>>>>
>>>>>> Why aren't rows being removed? Below is some data from a sample row
>>>>>> which should have been removed several days ago but is still around even
>>>>>> though it has been involved in numerous compactions since being expired.
>>>>>>
>>>>>> $> ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4
>>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>>>>
>>>>>> $> ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>>>> -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>>>>
>>>>>> $> ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
>>>>>> {
>>>>>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>>>>>> [["app_name","50f21d3d",1357785277207001,"d"],
>>>>>> ["client_ip","50f21d3d",1357785277207001,"d"],
>>>>>> ["client_req_id","50f21d3d",1357785277207001,"d"],
>>>>>> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>>> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>>>>>> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>>> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>>> ["req_duration_us","50f21d3d",1357785277207001,"d"],
>>>>>> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>>>>>> ["req_method","50f21d3d",1357785277207001,"d"],
>>>>>> ["req_service","50f21d3d",1357785277207001,"d"],
>>>>>> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>>>>>> ["success","50f21d3d",1357785277207001,"d"]]
>>>>>> }
>>>>>>
>>>>>> Decoding the column timestamps shows that the columns were written at
>>>>>> "Thu, 10 Jan 2013 02:34:37 GMT" and that their TTL expired at "Sun, 13
>>>>>> Jan 2013 02:34:37 GMT". The date of the SSTable shows that it was
>>>>>> generated on Jan 16, which is 3 days after all columns have TTL-ed out.
>>>>>>
>>>>>> The schema shows that gc_grace is set to 0 since this data is
>>>>>> write-once, read-seldom and is never updated or deleted.
>>>>>>
>>>>>> create column family request_summary
>>>>>>   with column_type = 'Standard'
>>>>>>   and comparator = 'UTF8Type'
>>>>>>   and default_validation_class = 'UTF8Type'
>>>>>>   and key_validation_class = 'UTF8Type'
>>>>>>   and read_repair_chance = 0.1
>>>>>>   and dclocal_read_repair_chance = 0.0
>>>>>>   and gc_grace = 0
>>>>>>   and min_compaction_threshold = 4
>>>>>>   and max_compaction_threshold = 32
>>>>>>   and replicate_on_write = true
>>>>>>   and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>>>>>>   and caching = 'NONE'
>>>>>>   and bloom_filter_fp_chance = 1.0
>>>>>>   and compression_options = {'chunk_length_kb' : '64',
>>>>>>     'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
>>>>>>
>>>>>> Thanks in advance for help in understanding why rows such as this are
>>>>>> not removed!
>>>>>>
>>>>>> -Bryan
>>>>>>
>>>
>>> --
>>> Derek Williams
>>>

--
Derek Williams
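P.S. For anyone wanting to verify the timestamp arithmetic discussed in the thread: the column timestamps in the sstable2json output are microseconds since the Unix epoch, and the TTL mentioned is 72 hours (259200 seconds). A small sketch, using the timestamp from the sample row:

```python
from datetime import datetime, timezone

# Column timestamps in sstable2json output are microseconds since epoch;
# the TTL from the original post is 72 h = 259200 s.
ts_us = 1357785277207001
ttl_s = 259200

written = datetime.fromtimestamp(ts_us / 1_000_000, tz=timezone.utc)
expired = datetime.fromtimestamp(ts_us / 1_000_000 + ttl_s, tz=timezone.utc)

print(written.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Thu, 10 Jan 2013 02:34:37 GMT
print(expired.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Sun, 13 Jan 2013 02:34:37 GMT
```

This matches the write and expiry times quoted in the thread, and confirms the Jan 16 SSTable was written 3 days after expiry.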