It turns out that having gc_grace=0 isn't required to produce the problem.
 My colleague did a lot of digging into the compaction code and we think
he's found the issue.  It's detailed in
https://issues.apache.org/jira/browse/CASSANDRA-5182

Basically, tombstones for a row will not be removed from an SSTable during
compaction if the row appears to exist in other SSTables, and the compaction
code checks the bloom filters to make that determination.  Since this data
is rarely read, we had bloom_filter_fp_chance set to 1.0, which makes every
row appear to exist in every SSTable as far as compaction is concerned.
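
In rough terms the check behaves like this (a simplified PHP sketch with
made-up names, not the actual compaction code):

<?php
// Simplified sketch only -- NOT the real Cassandra code.  During a minor
// compaction, a row's tombstones may be dropped only if no SSTable outside
// the compaction set might still contain that row, and that test is answered
// by each SSTable's bloom filter.
function canPurgeTombstones($key, array $mightContainCallbacks) {
  foreach ($mightContainCallbacks as $mightContain) {
    if ($mightContain($key)) {
      return false;  // row "appears" elsewhere, so keep its tombstones
    }
  }
  return true;
}

// With bloom_filter_fp_chance = 1.0 every filter answers "maybe", so
// tombstones are never considered purgeable.
$alwaysMaybe = function ($key) { return true; };
var_dump(canPurgeTombstones('459fb460-5ace-11e2-9b92-11d67b6163b4', array($alwaysMaybe)));  // bool(false)
?>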

This caused our data to essentially never be removed when using either STCS
or LCS, and it will probably affect anyone else running 1.1 with a high bloom
filter fp chance.

Setting our fp chance to 0.1, running upgradesstables, and then running the
application as before seems to have stabilized the load as desired, at the
expense of additional JVM memory.
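
For reference, the change amounted to something like the following (keyspace
and CF names as used later in this thread):

$> ./bin/cassandra-cli -h localhost
[default@unknown] use metrics;
[default@metrics] update column family request_summary with bloom_filter_fp_chance = 0.1;

$> ./bin/nodetool -h localhost upgradesstables metrics request_summary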

-Bryan


On Thu, Jan 17, 2013 at 6:50 PM, Bryan Talbot <btal...@aeriagames.com> wrote:

> Bleh, I rushed out the email before some meetings and I messed something
> up.  Working on reproducing now with better notes this time.
>
> -Bryan
>
>
>
> On Thu, Jan 17, 2013 at 4:45 PM, Derek Williams <de...@fyrie.net> wrote:
>
>> When you ran this test, is that the exact schema you used? I'm not seeing
>> where you are setting gc_grace to 0 (although I could just be blind, it
>> happens).
>>
>>
>> On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>
>>> I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7,
>>> 1.1.8, a trivial schema, and a simple script that just inserts rows.  If
>>> the TTL is small enough that all LCS data fits in generation 0, then the
>>> rows seem to be removed when the TTL expires, as desired.  However, if the
>>> insertion rate is high enough or the TTL long enough, then the data keeps
>>> accumulating for far longer than expected.
>>>
>>> Using a 120 second TTL and a single-threaded PHP insertion script, my MBP
>>> with SSD retained almost all of the data.  120 seconds should accumulate
>>> 5-10 MB of data, and I would expect the TTL'd rows to be removed eventually
>>> and the cassandra load to level off at some reasonable value near 10 MB.
>>> After running for 2 hours and with a cassandra load of ~550 MB, I stopped
>>> the test.
>>>
>>> The schema is
>>>
>>> create keyspace test
>>>   with placement_strategy = 'SimpleStrategy'
>>>   and strategy_options = {replication_factor : 1}
>>>   and durable_writes = true;
>>>
>>> use test;
>>>
>>> create column family test
>>>   with column_type = 'Standard'
>>>   and comparator = 'UTF8Type'
>>>   and default_validation_class = 'UTF8Type'
>>>   and key_validation_class = 'TimeUUIDType'
>>>   and compaction_strategy =
>>> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>>>   and caching = 'NONE'
>>>   and bloom_filter_fp_chance = 1.0
>>>   and column_metadata = [
>>>     {column_name : 'a',
>>>     validation_class : LongType}];
>>>
>>>
>>> and the insert script is
>>>
>>> <?php
>>>
>>> require_once('phpcassa/1.0.a.5/autoload.php');
>>>
>>> use phpcassa\Connection\ConnectionPool;
>>> use phpcassa\ColumnFamily;
>>> use phpcassa\SystemManager;
>>> use phpcassa\UUID;
>>>
>>> // Connect to test keyspace and column family
>>> $sys = new SystemManager('127.0.0.1');
>>>
>>> // Start a connection pool, create our ColumnFamily instance
>>> $pool = new ConnectionPool('test', array('127.0.0.1'));
>>> $testCf = new ColumnFamily($pool, 'test');
>>>
>>> // Insert records
>>> while( 1 ) {
>>>   $testCf->insert(UUID::uuid1(), array("a" => 1), null, 120);
>>> }
>>>
>>> // Close our connections
>>> $pool->close();
>>> $sys->close();
>>>
>>> ?>
>>>
>>>
>>> -Bryan
>>>
>>>
>>>
>>>
>>> On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>>
>>>> We are using LCS, and the particular row I've referenced has been
>>>> involved in several compactions after all of its columns' TTLs expired.  The
>>>> most recent one was again this morning and the row is still there -- TTL
>>>> expired for several days now, with gc_grace=0 and several compactions later
>>>> ...
>>>>
>>>>
>>>> $> ./bin/nodetool -h localhost getsstables metrics request_summary
>>>> 459fb460-5ace-11e2-9b92-11d67b6163b4
>>>>
>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>
>>>> $> ls -alF
>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>> -rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54
>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>
>>>>
>>>> $> ./bin/sstable2json
>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>> -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
>>>>  {
>>>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>>>> [["app_name","50f21d3d",1357785277207001,"d"],
>>>> ["client_ip","50f21d3d",1357785277207001,"d"],
>>>> ["client_req_id","50f21d3d",1357785277207001,"d"],
>>>> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>>>> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>>>> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>>>> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>>>> ["req_duration_us","50f21d3d",1357785277207001,"d"],
>>>> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>>>> ["req_method","50f21d3d",1357785277207001,"d"],
>>>> ["req_service","50f21d3d",1357785277207001,"d"],
>>>> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>>>> ["success","50f21d3d",1357785277207001,"d"]]
>>>> }
>>>>
>>>>
>>>> My experience with TTL columns so far has been pretty similar to
>>>> Viktor's in that the only way to keep the row count under control is to
>>>> force major compactions.  In real world use, STCS and LCS both leave TTL
>>>> expired rows around forever as far as I can tell.  When testing with
>>>> minimal data, removal of TTL expired rows seems to work as expected, but in
>>>> this case there seems to be some divergence between real-life workloads and
>>>> test samples.
>>>>
>>>> -Bryan
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jan 17, 2013 at 1:47 AM, Viktor Jevdokimov <
>>>> viktor.jevdoki...@adform.com> wrote:
>>>>
>>>>> @Bryan,
>>>>>
>>>>> To keep data size as low as possible with TTL columns we still use
>>>>> STCS and nightly major compactions.
>>>>>
>>>>> Experience with LCS was not successful in our case; data size stays
>>>>> too high along with the amount of compactions.
>>>>>
>>>>> IMO, before 1.2, LCS was good for CFs without TTL or a high delete rate.
>>>>> I have not tested 1.2 LCS behavior; we're still on 1.0.x.
>>>>>
>>>>> Best regards / Pagarbiai
>>>>> Viktor Jevdokimov
>>>>> Senior Developer
>>>>>
>>>>> From: aaron morton [mailto:aa...@thelastpickle.com]
>>>>> Sent: Thursday, January 17, 2013 06:24
>>>>> To: user@cassandra.apache.org
>>>>> Subject: Re: LCS not removing rows with all TTL expired columns
>>>>>
>>>>> Minor compaction (with Size Tiered) will only purge tombstones if all
>>>>> fragments of a row are contained in the SSTables being compacted. So if you
>>>>> have a long lived row, that is present in many size tiers, the columns will
>>>>> not be purged.
>>>>>
>>>>>   (thus compacted) 3 days after all columns for that row
>>>>> had expired
>>>>>
>>>>> Tombstones have to get on disk, even if you set the gc_grace_seconds
>>>>> to 0. If not, they do not get a chance to delete previous versions of the
>>>>> column which already exist on disk. So when the compaction ran, your
>>>>> ExpiringColumn was turned into a DeletedColumn and placed on disk.
>>>>>
>>>>> I would expect the next round of compaction to remove these columns.
>>>>>
>>>>> There is a new feature in 1.2 that may help you here. It will do a
>>>>> special compaction of individual sstables when they have a certain
>>>>> proportion of dead columns:
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-3442
>>>>>
>>>>> Also interested to know if LCS helps.
>>>>>
>>>>> Cheers
>>>>>
>>>>> -----------------
>>>>> Aaron Morton
>>>>> Freelance Cassandra Developer
>>>>> New Zealand
>>>>>
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On 17/01/2013, at 2:55 PM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>>>>
>>>>> According to the timestamps (see original post) the SSTable was
>>>>> written (thus compacted) 3 days after all columns for that row
>>>>> had expired and 6 days after the row was created; yet all columns are still
>>>>> showing up in the SSTable.  Note that a "get" for that key returns no rows,
>>>>> so that's working correctly, but the data is lugged around far longer than
>>>>> it should be -- maybe forever.
>>>>>
>>>>> -Bryan
>>>>>
>>>>> On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh <ailin...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> To get a column removed you have to meet two requirements:
>>>>>
>>>>> 1. the column should be expired
>>>>> 2. after that, the CF gets compacted
>>>>>
>>>>> I guess your expired columns are propagated to high-tier SSTables, which
>>>>> get compacted rarely.
>>>>> So you have to wait until those high-tier SSTables get compacted.
>>>>>
>>>>> Andrey
>>>>>
>>>>> On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot <btal...@aeriagames.com>
>>>>> wrote:
>>>>>
>>>>> On cassandra 1.1.5 with a write heavy workload, we're having problems
>>>>> getting rows to be compacted away (removed) even though all columns have
>>>>> an expired TTL.  We've tried size tiered and now leveled and are seeing
>>>>> the same symptom: the data stays around essentially forever.
>>>>>
>>>>> Currently we write all columns with a TTL of 72 hours (259200 seconds)
>>>>> and expect to add 10 GB of data to this CF per day per node.  Each node
>>>>> currently has 73 GB for the affected CF and shows no indications that old
>>>>> rows will be removed on their own.
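>>>>>
>>>>> (Back-of-the-envelope: at ~10 GB/day with a 72 hour TTL, the steady-state
>>>>> live data should be roughly 3 days x 10 GB = ~30 GB per node plus
>>>>> compaction overhead, so 73 GB per node is well beyond what unexpired data
>>>>> alone should account for.)
>>>>>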
>>>>> Why aren't rows being removed?  Below is some data from a sample row
>>>>> which should have been removed several days ago but is still around even
>>>>> though it has been involved in numerous compactions since being expired.
>>>>>
>>>>> $> ./bin/nodetool -h localhost getsstables metrics request_summary
>>>>> 459fb460-5ace-11e2-9b92-11d67b6163b4
>>>>>
>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>>>
>>>>> $> ls -alF
>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>>> -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42
>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>>>
>>>>> $> ./bin/sstable2json
>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>>> -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
>>>>>
>>>>> {
>>>>>
>>>>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>>>>> [["app_name","50f21d3d",1357785277207001,"d"],
>>>>> ["client_ip","50f21d3d",1357785277207001,"d"],
>>>>> ["client_req_id","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["req_duration_us","50f21d3d",1357785277207001,"d"],
>>>>> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>>>>> ["req_method","50f21d3d",1357785277207001,"d"],
>>>>> ["req_service","50f21d3d",1357785277207001,"d"],
>>>>> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>>>>> ["success","50f21d3d",1357785277207001,"d"]]****
>>>>>
>>>>> }****
>>>>>
>>>>> Decoding the column timestamps shows that the columns were written at
>>>>> "Thu, 10 Jan 2013 02:34:37 GMT" and that their TTL expired at "Sun, 13
>>>>> Jan 2013 02:34:37 GMT".  The date of the SSTable shows that it was
>>>>> generated on Jan 16, which is 3 days after all columns have TTL-ed out.
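>>>>>
>>>>> For reference, a quick sketch of that conversion in PHP (assuming the
>>>>> column timestamps are microseconds since the epoch, which matches the
>>>>> dates above):
>>>>>
>>>>> <?php
>>>>> // Column timestamp from the sstable2json output above, in microseconds.
>>>>> $ts_us = 1357785277207001;
>>>>> $written = (int) ($ts_us / 1000000);
>>>>> echo gmdate('D, d M Y H:i:s', $written), " GMT\n";           // Thu, 10 Jan 2013 02:34:37 GMT
>>>>> echo gmdate('D, d M Y H:i:s', $written + 259200), " GMT\n";  // Sun, 13 Jan 2013 02:34:37 GMT (72 hour TTL)
>>>>> ?>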
>>>>>
>>>>> The schema shows that gc_grace is set to 0 since this data is
>>>>> write-once, read-seldom and is never updated or deleted.
>>>>>
>>>>> create column family request_summary
>>>>>   with column_type = 'Standard'
>>>>>   and comparator = 'UTF8Type'
>>>>>   and default_validation_class = 'UTF8Type'
>>>>>   and key_validation_class = 'UTF8Type'
>>>>>   and read_repair_chance = 0.1
>>>>>   and dclocal_read_repair_chance = 0.0
>>>>>   and gc_grace = 0
>>>>>   and min_compaction_threshold = 4
>>>>>   and max_compaction_threshold = 32
>>>>>   and replicate_on_write = true
>>>>>   and compaction_strategy =
>>>>> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>>>>>   and caching = 'NONE'
>>>>>   and bloom_filter_fp_chance = 1.0
>>>>>   and compression_options = {'chunk_length_kb' : '64',
>>>>> 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
>>>>>
>>>>> Thanks in advance for help in understanding why rows such as this are
>>>>> not removed!
>>>>>
>>>>> -Bryan
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Derek Williams
>>
>
