Thanks for letting us know. I also have some tables with a lot of activity and very short TTLs, and while I haven't experienced this problem, it's good to know just in case.
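To convince myself of the mechanism Bryan describes below, I sketched the purge decision in a few lines of Python. This is a toy model only, not Cassandra's actual compaction code; all class and function names here are made up for illustration:

```python
# Toy model of the tombstone-purge decision described in the thread:
# during compaction, a row's tombstones may be dropped only if no
# SSTable *outside* the compaction set claims it might contain the row.
# The check goes through each SSTable's bloom filter.

class SSTable:
    def __init__(self, keys, bloom_filter_fp_chance):
        self.keys = set(keys)
        self.fp_chance = bloom_filter_fp_chance

    def bloom_might_contain(self, key):
        # A real bloom filter answers "maybe" with probability ~fp_chance
        # for absent keys. With fp_chance = 1.0 it always answers "maybe";
        # here we idealize lower fp_chance as an exact membership test.
        if self.fp_chance >= 1.0:
            return True
        return key in self.keys

def can_purge(key, compacting, all_sstables):
    # Tombstones for `key` may be dropped only if no other SSTable
    # might still hold a fragment of the row.
    others = [s for s in all_sstables if s not in compacting]
    return not any(s.bloom_might_contain(key) for s in others)

a = SSTable({"row1"}, 1.0)
b = SSTable({"row2"}, 1.0)
# With fp_chance = 1.0, row1 "appears" to live in b as well, so its
# tombstones are never purged when compacting a alone.
print(can_purge("row1", [a], [a, b]))   # False

a2 = SSTable({"row1"}, 0.1)
b2 = SSTable({"row2"}, 0.1)
print(can_purge("row1", [a2], [a2, b2]))  # True (idealized filter)
```

With fp_chance = 1.0, the filter can never rule a row out of any SSTable, so the purge condition is never satisfied -- which matches the "data never shrinks" symptom regardless of compaction strategy.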
On Tue, Jan 22, 2013 at 7:35 PM, Bryan Talbot <btal...@aeriagames.com> wrote:

> It turns out that having gc_grace=0 isn't required to produce the problem.
> My colleague did a lot of digging into the compaction code and we think
> he's found the issue. It's detailed in
> https://issues.apache.org/jira/browse/CASSANDRA-5182
>
> Basically, tombstones for a row will not be removed from an SSTable during
> compaction if the row appears in other SSTables; however, the compaction
> code checks the bloom filters to make this determination. Since this data
> is rarely read, we had bloom_filter_fp_chance set to 1.0, which makes rows
> appear to be in every SSTable as far as compaction is concerned.
>
> This caused our data to essentially never be removed when using either
> STCS or LCS, and will probably affect anyone else running 1.1 with high
> bloom filter fp ratios.
>
> Setting our fp ratio to 0.1, running upgradesstables, and running the
> application as it was before seems to have stabilized the load as desired,
> at the expense of additional JVM memory.
>
> -Bryan
>
> On Thu, Jan 17, 2013 at 6:50 PM, Bryan Talbot <btal...@aeriagames.com> wrote:
>
>> Bleh, I rushed out the email before some meetings and I messed something
>> up. Working on reproducing now with better notes this time.
>>
>> -Bryan
>>
>> On Thu, Jan 17, 2013 at 4:45 PM, Derek Williams <de...@fyrie.net> wrote:
>>
>>> When you ran this test, is that the exact schema you used? I'm not
>>> seeing where you are setting gc_grace to 0 (although I could just be blind,
>>> it happens).
>>>
>>> On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>>
>>>> I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7,
>>>> 1.1.8, a trivial schema, and a simple script that just inserts rows. If
>>>> the TTL is small enough that all LCS data fits in generation 0, then the
>>>> rows seem to be removed when the TTL expires, as desired. However, if the
>>>> insertion rate is high enough or the TTL long enough, then the data keeps
>>>> accumulating for far longer than expected.
>>>>
>>>> Using a 120-second TTL and a single-threaded PHP insertion script, my MBP
>>>> with SSD retained almost all of the data. 120 seconds should accumulate
>>>> 5-10 MB of data. I would expect the TTL rows to be removed eventually and
>>>> the cassandra load to level off at some reasonable value near 10 MB.
>>>> After running for 2 hours and with a cassandra load of ~550 MB, I stopped
>>>> the test.
>>>>
>>>> The schema is
>>>>
>>>> create keyspace test
>>>>   with placement_strategy = 'SimpleStrategy'
>>>>   and strategy_options = {replication_factor : 1}
>>>>   and durable_writes = true;
>>>>
>>>> use test;
>>>>
>>>> create column family test
>>>>   with column_type = 'Standard'
>>>>   and comparator = 'UTF8Type'
>>>>   and default_validation_class = 'UTF8Type'
>>>>   and key_validation_class = 'TimeUUIDType'
>>>>   and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>>>>   and caching = 'NONE'
>>>>   and bloom_filter_fp_chance = 1.0
>>>>   and column_metadata = [
>>>>     {column_name : 'a',
>>>>      validation_class : LongType}];
>>>>
>>>> and the insert script is
>>>>
>>>> <?php
>>>>
>>>> require_once('phpcassa/1.0.a.5/autoload.php');
>>>>
>>>> use phpcassa\Connection\ConnectionPool;
>>>> use phpcassa\ColumnFamily;
>>>> use phpcassa\SystemManager;
>>>> use phpcassa\UUID;
>>>>
>>>> // Connect to test keyspace and column family
>>>> $sys = new SystemManager('127.0.0.1');
>>>>
>>>> // Start a connection pool, create our ColumnFamily instance
>>>> $pool = new ConnectionPool('test', array('127.0.0.1'));
>>>> $testCf = new ColumnFamily($pool, 'test');
>>>>
>>>> // Insert records with a 120-second TTL
>>>> while( 1 ) {
>>>>     $testCf->insert(UUID::uuid1(), array("a" => 1), null, 120);
>>>> }
>>>>
>>>> // Close our connections (unreachable while the loop above runs)
>>>> $pool->close();
>>>> $sys->close();
>>>>
>>>> ?>
>>>>
>>>> -Bryan
>>>>
>>>> On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>>>
>>>>> We are using LCS, and the particular row I've referenced has been
>>>>> involved in several compactions after all columns have TTL expired. The
>>>>> most recent one was again this morning and the row is still there -- TTL
>>>>> expired for several days now with gc_grace=0 and several compactions
>>>>> later...
>>>>>
>>>>> $> ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4
>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>>
>>>>> $> ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>> -rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
>>>>>
>>>>> $> ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
>>>>> {
>>>>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>>>>> [["app_name","50f21d3d",1357785277207001,"d"],
>>>>> ["client_ip","50f21d3d",1357785277207001,"d"],
>>>>> ["client_req_id","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>> ["req_duration_us","50f21d3d",1357785277207001,"d"],
>>>>> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>>>>> ["req_method","50f21d3d",1357785277207001,"d"],
>>>>> ["req_service","50f21d3d",1357785277207001,"d"],
>>>>> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>>>>> ["success","50f21d3d",1357785277207001,"d"]]
>>>>> }
>>>>>
>>>>> My experience with TTL columns so far has been pretty similar to
>>>>> Viktor's, in that the only way to keep the row count under control is to
>>>>> force major compactions. In real-world use, STCS and LCS both leave
>>>>> TTL-expired rows around forever as far as I can tell. When testing with
>>>>> minimal data, removal of TTL-expired rows seems to work as expected, but
>>>>> there seems to be some divergence between real-life workloads and test
>>>>> samples.
>>>>>
>>>>> -Bryan
>>>>>
>>>>> On Thu, Jan 17, 2013 at 1:47 AM, Viktor Jevdokimov <viktor.jevdoki...@adform.com> wrote:
>>>>>
>>>>>> @Bryan,
>>>>>>
>>>>>> To keep data size as low as possible with TTL columns, we still use
>>>>>> STCS and nightly major compactions.
>>>>>>
>>>>>> Experience with LCS was not successful in our case; data size stays
>>>>>> too high, along with the amount of compaction.
>>>>>>
>>>>>> IMO, before 1.2, LCS was good for CFs without TTL or a high delete
>>>>>> rate. I have not tested 1.2 LCS behavior; we're still on 1.0.x
>>>>>>
>>>>>> Best regards / Pagarbiai
>>>>>> Viktor Jevdokimov
>>>>>> Senior Developer
>>>>>> Email: viktor.jevdoki...@adform.com
>>>>>> Phone: +370 5 212 3063, Fax +370 5 261 0453
>>>>>> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>>>>>>
>>>>>> From: aaron morton [mailto:aa...@thelastpickle.com]
>>>>>> Sent: Thursday, January 17, 2013 06:24
>>>>>> To: user@cassandra.apache.org
>>>>>> Subject: Re: LCS not removing rows with all TTL expired columns
>>>>>>
>>>>>> Minor compaction (with Size Tiered) will only purge tombstones if all
>>>>>> fragments of a row are contained in the SSTables being compacted. So if
>>>>>> you have a long-lived row that is present in many size tiers, the columns
>>>>>> will not be purged.
>>>>>>
>>>>>> > (thus compacted) 3 days after all columns for that row had expired
>>>>>>
>>>>>> Tombstones have to get on disk, even if you set gc_grace_seconds to 0.
>>>>>> Otherwise they do not get a chance to delete previous versions of the
>>>>>> column which already exist on disk. So when the compaction ran, your
>>>>>> ExpiringColumn was turned into a DeletedColumn and placed on disk.
>>>>>>
>>>>>> I would expect the next round of compaction to remove these columns.
>>>>>>
>>>>>> There is a new feature in 1.2 that may help you here. It will do a
>>>>>> special compaction of individual sstables when they have a certain
>>>>>> proportion of dead columns:
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-3442
>>>>>>
>>>>>> Also interested to know if LCS helps.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Cassandra Developer
>>>>>> New Zealand
>>>>>>
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>>
>>>>>> On 17/01/2013, at 2:55 PM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>>>>>
>>>>>> According to the timestamps (see original post), the SSTable was
>>>>>> written (thus compacted) 3 days after all columns for that row had
>>>>>> expired and 6 days after the row was created; yet all columns are still
>>>>>> showing up in the SSTable. Note that a "get" for that key shows no rows,
>>>>>> so that's working correctly, but the data is lugged around far longer
>>>>>> than it should be -- maybe forever.
>>>>>>
>>>>>> -Bryan
>>>>>>
>>>>>> On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
>>>>>>
>>>>>> To get a column removed you have to meet two requirements:
>>>>>>
>>>>>> 1. the column should be expired
>>>>>> 2. after that, the CF gets compacted
>>>>>>
>>>>>> I guess your expired columns are propagated to a high tier, which gets
>>>>>> compacted rarely. So, you have to wait until the high tier gets
>>>>>> compacted.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>> On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>>>>>
>>>>>> On cassandra 1.1.5 with a write-heavy workload, we're having problems
>>>>>> getting rows to be compacted away (removed) even though all columns have
>>>>>> expired TTLs. We've tried size-tiered and now leveled and are seeing the
>>>>>> same symptom: the data stays around essentially forever.
>>>>>>
>>>>>> Currently we write all columns with a TTL of 72 hours (259200 seconds)
>>>>>> and expect to add 10 GB of data to this CF per day per node. Each node
>>>>>> currently has 73 GB for the affected CF and shows no indication that old
>>>>>> rows will be removed on their own.
>>>>>>
>>>>>> Why aren't rows being removed? Below is some data from a sample row
>>>>>> which should have been removed several days ago but is still around even
>>>>>> though it has been involved in numerous compactions since being expired.
>>>>>>
>>>>>> $> ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4
>>>>>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>>>>
>>>>>> $> ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>>>> -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>>>>>
>>>>>> $> ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
>>>>>> {
>>>>>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>>>>>> [["app_name","50f21d3d",1357785277207001,"d"],
>>>>>> ["client_ip","50f21d3d",1357785277207001,"d"],
>>>>>> ["client_req_id","50f21d3d",1357785277207001,"d"],
>>>>>> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>>> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>>>>>> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>>> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>>>>>> ["req_duration_us","50f21d3d",1357785277207001,"d"],
>>>>>> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>>>>>> ["req_method","50f21d3d",1357785277207001,"d"],
>>>>>> ["req_service","50f21d3d",1357785277207001,"d"],
>>>>>> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>>>>>> ["success","50f21d3d",1357785277207001,"d"]]
>>>>>> }
>>>>>>
>>>>>> Decoding the column timestamps shows that the columns were written at
>>>>>> "Thu, 10 Jan 2013 02:34:37 GMT" and that their TTL expired at "Sun, 13
>>>>>> Jan 2013 02:34:37 GMT". The date of the SSTable shows that it was
>>>>>> generated on Jan 16, which is 3 days after all columns have TTL-ed out.
>>>>>>
>>>>>> The schema shows that gc_grace is set to 0 since this data is
>>>>>> write-once, read-seldom and is never updated or deleted.
>>>>>>
>>>>>> create column family request_summary
>>>>>>   with column_type = 'Standard'
>>>>>>   and comparator = 'UTF8Type'
>>>>>>   and default_validation_class = 'UTF8Type'
>>>>>>   and key_validation_class = 'UTF8Type'
>>>>>>   and read_repair_chance = 0.1
>>>>>>   and dclocal_read_repair_chance = 0.0
>>>>>>   and gc_grace = 0
>>>>>>   and min_compaction_threshold = 4
>>>>>>   and max_compaction_threshold = 32
>>>>>>   and replicate_on_write = true
>>>>>>   and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>>>>>>   and caching = 'NONE'
>>>>>>   and bloom_filter_fp_chance = 1.0
>>>>>>   and compression_options = {'chunk_length_kb' : '64',
>>>>>>     'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
>>>>>>
>>>>>> Thanks in advance for help in understanding why rows such as this are
>>>>>> not removed!
>>>>>>
>>>>>> -Bryan
>>>>>>
>>>
>>> --
>>> Derek Williams
>>>

--
Derek Williams
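P.S. For anyone wanting to verify the timestamp arithmetic discussed in the thread: the column timestamps in the sstable2json output are microseconds since the Unix epoch, and the TTL mentioned is 72 hours (259200 seconds). A small sketch, using the timestamp from the sample row:

```python
from datetime import datetime, timezone

# Column timestamps in sstable2json output are microseconds since epoch;
# the TTL from the original post is 72 h = 259200 s.
ts_us = 1357785277207001
ttl_s = 259200

written = datetime.fromtimestamp(ts_us / 1_000_000, tz=timezone.utc)
expired = datetime.fromtimestamp(ts_us / 1_000_000 + ttl_s, tz=timezone.utc)

print(written.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Thu, 10 Jan 2013 02:34:37 GMT
print(expired.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Sun, 13 Jan 2013 02:34:37 GMT
```

This matches the write and expiry times quoted in the thread, and confirms the Jan 16 SSTable was written 3 days after expiry.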