Thx for the tips Jeff, I'm definitely going to start using table-level TTLs
(not sure why I didn't before), and I'll take a look at the tombstone
compaction subproperties.
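
For anyone who finds this thread later: both of those are plain table
properties, so they can be set together with a single ALTER TABLE. The
sketch below uses placeholder keyspace/table names, and the TTL and
tombstone thresholds are only examples that would need to match your own
data model:

```
ALTER TABLE my_keyspace.my_table
  WITH default_time_to_live = 86400
  AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
                    'compaction_window_size': '6', 'compaction_window_unit': 'HOURS',
                    'unchecked_tombstone_compaction': 'true',
                    'tombstone_threshold': '0.2',
                    'tombstone_compaction_interval': '86400'};
```

default_time_to_live only applies to writes that don't set their own TTL,
and the tombstone_* subproperties allow single-sstable compactions once the
estimated droppable tombstone ratio in an sstable crosses the threshold.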

On Mon, May 6, 2019 at 10:43 AM Jeff Jirsa <jji...@gmail.com> wrote:

> Fwiw if you enable the tombstone compaction subproperties, you’ll compact
> away most of the other data in those old sstables (but not the partition
> that’s been manually updated)
>
> Also, a table-level TTL helps catch this type of manual manipulation -
> consider adding one if appropriate.
>
> --
> Jeff Jirsa
>
>
> On May 6, 2019, at 7:29 AM, Mike Torra <mto...@salesforce.com.invalid>
> wrote:
>
> Compaction settings:
> ```
> compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
>               'compaction_window_size': '6', 'compaction_window_unit': 'HOURS',
>               'max_threshold': '32', 'min_threshold': '4'}
> ```
> read_repair_chance is 0, and I don't do any repairs because (normally)
> everything has a ttl. It does seem like Jeff is right that a manual
> insert/update without a ttl is what caused this, so I know how to resolve
> it and prevent it from happening again.
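>
> For reference, the difference is just a USING TTL clause on the write. The
> table and column names below are made up to roughly match the shape of the
> data shown further down the thread:
>
> ```
> -- A write like this never expires, and under TWCS it pins its sstable
> -- (and anything overlapping it) indefinitely:
> INSERT INTO my_keyspace.my_table (user_id, site_id, event_time, activity_data)
> VALUES ('some_user_id_value', 'demo-test', '2019-01-22 15:27:45+0000', 'test');
>
> -- The same write with an explicit TTL expires along with everything else:
> INSERT INTO my_keyspace.my_table (user_id, site_id, event_time, activity_data)
> VALUES ('some_user_id_value', 'demo-test', '2019-01-22 15:27:45+0000', 'test')
> USING TTL 86400;
> ```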
>
> Thx again for all the help guys, I appreciate it!
>
>
> On Fri, May 3, 2019 at 11:21 PM Jeff Jirsa <jji...@gmail.com> wrote:
>
>> Repairs work fine with TWCS, but having a non-expiring row will prevent
>> tombstones in newer sstables from being purged
>>
>> I suspect someone did a manual insert/update without a ttl and that
>> effectively blocks all other expiring cells from being purged.
>>
>> --
>> Jeff Jirsa
>>
>>
>> On May 3, 2019, at 7:57 PM, Nick Hatfield <nick.hatfi...@metricly.com>
>> wrote:
>>
>> Hi Mike,
>>
>>
>>
>> If you will, share your compaction settings. More than likely, your issue
>> comes from one of two things:
>> 1. You have read repair chance set to anything other than 0
>>
>> 2. You’re running repairs on the TWCS CF
>>
>>
>>
>> Or both….
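>>
>> If it is #1, resetting it is a one-liner (the table name is just an
>> example):
>>
>> ```
>> ALTER TABLE my_keyspace.my_table
>>   WITH read_repair_chance = 0
>>   AND dclocal_read_repair_chance = 0;
>> ```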
>>
>>
>>
>> *From:* Mike Torra [mailto:mto...@salesforce.com.INVALID]
>> *Sent:* Friday, May 03, 2019 3:00 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: TWCS sstables not dropping even though all data is expired
>>
>>
>>
>> Thx for the help Paul - there are definitely some details here I still
>> don't fully understand, but this helped me resolve the problem and know
>> what to look for in the future :)
>>
>>
>>
>> On Fri, May 3, 2019 at 12:44 PM Paul Chandler <p...@redshots.com> wrote:
>>
>> Hi Mike,
>>
>>
>>
>> With TWCS, an sstable can only be deleted when all of the data in that
>> sstable has expired, but you had a record without a ttl in it, so that
>> sstable could never be deleted.
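>>
>> If you want to double check that on the stuck file, `sstablemetadata`
>> reports the min/max timestamps and the estimated droppable tombstones
>> (the path below is only illustrative):
>>
>> ```
>> sstablemetadata /var/lib/cassandra/data/my_keyspace/my_table-*/mc-1234-big-Data.db
>> ```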
>>
>>
>>
>> That bit is straightforward; the next bit I remember reading somewhere,
>> but I can’t find it at the moment to confirm my thinking.
>>
>>
>>
>> An sstable can only be deleted if it is the earliest sstable. I think this
>> is because deleting later sstables could expose old versions of the data
>> stored in the stuck sstable that have since been superseded. For example,
>> if a later sstable held a tombstone for the non-TTLed record causing the
>> problem in this instance, then deleting that later sstable would cause the
>> deleted data to reappear. (Someone please correct me if I have this wrong.)
>>
>>
>>
>> Because sstables in different time buckets are never compacted together,
>> this problem only went away when you ran the major compaction.
>>
>>
>>
>> This would happen on all replicas of the data, hence the reason you see
>> this problem on 3 nodes.
>>
>>
>>
>> Thanks
>>
>>
>>
>> Paul
>>
>> www.redshots.com
>>
>>
>>
>> On 3 May 2019, at 15:35, Mike Torra <mto...@salesforce.com.INVALID>
>> wrote:
>>
>>
>>
>> This does indeed seem to be a problem of overlapping sstables, but I
>> don't understand why the data (and number of sstables) just continues to
>> grow indefinitely. I also don't understand why this problem is only
>> appearing on some nodes. Is it just a coincidence that the one rogue test
>> row without a ttl is in the 'root' sstable causing the problem (i.e., from
>> the output of `sstableexpiredblockers`)?
>>
>>
>>
>> Running a full compaction via `nodetool compact` reclaims the disk space,
>> but I'd like to figure out why this happened and prevent it. Understanding
>> why this problem would be isolated the way it is (i.e. only one CF even
>> though I have a few others that share a very similar schema, and only some
>> nodes) seems like it will help me prevent it.
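>>
>> For completeness, the full compaction was just the standard invocation
>> with placeholder names:
>>
>> ```
>> nodetool compact my_keyspace my_table
>> ```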
>>
>>
>>
>>
>>
>> On Thu, May 2, 2019 at 1:00 PM Paul Chandler <p...@redshots.com> wrote:
>>
>> Hi Mike,
>>
>>
>>
>> It sounds like that record may have been deleted; if that is the case,
>> it would still be shown in this sstable, but the tombstone for the delete
>> would be in a later sstable. You can use nodetool getsstables to work out
>> which sstables contain the data.
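>>
>> For example (keyspace, table, and partition key value are placeholders):
>>
>> ```
>> nodetool getsstables my_keyspace my_table some_user_id_value
>> ```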
>>
>>
>>
>> I recommend reading The Last Pickle post on this:
>> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
>> The sections towards the bottom of that post may well explain why the
>> sstable is not being deleted.
>>
>>
>>
>> Thanks
>>
>>
>>
>> Paul
>>
>> www.redshots.com
>>
>>
>>
>> On 2 May 2019, at 16:08, Mike Torra <mto...@salesforce.com.INVALID>
>> wrote:
>>
>>
>>
>> I'm pretty stumped by this, so here is some more detail if it helps.
>>
>>
>>
>> Here is what the suspicious partition looks like in the `sstabledump`
>> output (some pii etc redacted):
>>
>> ```
>> {
>>   "partition" : {
>>     "key" : [ "some_user_id_value", "user_id", "demo-test" ],
>>     "position" : 210
>>   },
>>   "rows" : [
>>     {
>>       "type" : "row",
>>       "position" : 1132,
>>       "clustering" : [ "2019-01-22 15:27:45.000Z" ],
>>       "liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
>>       "cells" : [
>>         { "some": "data" }
>>       ]
>>     }
>>   ]
>> }
>> ```
>>
>>
>>
>> And here is what every other partition looks like:
>>
>> ```
>> {
>>   "partition" : {
>>     "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
>>     "position" : 1133
>>   },
>>   "rows" : [
>>     {
>>       "type" : "row",
>>       "position" : 1234,
>>       "clustering" : [ "2019-01-22 17:59:35.547Z" ],
>>       "liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" : 86400,
>>                           "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
>>       "cells" : [
>>         { "name" : "activity_data",
>>           "deletion_info" : { "local_delete_time" : "2019-01-22T17:59:35Z" } }
>>       ]
>>     }
>>   ]
>> }
>> ```
>>
>>
>>
>> As expected, all of the data except this one suspicious partition has a
>> ttl and is already expired. But if a partition isn't expired and I see it
>> in the sstable, why wouldn't I see it when executing a CQL query against
>> the CF? Why would this sstable prevent so many other sstables from getting
>> cleaned up?
>>
>>
>>
>> On Tue, Apr 30, 2019 at 12:34 PM Mike Torra <mto...@salesforce.com>
>> wrote:
>>
>> Hello -
>>
>>
>>
>> I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few
>> months ago I started noticing disk usage on some nodes increasing
>> consistently. At first I solved the problem by destroying the nodes and
>> rebuilding them, but the problem returns.
>>
>>
>>
>> I did some more investigation recently, and this is what I found:
>>
>> - I narrowed the problem down to a CF that uses TWCS, by simply looking
>> at disk space usage
>>
>> - in each region, 3 nodes have this problem of growing disk space
>> (matches replication factor)
>>
>> - on each node, I tracked down the problem to a particular SSTable using
>> `sstableexpiredblockers`
>>
>> - in the SSTable, using `sstabledump`, I found a row that does not have a
>> ttl like the other rows, and appears to be from someone else on the team
>> testing something and forgetting to include a ttl
>>
>> - all other rows show "expired: true" except this one, hence my suspicion
>>
>> - when I query for that particular partition key, I get no results
>>
>> - I tried deleting the row anyway, but that didn't seem to change
>> anything
>>
>> - I also tried `nodetool scrub`, but that didn't help either
>>
>>
>>
>> Would this rogue row without a ttl explain the problem? If so, why? If
>> not, does anyone have any other ideas? Why does the row show in
>> `sstabledump` but not when I query for it?
>>
>>
>>
>> I appreciate any help or suggestions!
>>
>>
>>
>> - Mike
>>
>>
>>
>>
>>
>>
