Re: TWCS Compaction backed up

brian . spindler Tue, 07 Aug 2018 17:16:57 -0700

Everything is ttl’d 

I suppose I could use sstablemetadata to check the repaired flag. Could I just 
set that back to unrepaired somehow, and would that fix it?


Thanks!
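
For reference, a minimal sketch of the repaired-flag check discussed above. It assumes `sstablemetadata` is on the PATH and prints a `Repaired at: <millis>` line for each sstable (0 meaning unrepaired), as the 2.1 tooling does. The function names here are hypothetical helpers, not part of Cassandra.

```python
import re
import subprocess

# Assumption: sstablemetadata prints one "Repaired at: <millis>" line per
# sstable, where 0 means the sstable has never been marked repaired.
REPAIRED_AT = re.compile(r"^Repaired at:\s*(\d+)\s*$", re.MULTILINE)


def repaired_at(metadata_text):
    """Return the repairedAt timestamp (ms) found in the text, or None."""
    match = REPAIRED_AT.search(metadata_text)
    return int(match.group(1)) if match else None


def split_by_repair_state(outputs):
    """Group sstables by repair state.

    outputs maps an sstable path to the sstablemetadata text for that file.
    Returns three sorted lists: (repaired, unrepaired, unknown).
    """
    repaired, unrepaired, unknown = [], [], []
    for path, text in sorted(outputs.items()):
        timestamp = repaired_at(text)
        if timestamp is None:
            unknown.append(path)      # no "Repaired at" line was found
        elif timestamp > 0:
            repaired.append(path)     # promoted to the repaired set
        else:
            unrepaired.append(path)
    return repaired, unrepaired, unknown


def read_metadata(data_file):
    """Hypothetical helper: run sstablemetadata on one Data.db file."""
    result = subprocess.run(["sstablemetadata", data_file],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

If the repaired list is non-empty, those files will not compact together with the unrepaired ones in the same window. Cassandra 2.1 also ships an `sstablerepairedset` tool (used with `--really-set --is-unrepaired` while the node is stopped) that may be what is wanted for flipping the flag back; check its usage against the exact version in use before relying on it.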

> On Aug 7, 2018, at 8:12 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> 
> May be worth seeing if any of the sstables got promoted to repaired - if so 
> they’re not eligible for compaction with unrepaired sstables and that could 
> explain some higher counts
> 
> Do you actually do deletes or is everything ttl’d?
>  
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Aug 7, 2018, at 5:09 PM, Brian Spindler <brian.spind...@gmail.com> wrote:
>> 
>> Hi Jeff, mostly lots of little files: there will be 4-5 that are 
>> 1-1.5 GB or so, and then many at 5-50 MB and many at 40-50 MB each.
>> 
>> Re incremental repair: yes, one of my engineers started an incremental repair 
>> on this column family that we had to abort.  In fact, the node that the 
>> repair was initiated on ran out of disk space and we ended up replacing that 
>> node as if it were a dead node.
>> 
>> Oddly the new node is experiencing this issue as well.  
>> 
>> -B
>> 
>> 
>>> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>> You could toggle off the tombstone compaction to see if that helps, but 
>>> that should be lower priority than normal compactions
>>> 
>>> Are the lots-of-little-files from memtable flushes or repair/anticompaction?
>>> 
>>> Do you do normal deletes? Did you try to run Incremental repair?  
>>> 
>>> -- 
>>> Jeff Jirsa
>>> 
>>> 
>>>> On Aug 7, 2018, at 5:00 PM, Brian Spindler <brian.spind...@gmail.com> 
>>>> wrote:
>>>> 
>>>> Hi Jonathan, both I believe.  
>>>> 
>>>> The window size is 1 day, full settings: 
>>>>     AND compaction = {'timestamp_resolution': 'MILLISECONDS',
>>>>         'unchecked_tombstone_compaction': 'true',
>>>>         'compaction_window_size': '1',
>>>>         'compaction_window_unit': 'DAYS',
>>>>         'tombstone_compaction_interval': '86400',
>>>>         'tombstone_threshold': '0.2',
>>>>         'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
>>>> 
>>>> 
>>>> nodetool tpstats 
>>>> 
>>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>>> MutationStage                     0         0    68582241832         0                 0
>>>> ReadStage                         0         0      209566303         0                 0
>>>> RequestResponseStage              0         0    44680860850         0                 0
>>>> ReadRepairStage                   0         0       24562722         0                 0
>>>> CounterMutationStage              0         0              0         0                 0
>>>> MiscStage                         0         0              0         0                 0
>>>> HintedHandoff                     1         1            203         0                 0
>>>> GossipStage                       0         0        8471784         0                 0
>>>> CacheCleanupExecutor              0         0            122         0                 0
>>>> InternalResponseStage             0         0         552125         0                 0
>>>> CommitLogArchiver                 0         0              0         0                 0
>>>> CompactionExecutor                8        42        1433715         0                 0
>>>> ValidationExecutor                0         0           2521         0                 0
>>>> MigrationStage                    0         0         527549         0                 0
>>>> AntiEntropyStage                  0         0           7697         0                 0
>>>> PendingRangeCalculator            0         0             17         0                 0
>>>> Sampler                           0         0              0         0                 0
>>>> MemtableFlushWriter               0         0         116966         0                 0
>>>> MemtablePostFlush                 0         0         209103         0                 0
>>>> MemtableReclaimMemory             0         0         116966         0                 0
>>>> Native-Transport-Requests         1         0     1715937778         0            176262
>>>> 
>>>> Message type           Dropped
>>>> READ                         2
>>>> RANGE_SLICE                  0
>>>> _TRACE                       0
>>>> MUTATION                  4390
>>>> COUNTER_MUTATION             0
>>>> BINARY                       0
>>>> REQUEST_RESPONSE          1882
>>>> PAGED_RANGE                  0
>>>> READ_REPAIR                  0
>>>> 
>>>> 
>>>>> On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>> What's your window size?
>>>>> 
>>>>> When you say backed up, how are you measuring that?  Are there pending 
>>>>> tasks or do you just see more files than you expect?
>>>>> 
>>>>>> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler <brian.spind...@gmail.com> 
>>>>>> wrote:
>>>>>> Hey guys, quick question: 
>>>>>>  
>>>>>> I've got a v2.1 Cassandra cluster, 12 nodes on AWS i3.2xl, commit log on 
>>>>>> one drive, data on NVMe.  That was working very well; it's a time-series 
>>>>>> DB and has been accumulating data for about 4 weeks.  
>>>>>> 
>>>>>> The nodes have increased in load and compaction seems to be falling 
>>>>>> behind.  I used to get about 1 file per day for this column family, 
>>>>>> a ~30 GB Data.db file per day.  I am now getting hundreds per day at 
>>>>>> 1 MB - 50 MB.
>>>>>> 
>>>>>> How to recover from this? 
>>>>>> 
>>>>>> I can scale out to give some breathing room but will it go back and 
>>>>>> compact the old days into nicely packed files for the day?    
>>>>>> 
>>>>>> I tried raising compaction throughput from 256 to 1000 and it seemed to 
>>>>>> make things worse for the CPU; it's configured on i3.2xl with 8 
>>>>>> compaction threads. 
>>>>>> 
>>>>>> -B
>>>>>> 
>>>>>> Lastly, I have mixed TTLs in this CF and need to run a repair (I think) 
>>>>>> to get rid of old tombstones.  However, running repairs in 2.1 on TWCS 
>>>>>> column families causes a very large spike in sstable counts due to 
>>>>>> anti-compaction, which causes a lot of disruption.  Is there any other 
>>>>>> way?  
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Jon Haddad
>>>>> http://www.rustyrazorblade.com
>>>>> twitter: rustyrazorblade
