May be worth seeing if any of the sstables got promoted to repaired - if so, 
they're not eligible for compaction with unrepaired sstables, and that could 
explain some of the higher counts.
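
A quick way to check is sstablemetadata (under tools/bin in 2.1); a rough 
sketch, with the sstable path as a placeholder:

    # A non-zero "Repaired at" means the sstable was promoted to repaired
    tools/bin/sstablemetadata /path/to/ks/cf/ks-cf-ka-12345-Data.db | grep "Repaired at"

    # To move sstables back to unrepaired, stop the node first, then:
    tools/bin/sstablerepairedset --really-set --is-unrepaired /path/to/ks/cf/ks-cf-ka-12345-Data.db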

Do you actually do deletes or is everything ttl’d?
 

-- 
Jeff Jirsa


> On Aug 7, 2018, at 5:09 PM, Brian Spindler <brian.spind...@gmail.com> wrote:
> 
> Hi Jeff, mostly lots of little files: there will be 4-5 at 1-1.5GB or so, 
> and then many at 5-50MB each.   
> 
> Re incremental repair: yes, one of my engineers started an incremental repair 
> on this column family that we had to abort.  In fact, the node the repair 
> was initiated on ran out of disk space, and we ended up replacing it as we 
> would a dead node.   
> 
> Oddly the new node is experiencing this issue as well.  
> 
> -B
> 
> 
>> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa <jji...@gmail.com> wrote:
>> You could toggle off the tombstone compaction to see if that helps, but that 
>> should be lower priority than normal compactions
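>> 
>> (If you try that, a minimal sketch - keyspace/table names are placeholders 
>> and the other options are from the schema you posted; note that ALTER TABLE 
>> replaces the whole compaction map, so keep everything you still want:)
>> 
>>     ALTER TABLE my_ks.my_cf WITH compaction = {
>>         'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
>>         'compaction_window_unit': 'DAYS',
>>         'compaction_window_size': '1',
>>         'timestamp_resolution': 'MILLISECONDS',
>>         'unchecked_tombstone_compaction': 'false'};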
>> 
>> Are the lots-of-little-files from memtable flushes or repair/anticompaction?
>> 
>> Do you do normal deletes? Did you try to run incremental repair?  
>> 
>> -- 
>> Jeff Jirsa
>> 
>> 
>>> On Aug 7, 2018, at 5:00 PM, Brian Spindler <brian.spind...@gmail.com> wrote:
>>> 
>>> Hi Jonathan, both I believe.  
>>> 
>>> The window size is 1 day, full settings: 
>>>     AND compaction = {
>>>         'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
>>>         'compaction_window_unit': 'DAYS',
>>>         'compaction_window_size': '1',
>>>         'timestamp_resolution': 'MILLISECONDS',
>>>         'tombstone_compaction_interval': '86400',
>>>         'tombstone_threshold': '0.2',
>>>         'unchecked_tombstone_compaction': 'true'}
>>> 
>>> 
>>> nodetool tpstats 
>>> 
>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>> MutationStage                     0         0    68582241832         0                 0
>>> ReadStage                         0         0      209566303         0                 0
>>> RequestResponseStage              0         0    44680860850         0                 0
>>> ReadRepairStage                   0         0       24562722         0                 0
>>> CounterMutationStage              0         0              0         0                 0
>>> MiscStage                         0         0              0         0                 0
>>> HintedHandoff                     1         1            203         0                 0
>>> GossipStage                       0         0        8471784         0                 0
>>> CacheCleanupExecutor              0         0            122         0                 0
>>> InternalResponseStage             0         0         552125         0                 0
>>> CommitLogArchiver                 0         0              0         0                 0
>>> CompactionExecutor                8        42        1433715         0                 0
>>> ValidationExecutor                0         0           2521         0                 0
>>> MigrationStage                    0         0         527549         0                 0
>>> AntiEntropyStage                  0         0           7697         0                 0
>>> PendingRangeCalculator            0         0             17         0                 0
>>> Sampler                           0         0              0         0                 0
>>> MemtableFlushWriter               0         0         116966         0                 0
>>> MemtablePostFlush                 0         0         209103         0                 0
>>> MemtableReclaimMemory             0         0         116966         0                 0
>>> Native-Transport-Requests         1         0     1715937778         0            176262
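>>> 
>>> (The CompactionExecutor line above is the backlog: 8 active, 42 pending. 
>>> For watching it directly, a quick sketch using stock nodetool:)
>>> 
>>>     nodetool compactionstats    # pending task count plus the compactions currently running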
>>> 
>>> Message type           Dropped
>>> READ                         2
>>> RANGE_SLICE                  0
>>> _TRACE                       0
>>> MUTATION                  4390
>>> COUNTER_MUTATION             0
>>> BINARY                       0
>>> REQUEST_RESPONSE          1882
>>> PAGED_RANGE                  0
>>> READ_REPAIR                  0
>>> 
>>> 
>>>> On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>> What's your window size?
>>>> 
>>>> When you say backed up, how are you measuring that?  Are there pending 
>>>> tasks or do you just see more files than you expect?
>>>> 
>>>>> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler <brian.spind...@gmail.com> 
>>>>> wrote:
>>>>> Hey guys, quick question: 
>>>>>  
>>>>> I've got a v2.1 Cassandra cluster: 12 nodes on AWS i3.2xl, commit log on 
>>>>> one drive, data on NVMe.  It was working very well; it's a time-series DB 
>>>>> and has been accumulating data for about 4 weeks.  
>>>>> 
>>>>> The nodes have increased in load and compaction seems to be falling 
>>>>> behind.  I used to get about one file per day for this column family, a 
>>>>> ~30GB Data.db file per day.  I am now getting hundreds per day at 
>>>>> 1MB-50MB.
>>>>> 
>>>>> How do I recover from this? 
>>>>> 
>>>>> I can scale out to give some breathing room, but will it go back and 
>>>>> compact the old days into nicely packed per-day files?    
>>>>> 
>>>>> I tried raising compaction throughput from 256 to 1000 and it seemed to 
>>>>> make things worse for the CPU; this is on i3.2xl with 8 compaction 
>>>>> threads configured. 
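>>>>> 
>>>>> (For reference, a sketch of the throttle change using the values above; 
>>>>> concurrent_compactors is the cassandra.yaml setting behind the 8 threads:)
>>>>> 
>>>>>     nodetool setcompactionthroughput 1000   # what I tried, in MB/s
>>>>>     nodetool setcompactionthroughput 256    # reverting to the previous throttle
>>>>>     # concurrent_compactors: 8   <- cassandra.yaml, needs a restart to change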
>>>>> 
>>>>> -B
>>>>> 
>>>>> Lastly, I have mixed TTLs in this CF and need to run a repair (I think) 
>>>>> to get rid of old tombstones.  However, running repairs in 2.1 on TWCS 
>>>>> column families causes a very large spike in sstable counts due to 
>>>>> anti-compaction, which is very disruptive.  Is there any other way? 
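>>>>> 
>>>>> (The only alternative I've found so far is full subrange repair, which 
>>>>> as I understand it skips anticompaction; a sketch - the tokens and 
>>>>> keyspace/table names below are placeholders:)
>>>>> 
>>>>>     nodetool repair -st <start_token> -et <end_token> my_ks my_cf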
>>>>>  
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Jon Haddad
>>>> http://www.rustyrazorblade.com
>>>> twitter: rustyrazorblade
