Hi Jeff, mostly lots of little files: there will be 4-5 that are
1-1.5 GB or so, then many at 5-50 MB and many more at 40-50 MB each.
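
For reference, here's roughly how I'm eyeballing that distribution; the
data path and keyspace/table names are placeholders for ours:

    # largest Data.db files first
    ls -lhS /var/lib/cassandra/data/<keyspace>/<table>-*/*-Data.db | head -25
    # total number of data files
    ls /var/lib/cassandra/data/<keyspace>/<table>-*/*-Data.db | wc -l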

Re incremental repair: yes, one of my engineers started an incremental
repair on this column family that we had to abort.  In fact, the node that
the repair was initiated on ran out of disk space, and we ended up
replacing that node as we would a dead node.
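
If it's relevant, we could check whether that aborted run left sstables
flagged as repaired; a rough sketch using the sstablemetadata tool that
ships with 2.1 (paths are placeholders, and I'm assuming its output
includes a "Repaired at" line):

    for f in /var/lib/cassandra/data/<keyspace>/<table>-*/*-Data.db; do
        echo "$f: $(sstablemetadata "$f" | grep 'Repaired at')"
    done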

Oddly, the new node is experiencing this issue as well.

-B


On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa <jji...@gmail.com> wrote:

> You could toggle off the tombstone compaction to see if that helps, but
> that should be lower priority than normal compactions
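>
> Roughly like this, with your keyspace/table substituted; the other
> options are copied from your settings below, and only the
> unchecked_tombstone_compaction flag changes (note that ALTER TABLE
> replaces the whole compaction map, so omitted sub-options revert to
> their defaults):
>
>     ALTER TABLE <keyspace>.<table> WITH compaction = {
>         'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
>         'compaction_window_unit': 'DAYS',
>         'compaction_window_size': '1',
>         'timestamp_resolution': 'MILLISECONDS',
>         'unchecked_tombstone_compaction': 'false'
>     };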
>
> Are the lots-of-little-files from memtable flushes or
> repair/anticompaction?
>
> Do you do normal deletes? Did you try to run Incremental repair?
>
> --
> Jeff Jirsa
>
>
> On Aug 7, 2018, at 5:00 PM, Brian Spindler <brian.spind...@gmail.com>
> wrote:
>
> Hi Jonathan, both I believe.
>
> The window size is 1 day, full settings:
>     AND compaction = {
>         'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
>         'compaction_window_unit': 'DAYS',
>         'compaction_window_size': '1',
>         'timestamp_resolution': 'MILLISECONDS',
>         'unchecked_tombstone_compaction': 'true',
>         'tombstone_compaction_interval': '86400',
>         'tombstone_threshold': '0.2'
>     }
>
>
> nodetool tpstats
>
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MutationStage                     0         0    68582241832         0                 0
> ReadStage                         0         0      209566303         0                 0
> RequestResponseStage              0         0    44680860850         0                 0
> ReadRepairStage                   0         0       24562722         0                 0
> CounterMutationStage              0         0              0         0                 0
> MiscStage                         0         0              0         0                 0
> HintedHandoff                     1         1            203         0                 0
> GossipStage                       0         0        8471784         0                 0
> CacheCleanupExecutor              0         0            122         0                 0
> InternalResponseStage             0         0         552125         0                 0
> CommitLogArchiver                 0         0              0         0                 0
> CompactionExecutor                8        42        1433715         0                 0
> ValidationExecutor                0         0           2521         0                 0
> MigrationStage                    0         0         527549         0                 0
> AntiEntropyStage                  0         0           7697         0                 0
> PendingRangeCalculator            0         0             17         0                 0
> Sampler                           0         0              0         0                 0
> MemtableFlushWriter               0         0         116966         0                 0
> MemtablePostFlush                 0         0         209103         0                 0
> MemtableReclaimMemory             0         0         116966         0                 0
> Native-Transport-Requests         1         0     1715937778         0            176262
>
> Message type           Dropped
> READ                         2
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                  4390
> COUNTER_MUTATION             0
> BINARY                       0
> REQUEST_RESPONSE          1882
> PAGED_RANGE                  0
> READ_REPAIR                  0
>
>
> On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> What's your window size?
>>
>> When you say backed up, how are you measuring that?  Are there pending
>> tasks or do you just see more files than you expect?
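>>
>> (If you're not already, nodetool compactionstats gives the pending task
>> count directly:
>>
>>     nodetool compactionstats
>>
>> which prints "pending tasks: N" along with the currently running
>> compactions.)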
>>
>> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler <brian.spind...@gmail.com>
>> wrote:
>>
>>> Hey guys, quick question:
>>>
>>> I've got a v2.1 Cassandra cluster: 12 nodes on AWS i3.2xl, commit log on
>>> one drive, data on NVMe.  It was working very well; it's a time-series DB
>>> and has been accumulating data for about 4 weeks.
>>>
>>> The nodes have increased in load and compaction seems to be falling
>>> behind.  I used to get about one file per day for this column family,
>>> a single ~30 GB Data.db file per day.  I am now getting hundreds per
>>> day, at 1 MB - 50 MB each.
>>>
>>> How to recover from this?
>>>
>>> I can scale out to get some breathing room, but will it go back and
>>> compact the old days into nicely packed per-day files?
>>>
>>> I tried raising the compaction throughput from 256 to 1000, and it
>>> seemed to make things worse for the CPU; it's configured with 8
>>> compaction threads on the i3.2xl.
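>>>
>>> For reference, that change was made with nodetool, and reverting is
>>> the same command with the old value:
>>>
>>>     nodetool setcompactionthroughput 1000
>>>     nodetool getcompactionthroughput    # confirm the current setting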
>>>
>>> Lastly, I have mixed TTLs in this CF and need to run a repair (I think)
>>> to get rid of old tombstones.  However, running repairs in 2.1 on TWCS
>>> column families causes a very large spike in sstable counts due to
>>> anti-compaction, which is very disruptive.  Is there any other way?
>>>
>>> -B
>>>
>>>
>>>
>>
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
>>
>
