It may be worth seeing whether any of the sstables got promoted to repaired; if so, they're not eligible for compaction with unrepaired sstables, and that could explain some of the higher counts.
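One way to check that (not from the thread; the data path, keyspace, and table names below are placeholders) is the `sstablemetadata` tool that ships with Cassandra, which prints each sstable's "Repaired at" field. A value of 0 means unrepaired; anything else means the sstable was marked repaired (e.g. by the aborted incremental repair):

```shell
# Hypothetical paths -- adjust to your own keyspace/table directory.
# "Repaired at: 0" => unrepaired; nonzero => promoted to repaired.
for f in /var/lib/cassandra/data/myks/mytable-*/*-Data.db; do
  echo "$f: $(sstablemetadata "$f" | grep 'Repaired at')"
done
```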
Do you actually do deletes or is everything TTL'd?

--
Jeff Jirsa

> On Aug 7, 2018, at 5:09 PM, Brian Spindler <brian.spind...@gmail.com> wrote:
>
> Hi Jeff, mostly lots of little files: there will be 4-5 that are 1-1.5 GB
> or so, and then many at 5-50 MB and many at 40-50 MB each.
>
> Re incremental repair: yes, one of my engineers started an incremental repair
> on this column family that we had to abort. In fact, the node that the
> repair was initiated on ran out of disk space, and we ended up replacing that
> node like a dead node.
>
> Oddly, the new node is experiencing this issue as well.
>
> -B
>
>> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>> You could toggle off the tombstone compaction to see if that helps, but that
>> should be a lower priority than normal compactions.
>>
>> Are the lots-of-little-files from memtable flushes or repair/anticompaction?
>>
>> Do you do normal deletes? Did you try to run incremental repair?
>>
>> --
>> Jeff Jirsa
>>
>>> On Aug 7, 2018, at 5:00 PM, Brian Spindler <brian.spind...@gmail.com> wrote:
>>>
>>> Hi Jonathan, both I believe.
>>>
>>> The window size is 1 day; full settings:
>>>
>>> AND compaction = {'timestamp_resolution': 'MILLISECONDS',
>>>     'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1',
>>>     'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400',
>>>     'tombstone_threshold': '0.2',
>>>     'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
>>>
>>> nodetool tpstats
>>>
>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>> MutationStage                     0         0    68582241832         0                 0
>>> ReadStage                         0         0      209566303         0                 0
>>> RequestResponseStage              0         0    44680860850         0                 0
>>> ReadRepairStage                   0         0       24562722         0                 0
>>> CounterMutationStage              0         0              0         0                 0
>>> MiscStage                         0         0              0         0                 0
>>> HintedHandoff                     1         1            203         0                 0
>>> GossipStage                       0         0        8471784         0                 0
>>> CacheCleanupExecutor              0         0            122         0                 0
>>> InternalResponseStage             0         0         552125         0                 0
>>> CommitLogArchiver                 0         0              0         0                 0
>>> CompactionExecutor                8        42        1433715         0                 0
>>> ValidationExecutor                0         0           2521         0                 0
>>> MigrationStage                    0         0         527549         0                 0
>>> AntiEntropyStage                  0         0           7697         0                 0
>>> PendingRangeCalculator            0         0             17         0                 0
>>> Sampler                           0         0              0         0                 0
>>> MemtableFlushWriter               0         0         116966         0                 0
>>> MemtablePostFlush                 0         0         209103         0                 0
>>> MemtableReclaimMemory             0         0         116966         0                 0
>>> Native-Transport-Requests         1         0     1715937778         0            176262
>>>
>>> Message type           Dropped
>>> READ                         2
>>> RANGE_SLICE                  0
>>> _TRACE                       0
>>> MUTATION                  4390
>>> COUNTER_MUTATION             0
>>> BINARY                       0
>>> REQUEST_RESPONSE          1882
>>> PAGED_RANGE                  0
>>> READ_REPAIR                  0
>>>
>>>> On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>> What's your window size?
>>>>
>>>> When you say backed up, how are you measuring that? Are there pending
>>>> tasks or do you just see more files than you expect?
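For reference (my addition, not part of the thread; the keyspace/table names are placeholders), the backlog Jonathan asks about can be measured directly with standard 2.1 tooling rather than eyeballing file counts:

```shell
# Pending/running compactions for the whole node:
nodetool compactionstats

# The CompactionExecutor Active/Pending columns (42 pending in the
# tpstats above is the same signal):
nodetool tpstats | grep CompactionExecutor

# Per-table sstable count -- hypothetical keyspace/table:
nodetool cfstats myks.mytable | grep 'SSTable count'
```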
>>>>
>>>>> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler <brian.spind...@gmail.com> wrote:
>>>>>
>>>>> Hey guys, quick question:
>>>>>
>>>>> I've got a v2.1 Cassandra cluster: 12 nodes on AWS i3.2xl, commit log on
>>>>> one drive, data on NVMe. That was working very well; it's a time-series DB
>>>>> and has been accumulating data for about 4 weeks.
>>>>>
>>>>> The nodes have increased in load and compaction seems to be falling
>>>>> behind. I used to get about 1 file per day for this column family, about
>>>>> a ~30 GB Data.db file per day. I am now getting hundreds per day at
>>>>> 1 MB - 50 MB.
>>>>>
>>>>> How do I recover from this?
>>>>>
>>>>> I can scale out to give some breathing room, but will it go back and
>>>>> compact the old days into nicely packed files for the day?
>>>>>
>>>>> I tried setting compaction throughput to 1000 from 256 and it seemed to
>>>>> make things worse for the CPU; it's configured on i3.2xl with 8
>>>>> compaction threads.
>>>>>
>>>>> -B
>>>>>
>>>>> Lastly, I have mixed TTLs in this CF and need to run a repair (I think)
>>>>> to get rid of old tombstones. However, running repairs in 2.1 on TWCS
>>>>> column families causes a very large spike in sstable counts due to
>>>>> anti-compaction, which causes a lot of disruption. Is there any other way?
>>>>
>>>> --
>>>> Jon Haddad
>>>> http://www.rustyrazorblade.com
>>>> twitter: rustyrazorblade
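As a footnote on the "one file per day" expectation above: TWCS groups sstables by flooring their maximum timestamp to the configured window, so with a healthy 1-DAY window every sstable flushed on the same day eventually compacts into one bucket. This is a simplified sketch of that bucketing idea, not the actual TWCS code (the function name and values are mine):

```python
def twcs_window(ts_ms: int, window_size: int = 1, unit_ms: int = 86_400_000) -> int:
    """Return the lower bound (in ms) of the time window containing ts_ms.

    Mirrors TWCS's bucketing: timestamps are floored to a fixed-size
    window (1 DAY here, matching the thread's settings); sstables whose
    max timestamp lands in the same window are candidates to compact together.
    """
    window_ms = window_size * unit_ms
    return (ts_ms // window_ms) * window_ms

# Two flushes on the same UTC day land in the same window...
a = twcs_window(1533600000000)  # 2018-08-07 00:00:00 UTC
b = twcs_window(1533686399000)  # 2018-08-07 23:59:59 UTC
assert a == b
# ...while the next day starts a new window, hence "one big file per day".
c = twcs_window(1533686400000)  # 2018-08-08 00:00:00 UTC
assert c == a + 86_400_000
```

When compaction falls behind, many small sstables pile up inside the current window; once the executor catches up they can still be merged, but sstables marked repaired won't merge with unrepaired ones, per the note at the top of the thread.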