Yes, leveled compaction strategy. Concurrent compactors were set to 2; I recently changed that to 8, with no effect. At the same time I raised the compaction throughput from 64 to 384 MB/s. The number of pending compactions was still increasing after the change. Other nodes are handling the same throughput with the previous compaction settings.
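For reference, this is roughly how I applied those changes (the throughput cap can be changed at runtime with nodetool; concurrent_compactors, as far as I know, can only be changed on 2.2 by editing cassandra.yaml and restarting):

    # raise the compaction throughput cap at runtime (in MB/s)
    nodetool setcompactionthroughput 384

    # confirm the new value took effect
    nodetool getcompactionthroughput

    # concurrent compactors: set in cassandra.yaml, then restart the node
    #   concurrent_compactors: 8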
We are using c4.2xlarge instances in EC2: 8 vCPUs, SSDs, 15 GB memory. No errors or exceptions in the logs. Some possibly relevant log entries I noticed:

INFO [CompactionExecutor:16] 2016-08-17 19:15:04,711 CompactionManager.java:654 - Will not compact /export/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/lb-961-big: it is not an active sstable
INFO [CompactionExecutor:16] 2016-08-17 19:15:04,711 CompactionManager.java:654 - Will not compact /export/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/lb-960-big: it is not an active sstable
INFO [CompactionExecutor:16] 2016-08-17 19:15:04,711 CompactionManager.java:664 - No files to compact for user defined compaction
WARN [CompactionExecutor:3] 2016-08-16 19:52:07,134 BigTableWriter.java:184 - Writing large partition system/hints:3b4f02ef-ac1f-4bea-9d0c-1048564b749d (150461319 bytes)
WARN [CompactionExecutor:3] 2016-08-16 19:52:09,501 BigTableWriter.java:184 - Writing large partition system/hints:3b4f02ef-ac1f-4bea-9d0c-1048564b749d (149619989 bytes)
WARN [epollEventLoopGroup-2-2] 2016-08-16 19:52:12,911 Frame.java:203 - Detected connection using native protocol version 2. Both version 1 and 2 of the native protocol are now deprecated and support will be removed in Cassandra 3.0. You are encouraged to upgrade to a client driver using version 3 of the native protocol
WARN [GossipTasks:1] 2016-08-16 20:51:45,643 FailureDetector.java:287 - Not marking nodes down due to local pause of 131385662140 > 5000000000
WARN [CompactionExecutor:5] 2016-08-17 01:50:05,200 MajorLeveledCompactionWriter.java:63 - Many sstables involved in compaction, skipping storing ancestor information to avoid running out of memory
WARN [CompactionExecutor:4] 2016-08-17 01:50:48,684 MajorLeveledCompactionWriter.java:63 - Many sstables involved in compaction, skipping storing ancestor information to avoid running out of memory
WARN [GossipTasks:1] 2016-08-17 04:35:10,697 FailureDetector.java:287 - Not marking nodes down due to local pause of 8628650983 > 5000000000
WARN [GossipTasks:1] 2016-08-17 04:42:55,524 FailureDetector.java:287 - Not marking nodes down due to local pause of 9141089664 > 5000000000
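In case it helps with diagnosis, these are the checks I have been running on the problem node (all standard nodetool commands; given the system/hints warnings above, I also wonder whether a hints backlog is contributing):

    # pending and active compactions, plus bytes remaining
    nodetool compactionstats

    # thread pool backlogs and dropped messages
    nodetool tpstats

    # GC activity -- those 131s "local pause" warnings look like GC or I/O stalls
    nodetool gcstats

    # if stored hints keep growing, they can be dropped (the data can be
    # brought back into sync later with nodetool repair):
    # nodetool truncatehints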
On Wed, Aug 17, 2016 at 11:49 AM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

> What compaction strategy? Looks like leveled – is that what you expect?
>
> Any exceptions in the logs?
>
> Are you throttling compaction?
>
> SSD or spinning disks?
>
> How many cores?
>
> How many concurrent compactors?
>
> From: Ezra Stuetzel <ezra.stuet...@riskiq.net>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, August 17, 2016 at 11:39 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: large number of pending compactions, sstables steadily increasing
>
> I have one node in my 2.2.7 cluster (just upgraded from 2.2.6 hoping to
> fix the issue) which seems to be stuck in a weird state -- with a large
> number of pending compactions and sstables. The node is compacting about
> 500 GB/day, and the number of pending compactions is growing by about
> 50/day; it is at about 2300 pending compactions now. I have tried
> increasing the number of compaction threads and the compaction
> throughput, which doesn't seem to help eliminate the many pending
> compactions.
>
> I have tried running 'nodetool cleanup' and 'nodetool compact'. The
> latter has fixed the issue in the past, but most recently I was getting
> OOM errors, probably due to the large number of sstables. I upgraded to
> 2.2.7 and am no longer getting OOM errors, but it also does not resolve
> the issue. I do see this message in the logs:
>
> INFO [RMI TCP Connection(611)-10.9.2.218] 2016-08-17 01:50:01,985 CompactionManager.java:610 - Cannot perform a full major compaction as repaired and unrepaired sstables cannot be compacted together. These two set of sstables will be compacted separately.
>
> Below are the 'nodetool tablestats' comparing a normal node and the
> problematic node. You can see the problematic node has many more
> sstables, and almost all of them are in level 0. What is the best way to
> fix this? Can I just delete those sstables somehow and then run a repair?
>
> Normal node:
>
>     Keyspace: mykeyspace
>         Read Count: 0
>         Read Latency: NaN ms.
>         Write Count: 31905656
>         Write Latency: 0.051713177939359714 ms.
>         Pending Flushes: 0
>             Table: mytable
>             SSTable count: 1908
>             SSTables in each level: [11/4, 20/10, 213/100, 1356/1000, 306, 0, 0, 0, 0]
>             Space used (live): 301894591442
>             Space used (total): 301894591442
>
> Problematic node:
>
>     Keyspace: mykeyspace
>         Read Count: 0
>         Read Latency: NaN ms.
>         Write Count: 30520190
>         Write Latency: 0.05171286705620116 ms.
>         Pending Flushes: 0
>             Table: mytable
>             SSTable count: 14105
>             SSTables in each level: [13039/4, 21/10, 206/100, 831, 0, 0, 0, 0, 0]
>             Space used (live): 561143255289
>             Space used (total): 561143255289
>
> Thanks,
>
> Ezra
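Regarding "can I just delete those sstables and then run a repair" -- deleting live sstables by hand risks losing the node's only copy of recently written data, so I would avoid that. The two approaches I am considering instead, based on the Cassandra tools docs rather than anything I have verified on this cluster (keyspace/table names below are the ones from the stats above), are:

    # Option 1: relevel the sstables offline (run only while the node is
    # stopped; --dry-run prints the proposed leveling without changing anything)
    sstableofflinerelevel --dry-run mykeyspace mytable
    sstableofflinerelevel mykeyspace mytable

    # Option 2: temporarily switch the table to size-tiered compaction so the
    # L0 backlog gets merged in fewer, larger compactions, then switch back
    cqlsh -e "ALTER TABLE mykeyspace.mytable WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"
    # ...wait for pending compactions to drain, then restore leveled:
    cqlsh -e "ALTER TABLE mykeyspace.mytable WITH compaction = {'class': 'LeveledCompactionStrategy'};"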