Yes, leveled compaction strategy.

Concurrent compactors were set to 2; I recently changed that to 8, with no effect. At
the same time I raised the compaction throughput from 64 to 384 MB/s. The
number of pending compactions was still increasing after the change. Other nodes are
handling the same throughput with the previous compaction settings.
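For reference, this is roughly how that kind of change is applied (illustrative only,
not a transcript of my exact steps; the throughput cap can be changed live with
nodetool, while concurrent_compactors is a cassandra.yaml setting):

    # raise the compaction throughput cap to 384 MB/s, live
    nodetool setcompactionthroughput 384

    # cassandra.yaml
    concurrent_compactors: 8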

We are using c4.2xlarge instances in EC2: 8 vCPUs, SSDs, 15 GB memory.

No errors or exceptions in the logs. Some possibly relevant log entries I
noticed:

INFO  [CompactionExecutor:16] 2016-08-17 19:15:04,711 CompactionManager.java:654 - Will not compact /export/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/lb-961-big: it is not an active sstable

INFO  [CompactionExecutor:16] 2016-08-17 19:15:04,711 CompactionManager.java:654 - Will not compact /export/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/lb-960-big: it is not an active sstable

INFO  [CompactionExecutor:16] 2016-08-17 19:15:04,711 CompactionManager.java:664 - No files to compact for user defined compaction

WARN  [CompactionExecutor:3] 2016-08-16 19:52:07,134 BigTableWriter.java:184 - Writing large partition system/hints:3b4f02ef-ac1f-4bea-9d0c-1048564b749d (150461319 bytes)

WARN  [CompactionExecutor:3] 2016-08-16 19:52:09,501 BigTableWriter.java:184 - Writing large partition system/hints:3b4f02ef-ac1f-4bea-9d0c-1048564b749d (149619989 bytes)

WARN  [epollEventLoopGroup-2-2] 2016-08-16 19:52:12,911 Frame.java:203 - Detected connection using native protocol version 2. Both version 1 and 2 of the native protocol are now deprecated and support will be removed in Cassandra 3.0. You are encouraged to upgrade to a client driver using version 3 of the native protocol

WARN  [GossipTasks:1] 2016-08-16 20:51:45,643 FailureDetector.java:287 - Not marking nodes down due to local pause of 131385662140 > 5000000000

WARN  [CompactionExecutor:5] 2016-08-17 01:50:05,200 MajorLeveledCompactionWriter.java:63 - Many sstables involved in compaction, skipping storing ancestor information to avoid running out of memory

WARN  [CompactionExecutor:4] 2016-08-17 01:50:48,684 MajorLeveledCompactionWriter.java:63 - Many sstables involved in compaction, skipping storing ancestor information to avoid running out of memory

WARN  [GossipTasks:1] 2016-08-17 04:35:10,697 FailureDetector.java:287 - Not marking nodes down due to local pause of 8628650983 > 5000000000

WARN  [GossipTasks:1] 2016-08-17 04:42:55,524 FailureDetector.java:287 - Not marking nodes down due to local pause of 9141089664 > 5000000000

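For reference, the pending-compaction count and per-level sstable distribution can be
checked with something like the following (illustrative commands; the keyspace/table
names match the stats quoted further down, and the output will differ per node):

    nodetool compactionstats
    nodetool cfstats mykeyspace.mytable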



On Wed, Aug 17, 2016 at 11:49 AM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> What compaction strategy? Looks like leveled – is that what you expect?
>
> Any exceptions in the logs?
>
> Are you throttling compaction?
>
> SSD or spinning disks?
>
> How many cores?
>
> How many concurrent compactors?
>
> *From: *Ezra Stuetzel <ezra.stuet...@riskiq.net>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Wednesday, August 17, 2016 at 11:39 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *large number of pending compactions, sstables steadily
> increasing
>
>
>
> I have one node in my cluster on 2.2.7 (just upgraded from 2.2.6 hoping to
> fix the issue) which seems to be stuck in a weird state -- with a large number
> of pending compactions and sstables. The node is compacting about
> 500 GB/day, and the number of pending compactions is going up by about 50/day. It is
> at about 2300 pending compactions now. I have tried increasing the number of
> compaction threads and the compaction throughput, which doesn't seem to
> help eliminate the many pending compactions.
>
>
>
> I have tried running 'nodetool cleanup' and 'nodetool compact'. The latter
> has fixed the issue in the past, but most recently I was getting OOM
> errors, probably due to the large number of sstables. I upgraded to 2.2.7
> and am no longer getting OOM errors, but the upgrade also did not resolve the
> issue. I do see this message in the logs:
>
>
>
> INFO  [RMI TCP Connection(611)-10.9.2.218] 2016-08-17 01:50:01,985
> CompactionManager.java:610 - Cannot perform a full major compaction as
> repaired and unrepaired sstables cannot be compacted together. These two
> set of sstables will be compacted separately.
>
> Below are the 'nodetool tablestats' outputs comparing a normal node and the problematic
> node. You can see the problematic node has far more sstables, and almost all of
> them are sitting in the first level. What is the best way to fix this? Can I just delete
> those sstables somehow and then run a repair?
>
> Normal node
>
> keyspace: mykeyspace
>
>     Read Count: 0
>
>     Read Latency: NaN ms.
>
>     Write Count: 31905656
>
>     Write Latency: 0.051713177939359714 ms.
>
>     Pending Flushes: 0
>
>         Table: mytable
>
>         SSTable count: 1908
>
>         SSTables in each level: [11/4, 20/10, 213/100, 1356/1000, 306, 0,
> 0, 0, 0]
>
>         Space used (live): 301894591442
>
>         Space used (total): 301894591442
>
>
>
>
>
> Problematic node
>
> Keyspace: mykeyspace
>
>     Read Count: 0
>
>     Read Latency: NaN ms.
>
>     Write Count: 30520190
>
>     Write Latency: 0.05171286705620116 ms.
>
>     Pending Flushes: 0
>
>         Table: mytable
>
>         SSTable count: 14105
>
>         SSTables in each level: [13039/4, 21/10, 206/100, 831, 0, 0, 0, 0,
> 0]
>
>         Space used (live): 561143255289
>
>         Space used (total): 561143255289
>
> Thanks,
>
> Ezra
>
