Oh, and second: are you attempting a major compaction while you have all those pending compactions?
Try letting the cluster catch up on compactions. Having that many pending is bad. If you have a replication factor of 3 and read/write at QUORUM, you can go node by node: disable binary, raise concurrent compactors to 4, and unthrottle compactions by setting the throughput to zero (a command sketch follows at the bottom, after the quoted thread). This can help the cluster catch up on those compactions. Then you can deal with trying a major compaction.

Regards,
Evelyn

> On 5 Apr 2018, at 11:14 pm, Evelyn Smith <u5015...@gmail.com> wrote:
>
> Probably a dumb question, but it's good to clarify.
>
> Are you compacting the whole keyspace, or are you compacting tables one at a time?
>
>> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai <zpal...@gmail.com> wrote:
>>
>> Hi!
>>
>> I have a setup with 4 AWS nodes (m4.xlarge - 4 CPUs, 16 GB RAM, 1 TB SSD each), and when running the nodetool compact command on any of the servers I get an out-of-memory exception after a while.
>>
>> - Before calling compact I first did a repair, and before that there was a bigger update on a lot of entries, so I guess a lot of SSTables were created. The repair created around ~250 pending compaction tasks. Two of the nodes I managed to finish by upgrading to a 2xlarge machine with twice the heap (but running the compact on them manually also killed one :/ so this isn't an ideal solution).
>>
>> Some more info:
>> - Version is the newest, 3.11.2, with java8u116
>> - Using LeveledCompactionStrategy (we have mostly reads)
>> - Heap size is set to 8 GB
>> - Using G1GC
>> - I tried moving the memtables off-heap. It helped, but I still got an OOM last night.
>> - Concurrent compactors is set to 1, but it still happens; I also tried setting the throughput between 16 and 128, with no change.
>> - Storage load is 127 GB / 140 GB / 151 GB / 155 GB
>> - 1 keyspace, 16 tables, but there are a few SASI indexes on big tables.
>> - The biggest partition I found was 90 MB, but that table has only 2 SSTables attached and compacts in seconds. The rest are mostly single-row partitions with a few tens of KB of data.
>> - Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0, 0, 0, 0, 0]
>>
>> This is what the metrics look like before a node dies: https://ibb.co/kLhdXH
>>
>> This is what the heap dump looks like (top objects): https://ibb.co/ctkyXH
>>
>> The load is usually pretty low; the nodes are almost idling (avg 500 reads/sec, 30-40 writes/sec, with occasional few-second spikes of >100 writes), and the pending task count is also usually around 0.
>>
>> Any ideas? I'm starting to run out of them. Maybe the secondary indexes cause problems? I did manage to finish some bigger compactions where no index was attached, but I'm not 100% sure this is the cause.
>>
>> Thanks,
>> Zsolt
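P.S. Here is a minimal sketch of the node-by-node sequence described above, assuming a 3.11 nodetool (if your nodetool lacks setconcurrentcompactors, set concurrent_compactors in cassandra.yaml and restart instead):

    # take the node out of client traffic; it still participates in replication
    nodetool disablebinary
    # allow more compaction threads to run in parallel
    nodetool setconcurrentcompactors 4
    # 0 disables compaction throughput throttling entirely
    nodetool setcompactionthroughput 0
    # watch the pending backlog drain
    nodetool compactionstats
    # once pending tasks are near zero, restore throttling (16 MB/s is the
    # yaml default; use whatever your cassandra.yaml sets) and re-enable clients
    nodetool setcompactionthroughput 16
    nodetool enablebinary

With RF=3 and QUORUM, one node with binary disabled still leaves two live replicas for every token range, so reads and writes keep succeeding while that node churns through its backlog.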