Hi All,
I upgraded Cassandra from v3.11.4 to v3.11.8.
The upgrade itself went smoothly. However, a few hours later one node
crashed with an OOM error, and a few hours after that a second node crashed.
Both appear to have crashed from excessive GC activity (CMS). The logs show
"Map failed" errors on CompactionExecutor:
ERROR [CompactionExecutor:744] 2020-12-11 03:25:42,169
JVMStabilityInspector.java:94 - OutOfMemory error letting the JVM handle
the error:
ERROR [CompactionExecutor:744] 2020-12-11 03:25:37,765
CassandraDaemon.java:235 - Exception in thread
Thread[CompactionExecutor:744,1,main]
org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed
at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:157)
at org.apache.cassandra.io.util.MmappedRegions$State.add(MmappedRegions.java:310)
at org.apache.cassandra.io.util.MmappedRegions$State.access$400(MmappedRegions.java:246)
at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:170)
at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73)
...
...
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:153)
... 23 common frames omitted
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
... 24 common frames omitted
*[CompactionExecutor:744] logged the following before the crash:*
INFO [CompactionExecutor:744] 2020-12-11 03:00:29,985 NoSpamLogger.java:91
- Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
WARN [CompactionExecutor:744] 2020-12-11 03:10:57,437
BigTableWriter.java:211 - Writing large partition XXXX (108.963MiB)....
WARN [CompactionExecutor:744] 2020-12-11 03:10:57,437
BigTableWriter.java:211 - Writing large partition YYYY (151.155MiB)
WARN [CompactionExecutor:744] 2020-12-11 03:11:16,445
BigTableWriter.java:211 - Writing large partition ZZZZ (253.149MiB)
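For reference, the two byte counts in the NoSpamLogger line above convert to round sizes. A minimal Python sketch of the conversion (the helper name `to_mib` is only illustrative, not something from Cassandra):

```python
def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


# Values copied from the "Maximum memory usage reached" log line.
limit_bytes = 536870912
chunk_bytes = 1048576

print(f"limit: {to_mib(limit_bytes):g} MiB")  # limit: 512 MiB
print(f"chunk: {to_mib(chunk_bytes):g} MiB")  # chunk: 1 MiB
```

So a 512 MiB cap was reached and a 1 MiB chunk was refused. My assumption, which I have not confirmed, is that this cap is the buffer pool limit controlled by file_cache_size_in_mb in cassandra.yaml.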
*Some more info:*
The *max_map_count* is set to 1048575, so all is well there.
Hugepages are enabled by default (I know I should disable them), but I
don't think they can cause this behaviour.
This never happened on v3.11.4, only on v3.11.8.
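To check how close a running node actually gets to the *max_map_count* limit, something like the following could be used. This is only a sketch, assuming Linux with /proc available; the function names (`count_mappings`, `usage_ratio`, `report`) are made up for illustration, and the PID has to be supplied by whoever runs it.

```python
from pathlib import Path


def count_mappings(maps_text: str) -> int:
    """Count mappings: each non-empty line of /proc/<pid>/maps is one mapping."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def usage_ratio(mappings: int, limit: int) -> float:
    """Fraction of the per-process mapping limit currently in use."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return mappings / limit


def report(pid: str) -> str:
    """Build a one-line summary for the given PID (Linux only, reads /proc)."""
    limit = int(Path("/proc/sys/vm/max_map_count").read_text())
    mappings = count_mappings(Path(f"/proc/{pid}/maps").read_text())
    return (f"pid {pid}: {mappings} mappings, limit {limit} "
            f"({usage_ratio(mappings, limit):.1%} used)")
```

Calling `print(report("<cassandra-pid>"))` as the Cassandra user (or root) on an affected node, shortly before it runs into trouble, would show whether the mapping count is anywhere near the limit or whether "Map failed" has a different cause.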
I'd really appreciate your help on this one.
Thanks!