RE: Cassandra and G1 Garbage collector stop the world event (STW)

Steinmaurer, Thomas Mon, 09 Oct 2017 04:45:08 -0700

Hi,

although this is not happening here with Cassandra (we use CMS there), we had
some weird problems with our own server application, e.g. we were hit by the
following JVM/G1 bugs:
https://bugs.openjdk.java.net/browse/JDK-8140597
https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less a duplicate of
the above)
https://bugs.openjdk.java.net/browse/JDK-8048556


Especially the first one, JDK-8140597, might be of interest if you see
periodic humongous allocations (according to a GC log): due to this G1 bug,
mixed GC phases are then steadily interrupted, so no GC happens in the OLD
regions. If I remember correctly, an allocation is humongous when a single
object is at least half the G1 region size. I can't recall the default G1
region size for a 12 GB heap, but possibly it is 4 MB. So if you allocate
single objects of 2 MB or more, you might end up with so-called “humongous”
allocations, which G1 places in one or more contiguous regions of their own.
If this happens very frequently within a short time, then, depending on your
allocation rate in MB/s, the combination of the G1 bug and a small heap might
drive the JVM towards an OOM.
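The region-size and humongous-threshold reasoning above can be checked with a little arithmetic. The sketch below assumes the JDK 8 HotSpot heuristic for the default region size (average of initial and max heap divided by a target of 2048 regions, rounded down to a power of two, clamped to 1-32 MB) and assumes -Xms equals -Xmx; the class and method names are hypothetical, and other JDK versions may size regions differently. Under those assumptions a 12 GB heap gets 4 MB regions, so objects of 2 MB or more are humongous.

```java
// Sketch of G1 default region sizing and the humongous-object check.
// ASSUMPTION: mirrors the JDK 8 HotSpot heuristic; names are hypothetical.
final class G1RegionMath {
    static final long MB = 1024L * 1024L;
    static final long TARGET_REGION_NUMBER = 2048;
    static final long MIN_REGION_SIZE = 1 * MB;
    static final long MAX_REGION_SIZE = 32 * MB;

    /** Default region size when -XX:G1HeapRegionSize is not set explicitly. */
    static long defaultRegionSize(long initialHeapBytes, long maxHeapBytes) {
        long averageHeap = (initialHeapBytes + maxHeapBytes) / 2;
        long size = Math.max(averageHeap / TARGET_REGION_NUMBER, MIN_REGION_SIZE);
        // Round down to a power of two, then clamp to the allowed range.
        size = Long.highestOneBit(size);
        return Math.min(Math.max(size, MIN_REGION_SIZE), MAX_REGION_SIZE);
    }

    /** G1 treats a single object of at least half a region as humongous. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    /** Contiguous regions a humongous object occupies (ceiling division). */
    static long regionsSpanned(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long heap = 12L * 1024 * MB; // 12 GB heap, -Xms == -Xmx assumed
        long region = defaultRegionSize(heap, heap);
        System.out.println("region size MB: " + region / MB);
        long[] sizes = {1 * MB, 2 * MB, 3 * MB, 9 * MB};
        for (long s : sizes) {
            boolean humongous = isHumongous(s, region);
            System.out.printf("%d MB -> humongous=%b, regions=%d%n",
                s / MB, humongous, humongous ? regionsSpanned(s, region) : 0);
        }
    }
}
```

Since the threshold moves with the region size, it can also be shifted by setting -XX:G1HeapRegionSize explicitly.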

Possibly a route worth investigating further.

Regards,
Thomas

From: Gustavo Scudeler [mailto:scudel...@gmail.com]
Sent: Monday, 09 October 2017 13:12
To: user@cassandra.apache.org
Subject: Cassandra and G1 Garbage collector stop the world event (STW)


Hi guys,

We have a 6-node Cassandra cluster under heavy utilization. We have been
dealing a lot with garbage collector stop-the-world (STW) events, which can
take up to 50 seconds on our nodes; in the meantime the Cassandra node is
unresponsive, not even accepting new logins.

Extra details:
· Cassandra version: 3.11
· Heap size: 12 GB
· We are using the G1 garbage collector with default settings
· Node size: 4 CPUs, 28 GB RAM
· All CPU cores are at 100% all the time.
· The G1 GC behavior is the same across all nodes.

The behavior is basically always the same:
1. Old Gen starts to fill up.
2. GC can't clean it properly without a full GC and an STW event.
3. The full GCs start to take longer, until the node is completely
unresponsive.

Extra details and GC reports:
https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw
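To follow the pattern described above (old gen filling up, full GCs becoming longer) with numbers rather than impressions, one can sample the standard GC and memory-pool MXBeans. The sketch below reads the current JVM only; for a Cassandra node the same java.lang:type=GarbageCollector MBeans are reachable over JMX (e.g. with jconsole). With G1 the collectors are typically named "G1 Young Generation" and "G1 Old Generation", and on JDK 8 full GCs are counted under the latter. Recognising old-generation pools by name is an assumption of this sketch, and the class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: sample GC counters and old-generation occupancy after GC
// for the current JVM (hypothetical helper, not a Cassandra tool).
final class GcActivitySampler {

    /** Cumulative {collectionCount, collectionTimeMillis} per collector. */
    static Map<String, long[]> sampleCollectors() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return result;
    }

    /**
     * Bytes still used in old-generation pools right after the last GC.
     * A value that keeps rising between samples means live data is growing.
     * ASSUMPTION: old-gen pools are recognised by name ("G1 Old Gen",
     * "Tenured Gen", "PS Old Gen", "CMS Old Gen").
     */
    static long oldGenUsedAfterLastGc() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
            if ((name.contains("Old") || name.contains("Tenured")) && afterGc != null) {
                used += afterGc.getUsed();
            }
        }
        return used;
    }

    public static void main(String[] args) {
        Map<String, long[]> before = sampleCollectors();
        System.gc(); // only to produce some GC activity for the demo
        Map<String, long[]> after = sampleCollectors();
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] prev = before.getOrDefault(e.getKey(), new long[] {0, 0});
            System.out.printf("%s: +%d collections, +%d ms%n",
                e.getKey(), e.getValue()[0] - prev[0], e.getValue()[1] - prev[1]);
        }
        System.out.println("old gen used after last GC: " + oldGenUsedAfterLastGc() + " bytes");
    }
}
```

Sampling this periodically and comparing deltas shows whether collection time per interval grows while post-GC old-gen usage keeps climbing.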

Can someone point me to the configurations or events I should check?
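As a starting point for the configuration side, it can help to see the effective values of a few G1-related HotSpot flags and where each value came from (default, ergonomics, or command line), since "default settings" often means ergonomically chosen values. The sketch below uses the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean on the current JVM; the choice of flags is ours, not a recommendation from this thread, and the class name is hypothetical. For a running Cassandra node, jcmd <pid> VM.flags gives similar information.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: print effective values of some G1-related HotSpot flags.
// ASSUMPTION: a HotSpot-based JVM (com.sun.management is HotSpot-specific).
final class G1FlagDump {
    // Illustrative selection of flags; not an exhaustive or official list.
    static final String[] FLAGS = {
        "UseG1GC", "G1HeapRegionSize", "MaxGCPauseMillis",
        "InitiatingHeapOccupancyPercent", "ParallelGCThreads", "ConcGCThreads"
    };

    static Map<String, String> effectiveFlags() {
        HotSpotDiagnosticMXBean hs =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> values = new LinkedHashMap<>();
        for (String flag : FLAGS) {
            try {
                VMOption opt = hs.getVMOption(flag);
                // getOrigin() reports DEFAULT, ERGONOMIC, VM_CREATION (command line), etc.
                values.put(flag, opt.getValue() + " (" + opt.getOrigin() + ")");
            } catch (IllegalArgumentException unknownFlag) {
                // Flag does not exist in this JVM version; skip it.
            }
        }
        return values;
    }

    public static void main(String[] args) {
        effectiveFlags().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```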

Thanks!

Best regards,

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313
