Hi, I'm attempting to upgrade Cassandra from 2.2.18 to 3.11.6, but had to abort because of major performance issues associated with GC pauses.
Details:
3-node cluster, RF 3, 1 DC
~2TB data per node
Heap Size: 12G / New Size: 5G

I didn't even get very far in the upgrade. I just upgraded the binaries on a single node to 3.11.6 (did not run upgradesstables) and let it sit. Within 10 minutes, I started seeing elevated GC pressure and lots of timeouts in the metrics. All three nodes, not just the upgraded one, are seeing GC problems. ParNew GC time jumped from 0.38% to 3%, and CMS times reached up to 30 seconds. Once I shut down the node running 3.11.6, the cluster eventually recovers.

Can anyone point me to ways to debug this? I've taken heap dumps of all nodes, but nothing in particular stands out, and there are no obvious messages in the logs that point to problems.