Hi,

I'm attempting an upgrade from Cassandra 2.2.18 to 3.11.6, but had to abort
because of major performance issues caused by GC pauses.

Details:
3 node cluster, RF 3, 1 DC
~2TB data per node
Heap Size: 12G / New Size: 5G

I didn't even get very far in the upgrade: I only upgraded the binary on a
single node to 3.11.6 (without running upgradesstables) and let it sit. Within
10 minutes, I started seeing elevated GC pressure and lots of timeouts in
the metrics.

All three nodes, not just the upgraded one, are seeing GC problems.
ParNew GC time jumped from 0.38% to 3%, and CMS pauses reached up to 30 seconds.

Once I shut down the 3.11.6 node, the cluster eventually recovers.

Can anyone point me to ways to debug this? I've taken heap dumps of all
nodes, but nothing in particular stands out, and there are no
obvious messages in the logs that point to the problem.
