Live upgrade 2.0 to 2.1 temporarily increases GC time causing timeouts and unavailability

Sotirios Delimanolis Thu, 18 Feb 2016 18:08:20 -0800

We have a Cassandra cluster with 24 nodes, all of which were running 2.0.16.
While the nodes are in the ring and handling queries, we upgrade them to
2.1.12 one node at a time, roughly as follows:
   
   - Stop the Cassandra process
   - Deploy jars, scripts, binaries, etc.
   - Start the Cassandra process


A few nodes into the upgrade, we start noticing that the majority of queries
(mostly through Thrift) time out or report unavailable. Looking at system
information, Cassandra GC time goes through the roof, which we assume is what
causes the timeouts.
Once all nodes are upgraded, the cluster stabilizes and the timeouts all but
disappear.
What could explain this? Does it have anything to do with how a 2.0 node
communicates with a 2.1 node?
Our Cassandra consumers haven't changed.
