We have a 24-node Cassandra cluster that was running 2.0.16. While the nodes
are in the ring and handling queries, we upgrade them to 2.1.12 one node at a
time, roughly as follows:
   
   - Stop the Cassandra process
   - Deploy jars, scripts, binaries, etc.
   - Start the Cassandra process
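
For illustration, the per-node steps above can be sketched as a small Python driver. This is only a sketch under assumptions: the host names, the `service cassandra` commands, the deploy script path, and SSH access to each node are hypothetical placeholders, not our actual tooling.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-node steps, in the order listed above.
# The shell commands are placeholders, not the real deployment commands.
STEPS: List[Tuple[str, str]] = [
    ("stop", "sudo service cassandra stop"),
    ("deploy", "/opt/deploy/install_cassandra.sh 2.1.12"),
    ("start", "sudo service cassandra start"),
]


def ssh_run(host: str, command: str) -> None:
    """Run a command on a remote host over SSH; raise if it exits non-zero."""
    subprocess.run(["ssh", host, command], check=True)


def rolling_upgrade(
    hosts: Sequence[str],
    run: Callable[[str, str], None] = ssh_run,
) -> List[Tuple[str, str]]:
    """Apply every step to one host before moving on to the next host.

    Returns the (host, step name) pairs in the order they were executed.
    """
    done: List[Tuple[str, str]] = []
    for host in hosts:
        for name, command in STEPS:
            run(host, command)
            done.append((host, name))
    return done
```

Passing a different `run` callable (for example one that only logs each command) allows a dry run without touching any node.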

A few nodes into the upgrade, we start noticing that the majority of queries
(mostly through Thrift) either time out or fail as unavailable. System metrics
show Cassandra GC time going through the roof, which we assume is what causes
the timeouts.
Once all nodes are upgraded, the cluster stabilizes and the timeouts all but
disappear.
What could explain this? Does it have anything to do with how a 2.0 node
communicates with a 2.1 node?
Our Cassandra consumers haven't changed.



