FYI, my observations were with native, not Thrift.
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Feb 19, 2016 at 10:12 AM, Sotirios Delimanolis <sotodel...@yahoo.com> wrote:

> Does your cluster contain 24+ nodes or fewer?
>
> We did the same upgrade on a smaller cluster of 5 nodes and we didn't see
> this behavior. On the 24-node cluster, the timeouts only took effect once
> roughly 5-7 nodes had been upgraded.
>
> We're doing some more upgrades next week, trying different deployment
> plans. I'll report back with the results.
>
> Thanks for the reply (we absolutely want to move to CQL).
>
>
> On Friday, February 19, 2016 1:10 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>
> I performed this exact upgrade a few days ago, except clients were using the
> native protocol, and it went smoothly. So I think this might be Thrift-related.
> No idea what is producing this, though; just wanted to give the info FWIW.
>
> As a side note, unrelated to the issue, performance using native is a lot
> better than Thrift starting in C* 2.1. Drivers using native are also more
> modern, allowing you to do very interesting things. Now that you are on 2.1,
> updating to native is something you might want to do soon :-).
>
> C*heers,
> -----------------
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-19 3:07 GMT+01:00 Sotirios Delimanolis <sotodel...@yahoo.com>:
>
> We have a Cassandra cluster with 24 nodes. These nodes were running 2.0.16.
>
> While the nodes are in the ring and handling queries, we perform the upgrade
> to 2.1.12 as follows (more or less), one node at a time:
>
> 1. Stop the Cassandra process
> 2. Deploy jars, scripts, binaries, etc.
> 3. Start the Cassandra process
>
> A few nodes into the upgrade, we start noticing that the majority of queries
> (mostly through Thrift) time out or report unavailable. Looking at system
> information, Cassandra GC time goes through the roof, which is what we assume
> causes the timeouts.
>
> Once all nodes are upgraded, the cluster stabilizes and few, if any, timeouts
> occur.
>
> What could explain this? Does it have anything to do with how a 2.0 node
> communicates with a 2.1 node?
>
> Our Cassandra consumers haven't changed.
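For reference, the three-step procedure in the original question omits two steps that are generally recommended for a rolling 2.0 -> 2.1 upgrade: draining the node before stopping it, and rewriting SSTables to the new format afterwards (some teams do this per node, others only once the whole ring is on 2.1). Below is a minimal sketch of one node's turn, in Python. It assumes nodetool is on the PATH, Cassandra runs as a systemd service named "cassandra", and the deploy step is a hypothetical script; none of these details come from the thread.

#!/usr/bin/env python
# Sketch only: nodetool on PATH, a systemd unit named "cassandra", and the
# deploy script path are assumptions, not details from the thread.
import subprocess
import time

def run(*cmd):
    # Run a command and raise if it exits non-zero.
    subprocess.run(cmd, check=True)

def upgrade_one_node():
    # Flush memtables and stop accepting traffic so the node leaves the ring cleanly.
    run("nodetool", "drain")
    run("sudo", "systemctl", "stop", "cassandra")

    # Deploy the 2.1.12 jars, scripts, binaries, etc. (hypothetical deploy script).
    run("/opt/deploy/deploy_2_1_12.sh")

    # Start the upgraded node and give it time to rejoin the ring; in practice
    # you would poll "nodetool status" until the node shows as UN.
    run("sudo", "systemctl", "start", "cassandra")
    time.sleep(60)

    # Rewrite SSTables into the 2.1 format so the node is not serving reads
    # from old-format files any longer than necessary.
    run("nodetool", "upgradesstables")

if __name__ == "__main__":
    upgrade_one_node()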
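For the Thrift-to-native migration Alain suggests, here is a minimal sketch using the DataStax Python driver (cassandra-driver). The contact point, keyspace, table, and column names are placeholders, not details from the thread.

# Sketch of a native-protocol (CQL) client with the DataStax Python driver
# ("pip install cassandra-driver"); host, keyspace, table, and column names
# are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])           # any reachable node works as a contact point
session = cluster.connect("my_keyspace")  # hypothetical keyspace

# Prepared statements are parsed once server-side and then executed with bound values.
select = session.prepare("SELECT id, value FROM my_table WHERE id = ?")
for row in session.execute(select, ["some-id"]):
    print(row.id, row.value)

cluster.shutdown()

The native driver keeps connection pools to every node rather than talking to a single host the way many Thrift clients do, which is part of the performance gap Alain mentions.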