May be unrelated, but I found highly variable latency (very high max latency) on the 2.1 code tree when loading new data (and when reading). Others found that G1 vs. CMS made no difference, and there is some evidence that 8/12/16 GB of heap makes no difference either. These were latencies in the 10-30 SECOND range, and they did cause timeouts. So you may not be seeing a 2.0-vs-2.1 interaction at all, but rather a 2.1 issue proper. While others did not find this associated with stop-the-world GC, I saw some evidence that it is (originally using cassandra-stress, and I recently reproduced the issue with YCSB!)
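If you want to confirm or rule out stop-the-world pauses on your own nodes, verbose GC logging is cheap to turn on. A sketch under the stock 2.1 layout (the log path is just an example; lines like these typically go in conf/cassandra-env.sh, which ships with similar ones commented out):

    # GC logging options -- log path below is an example, adjust to taste
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

Then grep the log for "Total time for which application threads were stopped" and see whether the long stops line up with your client timeouts.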
.......
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Feb 19, 2016 at 1:10 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> I performed this exact update a few days ago, except that clients were
> using the native protocol, and it went smoothly. So I think this might be
> Thrift related. No idea what is producing this, though; just wanted to
> give the info FWIW.
>
> As a side note, unrelated to the issue: performance using native is a lot
> better than Thrift starting in C* 2.1. Drivers using native are also more
> modern, allowing you to do very interesting stuff. Now that you are on
> 2.1, updating to native is something you might want to do soon enough :-).
>
> C*heers,
> -----------------
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-19 3:07 GMT+01:00 Sotirios Delimanolis <sotodel...@yahoo.com>:
>
>> We have a Cassandra cluster with 24 nodes. These nodes were running
>> 2.0.16.
>>
>> While the nodes are in the ring and handling queries, we perform the
>> upgrade to 2.1.12 as follows (more or less), one node at a time:
>>
>> 1. Stop the Cassandra process
>> 2. Deploy jars, scripts, binaries, etc.
>> 3. Start the Cassandra process
>>
>> A few nodes into the upgrade, we start noticing that the majority of
>> queries (mostly through Thrift) time out or report unavailable. Looking
>> at system information, Cassandra GC time goes through the roof, which is
>> what we assume causes the timeouts.
>>
>> Once all nodes are upgraded, the cluster stabilizes and no more (barely
>> any) timeouts occur.
>>
>> What could explain this? Does it have anything to do with how a 2.0 node
>> communicates with a 2.1 node?
>>
>> Our Cassandra consumers haven't changed.
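On Alain's side note: moving off Thrift is mostly a client-side driver swap, since the server already speaks the native protocol. A minimal sketch with the DataStax Java driver (the contact point, keyspace, table, and column names are placeholders, not your schema):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class NativeProtocolExample {
        public static void main(String[] args) {
            // Connects over the native protocol (port 9042 by default)
            Cluster cluster = Cluster.builder()
                    .addContactPoint("10.0.0.1") // any live node; placeholder address
                    .build();
            Session session = cluster.connect("my_keyspace"); // placeholder keyspace

            // Simple parameterized query; table/column names are placeholders
            ResultSet rs = session.execute(
                    "SELECT value FROM my_table WHERE key = ?", "some-key");
            for (Row row : rs) {
                System.out.println(row.getString("value"));
            }
            cluster.close();
        }
    }

The driver handles connection pooling and cluster awareness on its own, which is a large part of why native clients tend to behave better than Thrift ones.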