May be unrelated, but I found highly variable latency (very high max latency) on the 2.1 code tree when loading new data (and when reading). Others found that G1 vs. CMS made no difference, and there is some evidence that 8/12/16 GB of heap makes no difference either. These were latencies in the 10-30 SECOND range, and they did cause timeouts. So you may not be seeing a 2.0-vs-2.1 interaction at all, but rather a 2.1 issue proper. While others did not find this associated with stop-the-world GC, I saw some evidence that it is (originally using cassandra-stress, and I recently reproduced the issue with YCSB!)
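If you want to confirm or rule out stop-the-world pauses on your own nodes, verbose GC logging is cheap to turn on. A sketch under the stock 2.1 layout (the log path is just an example; lines like these typically go in conf/cassandra-env.sh, which ships with similar ones commented out):

    # GC logging options -- log path below is an example, adjust to taste
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

Then grep the log for "Total time for which application threads were stopped" and see whether the long stops line up with your client timeouts.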
.......
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Feb 19, 2016 at 1:10 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> I performed this exact update a few days ago, except that clients were
> using the native protocol, and it went smoothly. So I think this might be
> Thrift related. No idea what is producing this, though; just wanted to
> give the info FWIW.
>
> As a side note, unrelated to the issue: performance using native is a lot
> better than Thrift starting in C* 2.1. Drivers using native are also more
> modern, allowing you to do very interesting stuff. Now that you are on
> 2.1, updating to native is something you might want to do soon enough :-).
>
> C*heers,
> -----------------
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-19 3:07 GMT+01:00 Sotirios Delimanolis <sotodel...@yahoo.com>:
>
>> We have a Cassandra cluster with 24 nodes. These nodes were running
>> 2.0.16.
>>
>> While the nodes are in the ring and handling queries, we perform the
>> upgrade to 2.1.12 as follows (more or less), one node at a time:
>>
>> 1. Stop the Cassandra process
>> 2. Deploy jars, scripts, binaries, etc.
>> 3. Start the Cassandra process
>>
>> A few nodes into the upgrade, we start noticing that the majority of
>> queries (mostly through Thrift) time out or report unavailable. Looking
>> at system information, Cassandra GC time goes through the roof, which is
>> what we assume causes the timeouts.
>>
>> Once all nodes are upgraded, the cluster stabilizes and no more (barely
>> any) timeouts occur.
>>
>> What could explain this? Does it have anything to do with how a 2.0 node
>> communicates with a 2.1 node?
>>
>> Our Cassandra consumers haven't changed.
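On Alain's side note: moving off Thrift is mostly a client-side driver swap, since the server already speaks the native protocol. A minimal sketch with the DataStax Java driver (the contact point, keyspace, table, and column names are placeholders, not your schema):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class NativeProtocolExample {
        public static void main(String[] args) {
            // Connects over the native protocol (port 9042 by default)
            Cluster cluster = Cluster.builder()
                    .addContactPoint("10.0.0.1") // any live node; placeholder address
                    .build();
            Session session = cluster.connect("my_keyspace"); // placeholder keyspace

            // Simple parameterized query; table/column names are placeholders
            ResultSet rs = session.execute(
                    "SELECT value FROM my_table WHERE key = ?", "some-key");
            for (Row row : rs) {
                System.out.println(row.getString("value"));
            }
            cluster.close();
        }
    }

The driver handles connection pooling and cluster awareness on its own, which is a large part of why native clients tend to behave better than Thrift ones.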