We had similar latency spikes when pending compactions couldn't keep up or
when repair/streaming was taking too many cycles.


On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng <fnt...@gmail.com> wrote:

> All,
>
> We've been having intermittent long application pauses (version 1.2.8) and
> are not sure if it's a Cassandra bug.  During these pauses, there are dropped
> messages in the Cassandra log file, and the node sees other nodes as down.
> We've turned on GC logging, and the following is an example of a long
> "stopped" or pause event in the gc.log file.
>
> 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which
> application threads were stopped: 0.091450 seconds
> 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which
> application threads were stopped: 51.8190260 seconds
> 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which
> application threads were stopped: 0.005470 seconds
>
> As seen above, there was a 0.091450 sec pause, then a 51.8190260 sec
> pause.  There were no GC log events between those 2 log statements.  Since
> there are no GC events in between, something other than GC must be causing
> the long stop, such as threads taking a long time to reach a safepoint.
>
> Could there be a Cassandra thread that is taking a long time to reach a
> safepoint, and if so, what is it trying to do? Along with the node seeing
> other nodes as down in the Cassandra log file, the StatusLogger shows 1599
> Pending in ReadStage and 9 Pending in MutationStage.
>
> There is mention of batch revocation of biased locks in Cassandra as a
> possible cause (not GC) at:
> http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html
>
> We have JNA, no swap, and the cluster runs fine besides these intermittent
> long pauses, which can cause a node to appear down to other nodes.  Any ideas
> as to the cause of the long pause above? It does not seem related to GC.
>
> thanks.
>
>
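
For what it's worth, a rough way to pull the long stops out of a gc.log
like the one quoted above is a small script along these lines. It is only a
sketch: it assumes each "Total time for which application threads were
stopped" entry sits on a single line in the real file (the wrapping above
comes from the mail client), and the function name and the 1-second default
threshold are made up for illustration.

```python
import re

# Matches "stopped time" entries in the format quoted in this thread, e.g.
# 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which
# application threads were stopped: 51.8190260 seconds   (one line in gc.log)
STOPPED_RE = re.compile(
    r"^(?P<stamp>\S+): (?P<uptime>\d+\.\d+): Total time for which "
    r"application threads were stopped: (?P<secs>\d+\.\d+) seconds"
)


def find_long_stops(lines, threshold_secs=1.0):
    """Return (timestamp, stopped_seconds) for each stop above the threshold.

    `threshold_secs` is an arbitrary illustrative default, not a recommendation.
    """
    hits = []
    for line in lines:
        m = STOPPED_RE.match(line.strip())
        if m and float(m.group("secs")) > threshold_secs:
            hits.append((m.group("stamp"), float(m.group("secs"))))
    return hits


# The three entries from the original message, unwrapped to one line each.
sample = [
    "2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which "
    "application threads were stopped: 0.091450 seconds",
    "2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which "
    "application threads were stopped: 51.8190260 seconds",
    "2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which "
    "application threads were stopped: 0.005470 seconds",
]

for stamp, secs in find_long_stops(sample):
    print(stamp, secs)
```

To run it against a real file, pass the open file object instead of
`sample`, e.g. `find_long_stops(open("gc.log"))`. Lining the reported
timestamps up against compaction and repair/streaming activity in the
Cassandra log may show whether the stops coincide with those.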
