We had similar latency spikes when pending compactions couldn't keep up, or when repair/streaming was taking too many cycles.

On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng <fnt...@gmail.com> wrote:

> All,
>
> We've been having intermittent long application pauses (version 1.2.8) and
> not sure if it's a cassandra bug. During these pauses, there are dropped
> messages in the cassandra log file along with the node seeing other nodes
> as down. We've turned on gc logging and the following is an example of a
> long "stopped" or pause event in the gc.log file.
>
> 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which application threads were stopped: 0.091450 seconds
> 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application threads were stopped: 51.8190260 seconds
> 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which application threads were stopped: 0.005470 seconds
>
> As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs
> pause. There were no GC log events between those 2 log statements. Since
> there's no GC logs in between, something else must be causing the long stop
> time to reach a safepoint.
>
> Could there be a Cassandra thread that is taking a long time to reach a
> safepoint and what is it trying to do? Along with the node seeing other
> nodes as down in the cassandra log file, the StatusLogger shows 1599
> Pending in ReadStage and 9 Pending in MutationStage.
>
> There is mention of cassandra batch revoke bias locks as a possible cause
> (not GC) via:
> http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html
>
> We have JNA, no swap, and the cluster runs fine besides there intermittent
> long pause that can cause a node to appear down to other nodes. Any ideas
> as the cause of the long pause above? It seems not related to GC.
>
> thanks.
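
To line the long pauses up against compaction or repair/streaming activity, it may help to pull just the large "stopped" entries out of gc.log. Below is a rough Python sketch, assuming the log lines have exactly the shape quoted above. The function name `find_long_pauses` and the 1-second default threshold are hypothetical choices for illustration, not anything from Cassandra or the JVM.

```python
import re

# Matches lines shaped like the gc.log excerpts quoted in this thread:
# <timestamp>: <uptime>: Total time for which application threads were stopped: <n> seconds
# The pattern is not anchored at the end, so trailing text after "seconds" is tolerated.
STOPPED_RE = re.compile(
    r"^(?P<timestamp>\S+): (?P<uptime>\d+\.\d+): "
    r"Total time for which application threads were stopped: "
    r"(?P<seconds>\d+\.\d+) seconds"
)


def find_long_pauses(lines, threshold_seconds=1.0):
    """Return (timestamp, seconds) for each stopped-time line at or above the threshold.

    threshold_seconds is an arbitrary illustrative default, not a recommended value.
    """
    pauses = []
    for line in lines:
        match = STOPPED_RE.match(line.strip())
        if match is None:
            continue  # not a stopped-time line (e.g. a regular GC event)
        seconds = float(match.group("seconds"))
        if seconds >= threshold_seconds:
            pauses.append((match.group("timestamp"), seconds))
    return pauses


# The three lines from the original message, used as sample input.
SAMPLE = [
    "2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which "
    "application threads were stopped: 0.091450 seconds",
    "2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which "
    "application threads were stopped: 51.8190260 seconds",
    "2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which "
    "application threads were stopped: 0.005470 seconds",
]

if __name__ == "__main__":
    for timestamp, seconds in find_long_pauses(SAMPLE):
        print(f"{timestamp} stopped for {seconds:.3f}s")
```

For a real file, passing `open("gc.log")` as `lines` should work the same way. The resulting timestamps can then be compared against the Cassandra system log around the same moments.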