Hi, could it be due to a noisy neighbour? Do you have graphs or
statistics of ping times between nodes?

Jason


On Mon, Jan 6, 2014 at 7:28 AM, Blake Eggleston <bl...@shift.com> wrote:

> Hi,
>
> I’ve been having a problem with 3 neighboring nodes in our cluster having
> their read latencies jump up to 9000ms - 18000ms for a few minutes (as
> reported by opscenter), then come back down.
>
> We’re running a 6 node cluster, on AWS hi1.4xlarge instances, with
> Cassandra reading and writing to 2 RAIDed SSDs.
>
> I’ve added 2 nodes to the struggling part of the cluster, and aside from
> the latency spikes shifting onto the new nodes, it has had no effect. I
> suspect that a single key that lives on the first stressed node may be
> being read from heavily.
>
> The spikes in latency don’t seem to be correlated with an increase in reads.
> The cluster usually handles a maximum of 4200 reads/sec per node, with
> writes being significantly lower, at ~200/sec per node. Usually it will be
> fine with this, with read latencies at around 3.5-10 ms/read, but once or
> twice an hour the latencies on the 3 nodes will shoot through the roof.
>
> The disks aren’t showing serious use, with read and write rates on the ssd
> volume at around 1350 kBps and 3218 kBps, respectively. Each cassandra
> process is maintaining 1000-1100 open connections. GC logs aren’t showing
> any serious gc pauses.
>
> Any ideas on what might be causing this?
>
> Thanks,
>
> Blake
