Finding the bottleneck of a cluster

rohit bhatia Wed, 04 Jul 2012 23:49:56 -0700

Our Cassandra cluster consists of 8 nodes (16 cores, 32 GB RAM, 12 GB
heap, 1600 MB young generation, Cassandra 1.0.5, JDK 1.7, 128 concurrent
writer threads). The replication factor is 2, there are 10 column
families, and we serve write-intensive counter-increment workloads
(CL=ONE).


I am trying to find the bottleneck:

1) Is using JDK 1.7 in any way detrimental to Cassandra?

2) What maximum write throughput (qps) should we expect? Is the
Netflix benchmark also applicable to counter-increment workloads?
    http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

3) At around 50,000 qps for the cluster (~12,500 qps per node), CPU idle
time is around 30%. Cassandra is not disk-bound (read operations are
insignificant and CPU iowait is around 0.05%) and is not swapping
(around 15 GB of RAM is free or inactive). ParNew GC pauses average
100 ms and occur every second, so Cassandra spends about 10% of its
time stopped in the "stop-the-world" collector.
The OS load is around 16-20 and the average write latency is 3 ms.
tpstats does not show any significant pending tasks.
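The arithmetic behind these figures can be checked with a small Python sketch. It assumes (an interpretation, not stated above) that the per-node figure counts every replica write, so 50,000 client writes at replication factor 2 spread evenly over 8 nodes gives 12,500 writes per node; GC overhead is pause time divided by the interval between pauses. The function names are hypothetical.

```python
def gc_overhead(pause_ms: float, interval_ms: float) -> float:
    """Fraction of wall-clock time spent in stop-the-world pauses."""
    return pause_ms / interval_ms


def per_node_write_rate(cluster_qps: float, replication_factor: int, nodes: int) -> float:
    """Replica writes per node, assuming writes are spread evenly and every
    client write is applied on `replication_factor` nodes (an assumption)."""
    return cluster_qps * replication_factor / nodes


print(f"GC overhead: {gc_overhead(100, 1000):.0%}")  # prints "GC overhead: 10%"
print(f"Per-node writes: {per_node_write_rate(50_000, 2, 8):,.0f}/s")  # prints "Per-node writes: 12,500/s"
```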

At this point, several nodes suddenly start dropping many "Mutation"
messages, and tpstats shows many pending MutationStage and
ReplicateOnWriteStage tasks.
The number of threads in the Java process increases from the usual
300-400 to around 25,000, and almost all of the new threads are named
"pool-2-thread-*".
The OS load jumps to around 30-40, and the "write request latency"
starts spiking above 500 ms (sometimes even to several tens of seconds).
Even the "Local write latency" increases fourfold, from 50 microseconds
to 200 microseconds. This happens across all the nodes within around
2-3 minutes.
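To see which pools the extra threads belong to, one option is to take a thread dump (for example with the JDK's `jstack <pid>`) and group thread names by prefix. This sketch assumes the usual jstack layout, where each thread's header line begins at column 0 with its quoted name; the helper `thread_pool_counts` and the sample dump are hypothetical.

```python
import re
from collections import Counter

# In jstack output each thread starts with a header line whose first token is
# the quoted thread name, e.g.
#   "pool-2-thread-17" prio=10 tid=0x... nid=0x... runnable
THREAD_HEADER = re.compile(r'^"([^"]+)"')
# Collapse a trailing numeric suffix such as "-17" or ":12" into "*".
NUMERIC_SUFFIX = re.compile(r"([-:])\d+$")


def thread_pool_counts(dump: str) -> Counter:
    """Count threads in a thread dump, grouped by name with the index removed."""
    counts: Counter = Counter()
    for line in dump.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            counts[NUMERIC_SUFFIX.sub(r"\1*", match.group(1))] += 1
    return counts


# Hypothetical three-thread excerpt of a dump.
sample = '''"pool-2-thread-1" prio=10 tid=0x01 nid=0x02 runnable
"pool-2-thread-2" prio=10 tid=0x03 nid=0x04 runnable
"MutationStage:12" daemon prio=10 tid=0x05 nid=0x06 waiting on condition
'''
print(thread_pool_counts(sample).most_common())
```

Running this periodically during the incident would show whether the growth is confined to one name prefix.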
My guess is that the 128 writer threads may be unable to perform any
more writes, although with an average local write latency of 100-150
microseconds each thread should be able to serve up to 10,000 qps
(1 / 100 microseconds), so 128 writer threads should be able to serve
up to 1,280,000 qps per node.
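The estimate above is an application of Little's law (concurrency = throughput x latency). A minimal Python sketch, assuming every writer thread is always busy with nothing but the local write (it ignores GC pauses and queueing in other stages), gives the ceiling and, in reverse, how many writer threads are busy on average at the observed rate. The function names are hypothetical.

```python
def max_throughput_qps(threads: int, latency_us: float) -> float:
    """Little's law upper bound: `threads` always-busy workers, each finishing
    one write every `latency_us` microseconds."""
    return threads * 1_000_000 / latency_us


def busy_threads(rate_qps: float, latency_us: float) -> float:
    """Little's law L = lambda * W: average number of writes in flight."""
    return rate_qps * latency_us / 1_000_000


for latency in (100, 150):
    print(f"{latency} us -> ceiling {max_throughput_qps(128, latency):,.0f} qps")
print(f"busy writer threads at 12,500 qps: {busy_threads(12_500, 100):.2f}")
```

At ~12,500 writes per node and 100 microseconds each, only about 1.25 writer threads would be busy on average, which (as a hypothesis, not a diagnosis) suggests the local write path is far from its thread limit.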
Could there be any other reason for the dropped messages? What else
should I monitor, given that system.log does not seem to show anything
conclusive before the messages start being dropped?
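One thing worth watching continuously is the Pending column of `nodetool tpstats`, so that a MutationStage or ReplicateOnWriteStage backlog is noticed as it starts to grow. This Python sketch assumes each pool row is whitespace-separated as name, Active, Pending, Completed, ... (check this against your Cassandra version's output); the function names, the threshold, and the sample text are hypothetical.

```python
from typing import Dict


def pending_by_pool(tpstats: str) -> Dict[str, int]:
    """Map thread-pool name -> Pending count from `nodetool tpstats` text.

    Assumes pool rows look like: <PoolName> <Active> <Pending> <Completed> ...
    Header rows and the two-column dropped-message section are skipped
    because their fields 1-3 are not all integers."""
    pending: Dict[str, int] = {}
    for line in tpstats.splitlines():
        fields = line.split()
        if len(fields) >= 4 and all(f.isdigit() for f in fields[1:4]):
            pending[fields[0]] = int(fields[2])
    return pending


def backed_up(pending: Dict[str, int], threshold: int = 100) -> Dict[str, int]:
    """Pools whose pending count exceeds a (hypothetical) alert threshold."""
    return {pool: n for pool, n in pending.items() if n > threshold}


# Hypothetical output; real column sets and numbers depend on version and load.
sample = """Pool Name                    Active   Pending      Completed
ReadStage                         0         0          12345
MutationStage                   128      5012       98765432
ReplicateOnWriteStage            32      2048        4567890

Message type           Dropped
MUTATION                  1234
"""
print(backed_up(pending_by_pool(sample)))
```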



Thanks
Rohit
