Our Cassandra cluster consists of 8 nodes (16 cores, 32 GB RAM, 12 GB heap, 1600 MB young generation, Cassandra 1.0.5, JDK 1.7, 128 concurrent writer threads). The replication factor is 2, there are 10 column families, and the workload is write-intensive counter increments (CL=ONE).
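With replication factor 2, every client increment ends up applied on two replicas. The per-node write load is therefore roughly the cluster-wide client rate times the replication factor, divided by the node count. Here is a minimal Python sketch of that arithmetic. It assumes tokens, and therefore writes, are spread evenly across the 8 nodes, and the function name is hypothetical.

```python
def per_node_replica_writes(cluster_qps: float, replication_factor: int, nodes: int) -> float:
    """Replica writes each node applies per second, assuming an even token distribution."""
    # Each client write is eventually applied on every replica; CL=ONE only
    # controls how many replicas must acknowledge before the client is answered.
    return cluster_qps * replication_factor / nodes


# 50,000 client increments/s on 8 nodes with RF=2
print(per_node_replica_writes(50_000, 2, 8))  # 12500.0
```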
I am trying to figure out the bottleneck, and have three questions:

1) Is using JDK 1.7 in any way detrimental to Cassandra?

2) What is the maximum write throughput (qps) that should be expected? Is the Netflix benchmark also applicable to counter-increment workloads? http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

3) At around 50,000 qps for the cluster (~12,500 qps per node), CPU idle time is around 30%. Cassandra is not disk bound (reads are insignificant and CPU iowait is around 0.05%) and is not swapping (around 15 GB of RAM is free or inactive). ParNew GC pauses average 100 ms and occur every second, so Cassandra spends about 10% of its time in stop-the-world collection. The OS load average is around 16-20 and the average write latency is 3 ms. tpstats does not show any significant pending tasks.

At this point, several nodes suddenly start dropping many "Mutation" messages, and tpstats shows lots of pending MutationStage and ReplicateOnWriteStage tasks. The number of threads in the Java process increases to around 25,000 from the usual 300-400, and almost all of the new threads are named "pool-2-thread-*". The OS load jumps to around 30-40, and the write request latency spikes to more than 500 ms (sometimes even several tens of seconds). Even the local write latency increases fourfold, from 50 microseconds to 200 microseconds. This happens across all the nodes within around 2-3 minutes.

My guess is that the 128 writer threads are unable to perform more writes. However, with an average local write latency of 100-150 microseconds, each thread should be able to serve up to 10,000 qps, so 128 writer threads should be able to serve up to 1,280,000 qps per node. Could there be any other reason for this? What else should I monitor, given that system.log does not seem to say anything conclusive before the messages are dropped?

Thanks
Rohit
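The two back-of-envelope figures in the question can be checked with a small Python sketch. The first is the fraction of wall-clock time spent in ParNew stop-the-world pauses. The second is the throughput ceiling that Little's law (throughput ≈ concurrency / latency) implies for the 128 writer threads. Both calculations assume steady average pause and latency values, and the function names are hypothetical.

```python
def gc_pause_fraction(pause_ms: float, interval_ms: float) -> float:
    """Fraction of wall-clock time spent in stop-the-world GC pauses."""
    return pause_ms / interval_ms


def writer_thread_ceiling(threads: int, local_latency_us: float) -> float:
    """Upper bound on writes/s if every writer thread were busy 100% of the time.

    Little's law: throughput = concurrency / latency.
    """
    per_thread_qps = 1_000_000 / local_latency_us  # writes per second per thread
    return threads * per_thread_qps


print(gc_pause_fraction(100, 1000))     # 0.1, i.e. 10% of time paused
print(writer_thread_ceiling(128, 100))  # 1280000.0
print(writer_thread_ceiling(128, 150))  # lower bound of the 100-150 us range
```

At the ~12,500 writes/s per node reported above, this ceiling is roughly 68 to 100 times higher across the 100-150 µs latency range. That is consistent with the question's own reasoning that raw writer-thread capacity should not be the limit while local latency stays in this range.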