On Fri, Jul 6, 2012 at 9:44 AM, rohit bhatia <rohit2...@gmail.com> wrote:
> On Fri, Jul 6, 2012 at 4:47 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> 12G Heap,
>> 1600Mb Young gen,
>>
>> Is a bit higher than the normal recommendation. A 1600MB young gen can
>> cause some extra ParNew pauses.
> Thanks for the heads up, I'll try tinkering with this.
>
>> 128 Concurrent writer threads
>>
>> Unless you are on SSD this is too many.
>>
> I mean
> http://www.datastax.com/docs/0.8/configuration/node_configuration#concurrent-writes
> , this is not the memtable flush queue writers setting.
> The suggested value is 8 * number of cores (16) = 128 itself.
>>
>> 1) Is using JDK 1.7 in any way detrimental to Cassandra?
>>
>> As far as I know it's not fully certified; thanks for trying it :)
>>
>> 2) What is the max write operation qps that should be expected? Is the
>> Netflix benchmark also applicable for counter-incrementing tasks?
>>
>> Counters use a different write path than normal writes and are a bit slower.
>>
>> To benchmark, get a single node and work out the max throughput. Then
>> multiply by the number of nodes and divide by the RF to get a rough idea.
>>
>> the cpu idle time is around 30%, cassandra is not disk bound (insignificant
>> read operations and cpu's iowait is around 0.05%)
>>
>> Wait until compaction kicks in and handles all your inserts.
>>
>> The os load is around 16-20 and the average write latency is 3ms.
>> tpstats do not show any significant pending tasks.
>>
>> The node is overloaded. What is the write latency for a single thread doing
>> a single increment against a node that has no other traffic? The latency
>> for a request is the time spent working plus the time spent waiting; once
>> you reach the max throughput, the time spent waiting increases. The SEDA
>> architecture is designed to limit the time spent working.
The write latency I reported is the one reported by DataStax OpsCenter for the total latency of a client's request. It is at minimum 0.5ms.
In contrast, the "local write request latency" reported by cfstats is around 50 microseconds, but jumps to 150 microseconds during the crash.
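Aaron's sizing rule of thumb above (measure a single node's max throughput, multiply by the node count, divide by the RF) can be sketched as a quick estimate. The 20,000 qps single-node figure below is a made-up placeholder, not a measurement:

```python
def estimate_cluster_qps(single_node_qps: float, nodes: int, rf: int) -> float:
    """Rough cluster write throughput: each write is replicated rf
    times, so replicas consume a share of every node's capacity."""
    return single_node_qps * nodes / rf

# Topology from this thread (8 nodes, RF=2); the per-node qps here is
# hypothetical -- benchmark a single node first to get the real number.
print(estimate_cluster_qps(20_000, nodes=8, rf=2))  # 80000.0
```

This is only an upper-bound estimate; since counters take a different, slower write path, the measured single-node number for counter increments will be lower than for plain writes.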
>>
>> At this point suddenly, several nodes start dropping several
>> "Mutation" messages. There are also lots of pending
>>
>> The cluster is overwhelmed.
>>
>> Almost all the new threads seem to be named "pool-2-thread-*".
>>
>> These are client connection threads.
>>
>> My guess is that this might be due to the 128 writer threads not being
>> able to perform more writes.
>>
>> Yes.
>> https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L214
>>
>> Work out the latency for a single client and a single node, then start
>> adding replication, nodes and load. When the latency increases, you are
>> getting to the max throughput for that config.
>
> Also, as mentioned in my second mail, I am seeing messages like "Total
> time for which application threads were stopped: 16.7663710 seconds";
> if something pauses for this long, it might be overwhelmed by the
> hints stored at other nodes. This can further cause the node to wait
> on/drop a lot of client connection threads. I'll look into what is
> causing these non-GC pauses. Thanks for the help.
>
>> Hope that helps
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 5/07/2012, at 6:49 PM, rohit bhatia wrote:
>>
>> Our Cassandra cluster consists of 8 nodes (16 core, 32G RAM, 12G heap,
>> 1600MB young gen, Cassandra 1.0.5, JDK 1.7, 128 concurrent writer
>> threads). The replication factor is 2 with 10 column families, and we
>> service counter-incrementing, write-intensive tasks (CL=ONE).
>>
>> I am trying to figure out the bottleneck.
>>
>> 1) Is using JDK 1.7 in any way detrimental to Cassandra?
>>
>> 2) What is the max write operation qps that should be expected? Is the
>> Netflix benchmark also applicable for counter-incrementing tasks?
>>
>> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
>>
>> 3) At around 50,000 qps for the cluster (~12,500 qps per node), the cpu
>> idle time is around 30%, Cassandra is not disk bound (insignificant
>> read operations, and cpu iowait is around 0.05%) and is not swapping
>> (around 15GB of RAM is free or inactive). The average GC pause time
>> for ParNew is 100ms, occurring every second, so Cassandra spends
>> 10% of its time stuck in the "stop the world" collector.
>> The OS load is around 16-20 and the average write latency is 3ms.
>> tpstats do not show any significant pending tasks.
>>
>> At this point suddenly, several nodes start dropping several
>> "Mutation" messages. There are also lots of pending
>> MutationStage and ReplicateOnWriteStage tasks in tpstats.
>> The number of threads in the java process increases to around 25,000
>> from the usual 300-400. Almost all the new threads seem to be named
>> "pool-2-thread-*".
>> The OS load jumps to around 30-40, and the "write request latency" starts
>> spiking to more than 500ms (even to several tens of seconds sometimes).
>> Even the "local write latency" increases fourfold, to 200 microseconds
>> from 50 microseconds. This happens across all the nodes within around
>> 2-3 minutes.
>> My guess is that this might be due to the 128 writer threads not being
>> able to perform more writes (though with an average local write latency
>> of 100-150 microseconds, each thread should be able to serve 10,000
>> qps, and with 128 writer threads we should be able to serve 1,280,000 qps
>> per node).
>> Could there be any other reason for this? What else should I monitor,
>> since system.log does not seem to say anything conclusive before
>> dropping messages?
>>
>>
>> Thanks
>> Rohit
>>
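The back-of-envelope figures quoted in the thread (10% stop-the-world time, and a theoretical per-node ceiling of 1,280,000 qps for 128 writer threads) can be reproduced with a short check; all inputs are taken from the emails above, nothing here is measured against a cluster:

```python
# ParNew pauses of ~100 ms occurring roughly every second.
pause_ms, interval_ms = 100, 1000
stop_the_world_fraction = pause_ms / interval_ms
print(f"stop-the-world time: {stop_the_world_fraction:.0%}")  # 10%

# 128 writer threads at ~100 us per local write (midpoint of the
# 50-150 us range reported by cfstats).
writer_threads = 128
local_write_latency_us = 100
qps_per_thread = 1_000_000 / local_write_latency_us
max_node_qps = writer_threads * qps_per_thread
print(f"theoretical per-node ceiling: {max_node_qps:,.0f} qps")  # 1,280,000 qps
```

The large gap between this theoretical ceiling and the observed ~12,500 qps per node supports the point made in the thread that the writer-thread count alone is unlikely to be the bottleneck.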