Can you turn the logging up to DEBUG level and look for a message from CassandraServer that says "... timed out"?
Also check the thread pool stats with "nodetool tpstats" to see if the node is keeping up.

Aaron

On 7 Apr 2011, at 13:43, Sheng Chen wrote:

> Thank you Aaron.
>
> It does not seem to be an overload problem.
>
> I have 16 cores and 48G of RAM on the single node, and I reduced the
> concurrent threads to 1.
> Still, it just suddenly dies of a timeout, while the CPU, RAM, and disk
> load are below 10% and write latency has been about 0.5ms for the past
> 10 minutes, which is really fast.
>
> No logs of dropped messages are found.
>
>
> 2011/4/7 aaron morton <aa...@thelastpickle.com>
> TimedOutException means that fewer than CL nodes responded to the
> coordinator before the rpc_timeout.
>
> So it was overloaded. Which makes sense when you say it only happens with
> secondary indexes. Consider things like
> - reducing the throughput
> - reducing the number of clients
> - ensuring the clients are connecting to all nodes in the cluster.
>
> You will probably find some logs about dropped messages on some nodes.
> Aaron
>
> On 6 Apr 2011, at 20:39, Sheng Chen wrote:
>
> > I used the py_stress module to insert 10m rows of test data with a
> > secondary index.
> > I got the following exceptions.
> >
> > # python stress.py -d xxx -o insert -n 10000000 -c 5 -s 34 -C 5 -x keys
> > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
> > 265322,26532,26541,0.00186140829433,10
> > 630300,36497,36502,0.00129331431204,20
> > 986781,35648,35640,0.0013310986218,30
> > 1332190,34540,34534,0.00135942295893,40
> > 1473578,14138,14138,0.00142941070007,50
> > Process Inserter-38:
> > Traceback (most recent call last):
> >   File "/usr/lib64/python2.4/site-packages/multiprocessing/process.py", line 237, in _bootstrap
> >     self.run()
> >   File "stress.py", line 242, in run
> >     self.cclient.batch_mutate(cfmap, consistency)
> >   File "/root/apache-cassandra-0.7.4-src/interface/thrift/gen-py/cassandra/Cassandra.py", line 784, in batch_mutate
> > TimedOutException: TimedOutException(args=())
> >     self.run()
> >   File "stress.py", line 242, in run
> >     self.recv_batch_mutate()
> >   File "/root/apache-cassandra-0.7.4-src/interface/thrift/gen-py/cassandra/Cassandra.py", line 810, in recv_batch_mutate
> >     raise result.te
> >
> >
> > Tests without a secondary index are OK at about 40k ops/sec.
> >
> > There is a `GC for ParNew` of about 200ms taking place every second. Does
> > it matter?
> > The same GC, for about 400ms, happens every 2 seconds in the runs without
> > a secondary index, and it does not hurt those inserts.
> >
> > Thanks in advance for any advice.
> >
> > Sheng
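
To pin down when throughput fell off in a run like the one quoted above, the interval lines that stress.py prints can be checked mechanically. What follows is only a minimal sketch, not part of py_stress: the function name `find_rate_drops` and the 50% threshold are assumptions made for illustration, and it is written for Python 3 rather than the Python 2.4 used in the original run.

```python
import csv
import io

# Interval rows copied from the stress.py output quoted in this thread.
STRESS_OUTPUT = """\
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
265322,26532,26541,0.00186140829433,10
630300,36497,36502,0.00129331431204,20
986781,35648,35640,0.0013310986218,30
1332190,34540,34534,0.00135942295893,40
1473578,14138,14138,0.00142941070007,50
"""


def find_rate_drops(text, threshold=0.5):
    """Return (elapsed_time, op_rate, previous_op_rate) for every interval
    whose op rate fell below `threshold` times the previous interval's rate.

    Hypothetical helper; the 0.5 default is an arbitrary choice.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    drops = []
    # Compare each interval with the one immediately before it.
    for prev, cur in zip(rows, rows[1:]):
        prev_rate = int(prev["interval_op_rate"])
        cur_rate = int(cur["interval_op_rate"])
        if cur_rate < threshold * prev_rate:
            drops.append((int(cur["elapsed_time"]), cur_rate, prev_rate))
    return drops


drops = find_rate_drops(STRESS_OUTPUT)
for elapsed, rate, prev_rate in drops:
    print(f"{elapsed}s: {rate} ops/s, down from {prev_rate} ops/s")
```

On the quoted output this flags only the 50-second interval. The average latency in that row is about the same as in the earlier ones, so the lower rate may simply reflect inserter processes exiting after the exception rather than the node slowing down; the server-side logs and tpstats are still needed to tell the two apart.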