Hi,

My system is a 4-node, 64-bit Cassandra cluster with a 6 GB heap per node and the default configuration (which means 1/3 of the heap for memtables), a replication factor of 3, writes at consistency level ALL, and reads at ONE. When I run a stress load test, I get a TimedOutException, some operations fail, and all traffic hangs for a while.
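As a rough sketch of what "1/3 of the heap for memtables" works out to on these nodes, the Java snippet below does the arithmetic using the max heap that the GCInspector lines in the log report (6274678784 bytes). The class and method names are hypothetical, and the 0.75 value for flush_largest_memtables_at is an assumption about the cassandra.yaml default for this version, so check it against your own cassandra.yaml.

```java
// Hypothetical helper: rough heap-budget arithmetic for the cluster described above.
public class MemtableBudget {
    private static final long MB = 1024L * 1024L;

    // Default memtable space is 1/3 of the heap (as stated in the post); integer MB.
    static long memtableBudgetMb(long maxHeapBytes) {
        return maxHeapBytes / MB / 3;
    }

    // flush_largest_memtables_at is a fraction of the heap; 0.75 is ASSUMED to be the default.
    static long emergencyFlushThresholdMb(long maxHeapBytes, double flushLargestMemtablesAt) {
        return (long) (maxHeapBytes / (double) MB * flushLargestMemtablesAt);
    }

    public static void main(String[] args) {
        long maxHeap = 6274678784L; // "max is 6274678784" from the GCInspector log lines
        System.out.println("heap MB: " + maxHeap / MB);
        System.out.println("memtable budget MB: " + memtableBudgetMb(maxHeap));
        System.out.println("emergency flush at MB: " + emergencyFlushThresholdMb(maxHeap, 0.75));
    }
}
```

With these numbers the heap is 5984 MB, so memtables may use about 1994 MB, and the "flush the two largest memtables" behavior would kick in once the heap passes roughly 4488 MB, which is consistent with the WARN line reporting the heap as about 0.80 full.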
When I ran a 1 GB, 32-bit Cassandra in standalone mode, I didn't see such frequent "stop the world" behavior. So I wonder which operations can make Cassandra hang, and how to collect information for tuning. From the system log and the documentation, I guess there are three types of operations:

1) Flushing a memtable when it reaches its maximum size
2) Compacting SSTables (why?)
3) Java GC

system.log:

INFO [main] 2012-05-25 16:12:17,054 ColumnFamilyStore.java (line 688) Enqueuing flush of Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
INFO [FlushWriter:1] 2012-05-25 16:12:17,054 Memtable.java (line 239) Writing Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
INFO [FlushWriter:1] 2012-05-25 16:12:17,166 Memtable.java (line 275) Completed flushing /var/proclog/raw/cassandra/data/system/LocationInfo-hb-2-Data.db (163 bytes)
...
INFO [CompactionExecutor:441] 2012-05-28 08:02:55,345 CompactionTask.java (line 112) Compacting [SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-41-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-32-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-37-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-53-Data.db')]
...
WARN [ScheduledTasks:1] 2012-05-28 08:02:26,619 GCInspector.java (line 146) Heap is 0.7993011015621736 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-05-28 08:02:54,980 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 728 ms for 2 collections, 3594946600 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-05-28 08:41:34,030 GCInspector.java (line 123) GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-05-28 08:41:48,978 GCInspector.java (line 123) GC for ParNew: 1087 ms for 1 collections, 2623067496 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-05-28 08:41:48,987 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280 used; max is 6274678784

TimedOutException:

Caused by: org.apache.cassandra.thrift.TimedOutException: null
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19495) ~[na:na]
    at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035) ~[na:na]
    at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009) ~[na:na]
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95) ~[na:na]
    ... 64 common frames omitted

BRs
//Tang Weiqiang
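One low-tech way to collect tuning data is to total the GCInspector lines in system.log per collector. The sketch below assumes exactly the line format shown in the log excerpt ("GC for <collector>: <n> ms for <k> collections, <used> used; max is <max>"); the class and method names are hypothetical. ParNew (young-generation) collections pause application threads, so those entries are the first place to look for "stop the world" time; the numbers are whatever GCInspector reports, so treat the totals as an indication rather than an exact pause budget.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: sums the GC time reported by GCInspector, per collector.
public class GcInspectorSummary {
    // Assumes the GCInspector INFO line format quoted in the system.log excerpt.
    private static final Pattern GC_LINE = Pattern.compile(
            "GC for (\\w+): (\\d+) ms for (\\d+) collections, (\\d+) used; max is (\\d+)");

    // Returns collector name -> summed milliseconds, in first-seen order.
    static Map<String, Long> totalGcMillis(String[] lines) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = GC_LINE.matcher(line);
            if (m.find()) {
                totals.merge(m.group(1), Long.parseLong(m.group(2)), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // The four GCInspector lines from the log excerpt.
        String[] lines = {
            "GC for ConcurrentMarkSweep: 728 ms for 2 collections, 3594946600 used; max is 6274678784",
            "GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is 6274678784",
            "GC for ParNew: 1087 ms for 1 collections, 2623067496 used; max is 6274678784",
            "GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280 used; max is 6274678784"
        };
        System.out.println(totalGcMillis(lines));
    }
}
```

For the four lines above this prints {ConcurrentMarkSweep=3926, ParNew=2755}. On a real node you would feed it the lines of system.log instead (for example via java.nio.file.Files.readAllLines) and compare the totals against the times when the TimedOutExceptions occur.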