The cluster is running into GC problems, and that is slowing it down under the 
stress test. When it slows down, one or more of the nodes fails to complete 
the write within rpc_timeout. This causes the coordinator for the write to 
raise the TimedOutException.
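
For reference, that timeout is the rpc_timeout_in_ms setting in cassandra.yaml, 
which defaults to 10000 ms on the 1.x line:

    rpc_timeout_in_ms: 10000

Raising it only gives a paused node more time to respond; it does not fix the 
underlying GC pauses.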

Your options are:

* allocate more memory.
* ease back on the stress test.
* write at CL QUORUM so that one node failing does not result in the error 
(a minimal Hector sketch follows the list).
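
Since your stack trace shows Hector, here is a minimal sketch of the QUORUM 
option. The cluster name, host and keyspace below are placeholders, not taken 
from your setup:

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class QuorumWriteExample {
        public static void main(String[] args) {
            // With RF=3, QUORUM needs 2 of 3 replicas, so one node pausing
            // for GC no longer fails the whole write (CL ALL needs all 3).
            ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
            ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
            ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);

            // "TestCluster" and "localhost:9160" are placeholders; "myks"
            // stands in for your keyspace.
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster",
                    new CassandraHostConfigurator("localhost:9160"));
            Keyspace keyspace = HFactory.createKeyspace("myks", cluster, ccl);
            // batch_mutate calls through this Keyspace now write at QUORUM.
        }
    }

One thing to keep in mind: ALL writes with ONE reads gave you strongly 
consistent reads; if you move to QUORUM writes you need QUORUM reads to keep 
that guarantee (W + R > N).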

See also http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts

Cheers
 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/05/2012, at 12:59 PM, Jason Tang wrote:

> Hi
> 
> My system is a 4-node, 64-bit Cassandra cluster, 6 GB per node, with the 
> default configuration (which means 1/3 of the heap for memtables), 
> replication factor 3, write ALL, read ONE.
> When I run the stress load test, I get this TimedOutException, some 
> operations fail, and all traffic hangs for a while.
> 
> And when I ran a 1 GB, 32-bit Cassandra standalone, I didn't see such 
> frequent "stop the world" behavior.
> 
> So I wonder what kind of operations will hang the Cassandra system.
> 
> How can I collect information for tuning?
> 
> From the system log and the documentation, I guess there are three types of 
> operations:
> 1) Flushing a memtable when it reaches its max size
> 2) Compacting SSTables (why?)
> 3) Java GC
> 
> system.log:
>  INFO [main] 2012-05-25 16:12:17,054 ColumnFamilyStore.java (line 688) 
> Enqueuing flush of Memtable-LocationInfo@1229893321(53/66 serialized/live 
> bytes, 2 ops)
>  INFO [FlushWriter:1] 2012-05-25 16:12:17,054 Memtable.java (line 239) 
> Writing Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
>  INFO [FlushWriter:1] 2012-05-25 16:12:17,166 Memtable.java (line 275) 
> Completed flushing 
> /var/proclog/raw/cassandra/data/system/LocationInfo-hb-2-Data.db (163 bytes)
> ...
> 
>  INFO [CompactionExecutor:441] 2012-05-28 08:02:55,345 CompactionTask.java 
> (line 112) Compacting 
> [SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-41-Data.db'),
>  SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-32-Data.db'),
>  SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-37-Data.db'),
>  SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-53-Data.db')]
> ...
> 
>  WARN [ScheduledTasks:1] 2012-05-28 08:02:26,619 GCInspector.java (line 146) 
> Heap is 0.7993011015621736 full.  You may need to reduce memtable and/or 
> cache sizes.  Cassandra will now flush up to the two largest memtables to 
> free up memory.  Adjust flush_largest_memtables_at threshold in 
> cassandra.yaml if you don't want Cassandra to do this automatically
>  INFO [ScheduledTasks:1] 2012-05-28 08:02:54,980 GCInspector.java (line 123) 
> GC for ConcurrentMarkSweep: 728 ms for 2 collections, 3594946600 used; max is 
> 6274678784
>  INFO [ScheduledTasks:1] 2012-05-28 08:41:34,030 GCInspector.java (line 123) 
> GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is 6274678784
>  INFO [ScheduledTasks:1] 2012-05-28 08:41:48,978 GCInspector.java (line 123) 
> GC for ParNew: 1087 ms for 1 collections, 2623067496 used; max is 6274678784
>  INFO [ScheduledTasks:1] 2012-05-28 08:41:48,987 GCInspector.java (line 123) 
> GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280 used; max 
> is 6274678784
> 
> 
> Timeout Exception:
> Caused by: org.apache.cassandra.thrift.TimedOutException: null
>         at 
> org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19495)
>  ~[na:na]
>         at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
>  ~[na:na]
>         at 
> org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
>  ~[na:na]
>         at 
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
>  ~[na:na]
>         ... 64 common frames omitted
> 
> BRs
> //Tang Weiqiang
> 
> 
