How are you doing your inserts?

I draw a clear line between 1) bootstrapping a cluster with data and 2) 
simulating expected/projected read/write behavior.

If you are bootstrapping, I would look into the batch_mutate API. Batching many 
rows and columns into a single call improves your write throughput dramatically.
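Roughly, against the 0.6 Thrift interface, a bulk write looks like the sketch 
below. The keyspace/column family names ("Keyspace1"/"Standard1") and the row 
key are just placeholders from the default config, and the exact class names 
and signatures depend on your Cassandra/Thrift version, so treat it as a sketch 
rather than copy/paste code:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.Mutation;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class BulkLoadSketch {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            long ts = System.currentTimeMillis();

            // One Mutation per column to write.
            Column col = new Column("name".getBytes("UTF-8"), "value".getBytes("UTF-8"), ts);
            ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
            cosc.setColumn(col);
            Mutation m = new Mutation();
            m.setColumn_or_supercolumn(cosc);

            // row key -> column family -> mutations for that row/CF
            Map<String, List<Mutation>> byColumnFamily = new HashMap<String, List<Mutation>>();
            byColumnFamily.put("Standard1", Collections.singletonList(m));

            Map<String, Map<String, List<Mutation>>> mutationMap =
                    new HashMap<String, Map<String, List<Mutation>>>();
            mutationMap.put("row-key-1", byColumnFamily);

            // Many rows and columns go to the cluster in a single round trip.
            client.batch_mutate("Keyspace1", mutationMap, ConsistencyLevel.QUORUM);

            transport.close();
        }
    }

The win is in amortizing the per-request overhead: one RPC can carry mutations 
for many rows and column families instead of one column per call.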

If you are read/write testing on a populated cluster, insert and batch_insert 
(for super columns) are the way to go.
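A plain insert is one column, one row, one round trip; something like the 
following, continuing in the same main() as the sketch above (same placeholder 
names, and ColumnPath also needs to be imported from org.apache.cassandra.thrift):

    long ts2 = System.currentTimeMillis();

    // ColumnPath names the column family and the column to write.
    ColumnPath path = new ColumnPath("Standard1");
    path.setColumn("name".getBytes("UTF-8"));

    // One column, one row, one round trip.
    client.insert("Keyspace1", "row-key-1", path,
                  "value".getBytes("UTF-8"), ts2, ConsistencyLevel.QUORUM);

    // For super column families, batch_insert takes the row key plus a map of
    // column family -> List<ColumnOrSuperColumn> and writes them in one call.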

As Ben has pointed out to me in numerous threads ... think carefully about your 
replication factor. Do you want the data on all nodes? Or sufficiently 
replicated so that you can recover? Do you want consistency at the time of 
write? Or eventually?
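The replication factor itself is fixed per keyspace in the config (in 
storage-conf.xml, if I remember the 0.6 layout correctly), but the consistency 
level is passed on every read and write, so you can trade latency for safety 
call by call. A rough illustration, reusing path/ts2/client from the sketches 
above:

    byte[] value = "value".getBytes("UTF-8");

    client.insert("Keyspace1", "row-key-1", path, value, ts2, ConsistencyLevel.ONE);    // ack from one replica
    client.insert("Keyspace1", "row-key-1", path, value, ts2, ConsistencyLevel.QUORUM); // ack from a majority
    client.insert("Keyspace1", "row-key-1", path, value, ts2, ConsistencyLevel.ALL);    // ack from every replica

    // ConsistencyLevel.ZERO returns before any replica has acknowledged the
    // write, so a client that is faster than the cluster just piles work up
    // in memory on the server side.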

Cassandra has a bunch of knobs that you can turn ... but that flexibility 
requires that you think about your expected usage patterns and operational 
policies.

-phil

On Jun 15, 2010, at 4:40 PM, Julie wrote:

> Benjamin Black <b <at> b3k.us> writes:
> 
>> 
>> You are likely exhausting your heap space (probably still at the very
>> small 1G default?), and maximizing the amount of resource consumption
>> by using CL.ALL.  Why are you using ALL?
>> 
>> On Tue, Jun 15, 2010 at 11:58 AM, Julie <julie.sugar <at> nextcentury.com> wrote:
> ...
>>> Coinciding with my write timeouts, all 10 of my cassandra servers are getting
>>> the following exception written to system.log:
>>> 
>>> 
>>>  INFO [FLUSH-WRITER-POOL:1] 2010-06-15 13:13:54,411 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/Keyspace1/Standard1-359-Data.db
>>> ERROR [MESSAGE-STREAMING-POOL:1] 2010-06-15 13:13:59,145 DebuggableThreadPoolExecutor.java (line 101) Error in ThreadPoolExecutor
>>> java.lang.RuntimeException: java.io.IOException: Value too large for defined data type
>>>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>        at java.lang.Thread.run(Thread.java:619)
>>> Caused by: java.io.IOException: Value too large for defined data type
>>>        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>>>        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
>>>        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
>>>        at org.apache.cassandra.net.FileStreamTask.stream(FileStreamTask.java:95)
>>>        at org.apache.cassandra.net.FileStreamTask.runMayThrow(FileStreamTask.java:63)
>>>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>        ... 3 more
> ...
> 
> 
> Thanks for your reply.  Yes, my heap space is 1G.  My VMs have only 1.7G of 
> memory, so I hesitate to use more.  I am using ALL because I was crashing 
> Cassandra with a heap space error when I used ZERO (see my posting from a few 
> days ago), so it was recommended that I use ALL instead.  I also tried ONE but 
> got even more write timeouts, so I thought it would be safer to just wait for 
> all replicas to be written before trying to write more rows.
> 
> Thank you for your help.
> 
> 
> 
