batch_mutate() and insert() follow a similar execution path to a single 
insert in the server. It's not like putting multiple statements in a 
transaction in an RDBMS. 

Where they do differ is that you can provide multiple columns for a row in a 
column family, and these will be applied as one operation with only one 
write to the commit log. However, each row you send requires its own write to 
the commit log.
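
Roughly, something like this in pycassa (keyspace, CF, and column names are 
made up for illustration) sends several columns per row in one batch_insert() 
call, so each row is one commit log write regardless of how many columns it 
has:

import pycassa

# connection details below are just placeholders
pool = pycassa.ConnectionPool('Keyspace1', ['127.0.0.1:9160'])
cf = pycassa.ColumnFamily(pool, 'Standard1')

# all columns for one row key are applied as a single operation
rows = {
    'row-1': {'col_a': 'value1', 'col_b': 'value2', 'col_c': 'value3'},
    'row-2': {'col_a': 'value4', 'col_b': 'value5'},
}
cf.batch_insert(rows)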

What sort of data are you writing? Are there multiple columns per row? 

Another consideration is that each row becomes a mutation in the cluster. If a 
connection sends thousands of rows at once, all of its mutations *could* 
momentarily fill all the available mutation workers on a node. This can slow 
down other clients connected to the cluster if they also need to write to that 
node. Watch nodetool tpstats to see if the MutationStage pool has spikes in 
its pending count. You may want to reduce the batch size if clients are seeing 
high latency, e.g. by chunking the rows on the client side as in the sketch 
below. 
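
A rough sketch of client-side chunking (the chunk size is just a guess, tune 
it against what tpstats shows):

def insert_in_chunks(cf, rows, chunk_size=100):
    # rows is a dict of {row_key: {column_name: value}}
    # send smaller batches so one client does not flood the MutationStage
    # pool on a single node
    keys = list(rows.keys())
    for i in range(0, len(keys), chunk_size):
        chunk = dict((k, rows[k]) for k in keys[i:i + chunk_size])
        cf.batch_insert(chunk)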

Hope that helps.
 
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15 May 2011, at 10:34, Xiaowei Wang wrote:

> Hi,
> 
> We use Cassandra 0.7.4 to do TPC-C data loading on EC2 nodes. The loading 
> driver is written in pycassa. We tested the loading speed of insert and 
> batch_insert, but there seems to be no significant difference. I know 
> Cassandra first writes data to memory, but I am still confused why 
> batch_insert is not quicker than single row insert. We only batch 2000 or 
> 3000 rows at a time.
> 
> Thanks for your help!
> 
> Best,
> Xiaowei
