batch_mutate() and insert() follow a similar execution path to a single insert in the server. It's not like putting multiple statements in a transaction in an RDBMS.
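
For reference, this is roughly what the two client-side call styles look like in pycassa. It's only a sketch; the keyspace, column family and host names are placeholders:

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

# placeholder keyspace / column family / host
pool = ConnectionPool('Keyspace1', ['localhost:9160'])
cf = ColumnFamily(pool, 'Standard1')

# single row, several columns: one row mutation on the server,
# applied with a single commit log write on each replica
cf.insert('row1', {'col1': 'a', 'col2': 'b'})

# batched: the client queues the inserts and ships them in one
# batch_mutate call, but the server still treats each row as its
# own mutation - it is not a transaction
b = cf.batch(queue_size=100)
b.insert('row1', {'col1': 'a'})
b.insert('row2', {'col1': 'a'})
b.send()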
Where they do differ is that you can provide multiple columns for a row in a column family, and these will be applied as one operation including only one write to the commit log. However, each row you send requires its own write to the commit log. What sort of data are you writing? Are there multiple columns per row?

Another consideration is that each row becomes a mutation in the cluster. If a connection sends 1000's of rows at once, all of its mutations *could* momentarily fill all the available mutation workers on a node. This can slow down other clients connected to the cluster if they also need to write to that node. Watch tpstats to see if the mutation pool has spikes in the pending column. You may want to reduce the batch size if clients are seeing high latency (there is a rough sketch below the quoted message).

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15 May 2011, at 10:34, Xiaowei Wang wrote:

> Hi,
>
> We use Cassandra 0.7.4 to do TPC-C data loading on ec2 nodes. The loading
> driver is written in pycassa. We test the loading speed on insert and
> batch_insert, but it seems there is no significant difference. I know Cassandra
> first writes data to memory, but I am still confused why batch_insert is not
> quicker than single row insert. We only batch 2000 or 3000 rows at a time.
>
> Thanks for your help!
>
> Best,
> Xiaowei
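
As a rough illustration of reducing the batch size, something like this splits the load into smaller batch_mutate calls rather than sending 2000-3000 rows in one go. The keyspace, column family and chunk size below are only placeholders to tune for your cluster:

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

# placeholder keyspace / column family / host
pool = ConnectionPool('tpcc', ['localhost:9160'])
cf = ColumnFamily(pool, 'customer')

def load_rows(rows, chunk_size=200):
    # rows is {row_key: {column: value}}; each batch_insert call
    # below becomes one batch_mutate on the wire, so the mutations
    # from one client arrive in smaller bursts and are less likely
    # to fill the mutation pool on a single node
    items = list(rows.items())
    for i in range(0, len(items), chunk_size):
        cf.batch_insert(dict(items[i:i + chunk_size]))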