I wrote some scripts to test this: https://github.com/davidtinker/cassandra-perf
3 node cluster, each node: Intel® Xeon® E3-1270 v3 Quadcore Haswell, 32GB RAM, 1 x 2TB commit log disk, 2 x 4TB data disks (RAID0).

Using a batch of prepared statements is about 6% faster than inline parameters:

InsertBatchOfPreparedStatements: Inserted 2551704 rows in 100000 batches using 256 concurrent operations in 15.785 secs, 161653 rows/s, 6335 batches/s
InsertInlineBatch: Inserted 2551704 rows in 100000 batches using 256 concurrent operations in 16.712 secs, 152686 rows/s, 5983 batches/s

On Wed, Dec 11, 2013 at 2:40 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:
> Then I suspect that this is an artifact of your test methodology. Prepared
> statements *are* faster than non-prepared ones in general. They save some
> parsing and some bytes on the wire. The savings will tend to be bigger for
> bigger queries, and it's possible that for very small queries (like the one
> you are testing) the performance difference is somewhat negligible, but
> seeing non-prepared statements being significantly faster than prepared ones
> almost surely means you're doing something wrong (of course, a bug in either
> the driver or C* is always possible, and always make sure to test recent
> versions, but I'm not aware of any such bug).
>
> Are you sure you are warming up the JVMs (client and drivers) properly, for
> instance? 1000 iterations is *really small*; if you're not warming things
> up properly, you're not measuring anything relevant. Also, are you including
> the preparation of the query itself in the timing? Preparing a query is not
> particularly fast, but it's meant to be done just once at the beginning of
> the application lifetime. But with only 1000 iterations, if you include the
> preparation in the timing, it's entirely possible it's eating a good chunk
> of the whole time.
>
> But other than prepared versus non-prepared, you won't get proper
> performance unless you parallelize your inserts.
> Unlogged batches are one way to do it (parallelizing is really all
> Cassandra does with an unlogged batch). But as John Sanda mentioned, another
> option is to do the parallelization client side, with executeAsync.
>
> --
> Sylvain
>
>
> On Wed, Dec 11, 2013 at 11:37 AM, David Tinker <david.tin...@gmail.com> wrote:
>>
>> Yes, that's what I found.
>>
>> This is faster:
>>
>> for (int i = 0; i < 1000; i++)
>>     session.execute("INSERT INTO test.wibble (id, info) VALUES ('${"" + i}', '${"aa" + i}')")
>>
>> Than this:
>>
>> def ps = session.prepare("INSERT INTO test.wibble (id, info) VALUES (?, ?)")
>> for (int i = 0; i < 1000; i++) session.execute(ps.bind(["" + i, "aa" + i] as Object[]))
>>
>> This is the fastest option of all (hand-rolled batch):
>>
>> StringBuilder b = new StringBuilder()
>> b.append("BEGIN UNLOGGED BATCH\n")
>> for (int i = 0; i < 1000; i++) {
>>     b.append("INSERT INTO ").append(ks).append(".wibble (id, info) VALUES ('")
>>         .append(i).append("','").append("aa").append(i).append("')\n")
>> }
>> b.append("APPLY BATCH\n")
>> session.execute(b.toString())
>>
>>
>> On Wed, Dec 11, 2013 at 10:56 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>> >
>> >> This loop takes 2500ms or so on my test cluster:
>> >>
>> >> PreparedStatement ps = session.prepare("INSERT INTO perf_test.wibble (id, info) VALUES (?, ?)")
>> >> for (int i = 0; i < 1000; i++) session.execute(ps.bind("" + i, "aa" + i));
>> >>
>> >> The same loop with the parameters inline is about 1300ms. It gets
>> >> worse if there are many parameters.
>> >
>> >
>> > Do you mean that:
>> > for (int i = 0; i < 1000; i++)
>> >     session.execute("INSERT INTO perf_test.wibble (id, info) VALUES ('" + i + "', 'aa" + i + "')");
>> > is twice as fast as using a prepared statement? And that the difference
>> > is even greater if you add more columns than "id" and "info"?
>> >
>> > That would certainly be unexpected. Are you sure you're not re-preparing
>> > the statement every time in the loop?
>> >
>> > --
>> > Sylvain
>> >
>> >> I know I can use batching to insert all the rows at once, but that's not
>> >> the purpose of this test. I also tried using session.execute(cql, params),
>> >> and it is faster but still doesn't match inline values.
>> >>
>> >> Composing CQL strings is certainly convenient and simple, but is there
>> >> a much faster way?
>> >>
>> >> Thanks
>> >> David
>> >>
>> >> I have also posted this on Stack Overflow if anyone wants the points:
>> >>
>> >> http://stackoverflow.com/questions/20491090/what-is-the-fastest-way-to-get-data-into-cassandra-2-from-a-java-application
>>
>> --
>> http://qdb.io/ Persistent Message Queues With Replay and #RabbitMQ Integration
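Sylvain's methodology points (prepare once, outside the timed region, and warm the JVM up before measuring) can be captured in a small harness. This is a minimal sketch, not the code from the cassandra-perf repository: InsertBenchmark and opsPerSecond are hypothetical names, and the prepare/op callbacks stand in for something like session.prepare(...) and session.execute(ps.bind(...)), so the harness itself runs without a cluster.

```java
import java.util.function.ObjIntConsumer;
import java.util.function.Supplier;

// Hypothetical harness: prepares once (untimed), runs untimed warm-up
// iterations to give the JIT a chance to compile the hot path, then times
// only the measured iterations.
final class InsertBenchmark {
    private InsertBenchmark() {}

    /** Returns operations per second over the measured iterations only. */
    static <P> double opsPerSecond(Supplier<P> prepare, ObjIntConsumer<P> op,
                                   int warmupIterations, int measuredIterations) {
        P prepared = prepare.get(); // stand-in for session.prepare(...), done once, not timed

        // Untimed warm-up pass.
        for (int i = 0; i < warmupIterations; i++) {
            op.accept(prepared, i);
        }

        // Timed pass; keys continue after the warm-up range so rows differ.
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            op.accept(prepared, warmupIterations + i);
        }
        long elapsedNanos = Math.max(System.nanoTime() - start, 1L); // avoid divide-by-zero
        return measuredIterations / (elapsedNanos / 1e9);
    }
}
```

With a real session, prepare would return the PreparedStatement and op would bind and execute it; the warm-up count should be far larger than the 1000 iterations Sylvain calls "really small".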
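The hand-rolled batch in the thread concatenates values straight into the CQL text, which breaks as soon as a value contains a single quote. A minimal sketch of the same UNLOGGED BATCH string, but with values escaped the way CQL string literals require (a single quote is doubled). InlineBatchBuilder, cqlLiteral and unloggedBatch are hypothetical names; keyspace and table names are assumed to be trusted identifiers, and only the (id, info) values are escaped. Binding values through a prepared statement avoids the escaping problem entirely.

```java
import java.util.List;

// Hypothetical builder for a single inline UNLOGGED BATCH string, mirroring
// the hand-rolled batch in the thread, with escaped text values.
final class InlineBatchBuilder {
    private InlineBatchBuilder() {}

    /** Quotes a value as a CQL string literal; an embedded ' becomes ''. */
    static String cqlLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds one UNLOGGED BATCH inserting each {id, info} pair into keyspace.table. */
    static String unloggedBatch(String keyspace, String table, List<String[]> rows) {
        StringBuilder b = new StringBuilder("BEGIN UNLOGGED BATCH\n");
        for (String[] row : rows) {
            b.append("INSERT INTO ").append(keyspace).append('.').append(table)
             .append(" (id, info) VALUES (")
             .append(cqlLiteral(row[0])).append(", ")
             .append(cqlLiteral(row[1])).append(")\n");
        }
        return b.append("APPLY BATCH\n").toString();
    }
}
```

The resulting string would be passed to session.execute(...) as in the original snippet; the builder only changes how each value is rendered.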
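Sylvain and John Sanda suggest parallelizing client side with executeAsync, and the benchmark figures at the top use 256 concurrent operations. One common way to do that is to cap the number of in-flight requests with a Semaphore. This is a minimal sketch under assumptions: BoundedAsyncRunner, runAll and asyncInsert are hypothetical, and asyncInsert stands in for an executeAsync call. With the 2.0 Java driver, executeAsync returns a Guava-based ResultSetFuture rather than a CompletionStage, so a small adapter would be needed.

```java
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntFunction;

// Hypothetical runner: issues `count` async inserts while keeping at most
// `maxInFlight` of them outstanding at any moment.
final class BoundedAsyncRunner {
    private BoundedAsyncRunner() {}

    static void runAll(int count, int maxInFlight, IntFunction<CompletionStage<?>> asyncInsert)
            throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicReference<Throwable> firstError = new AtomicReference<>();

        // Stop issuing new inserts once any insert has failed.
        for (int i = 0; i < count && firstError.get() == null; i++) {
            permits.acquire(); // blocks while maxInFlight inserts are pending
            try {
                asyncInsert.apply(i).whenComplete((result, err) -> {
                    if (err != null) firstError.compareAndSet(null, err);
                    permits.release(); // a slot frees up when the insert completes
                });
            } catch (RuntimeException e) {
                permits.release(); // the insert was never started
                throw e;
            }
        }

        // Holding every permit means every started insert has completed.
        permits.acquire(maxInFlight);
        permits.release(maxInFlight);

        Throwable err = firstError.get();
        if (err != null) throw new IllegalStateException("async insert failed", err);
    }
}
```

The cap plays the role of the "256 concurrent operations" setting: too low and the cluster sits idle waiting on round trips, too high and requests queue up or time out.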