On Mon, Sep 19, 2011 at 02:42:42PM +0100, Jaco Breitenbach wrote:
> Hi Simon,
>
> Thanks for the reply.
>
> On 19 September 2011 13:23, Simon Slavin <slav...@bigfraud.org> wrote:
>
> > > I run the database (3.7.7.1) in WAL mode, with checkpointing
> > > performed at 10 minute intervals.
> >
> > You turned 'PRAGMA wal_autocheckpoint' off, right ?
>
> Correct. The frequency of checkpointing seems to have a significant
> impact on overall performance. I've disabled auto-checkpointing and
> introduced a time-based manual checkpoint frequency similar to that of
> TimesTen. In my tests I've set the interval to 10 minutes.
>
> > If you know you are going to do lots of INSERTs in a row you can put
> > BEGIN/END around them. This will dramatically speed up the operation.
> > On the other hand, if your benchmark is simulating lots of separate
> > logging entries you will not want to do this.
>
> That is indeed how my application works. Records are processed in
> transactions of approximately 20,000 records each.
>
> > <http://www.sqlite.org/draft/pragma.html#pragma_journal_mode>
> >
> > If, on the other hand, you aren't concerned about using too much
> > space, and are simply exploring to find out how things are likely to
> > work in real life, carry on.
>
> My application is rather simple. It simply inserts random 22-character
> strings into a single-column, unique-indexed table as fast as possible.
> The main concern is application performance and I've spent many hours
> optimising as much as possible. In terms of hardware, the resource I am
> most concerned about is I/O bandwidth, and I am trying to estimate the
> I/O requirement as a function of record size and record processing rate.
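The setup described above amounts to roughly the following. (Table and
column names here are invented for illustration; this is only a sketch of
what I understood of the test, not Jaco's actual code.)

    PRAGMA journal_mode=WAL;          -- write-ahead logging
    PRAGMA wal_autocheckpoint=0;      -- no automatic checkpoints

    -- single column with a unique index
    CREATE TABLE IF NOT EXISTS entries (key TEXT NOT NULL UNIQUE);

    -- one transaction of roughly 20,000 inserts
    BEGIN;
    INSERT INTO entries VALUES ('<random 22-character string>');
    -- ... approximately 20,000 similar INSERTs ...
    COMMIT;

    -- issued by the application on a timer, every 10 minutes in the tests
    PRAGMA wal_checkpoint;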
Is this the actual application, or just a test application that is meant to
be representative of the final application? I ask because I can't see
random 22-character string inserts as representative of anything other than
worst-case performance, so the test might not actually be valid. You'd be
better off testing with real data and real insert patterns, then evaluating
the IO demands of that. Your test above will result in lots of random IO,
which is the worst case for spinning disk media.

For insert-mostly workloads in big batches, WAL mode might not be optimal.
The data will be written twice, once to the WAL and once to the database
file, whereas in rollback-journal mode most data will be written just once
(with smaller amounts of rollback data written to the journal). Once you
introduce updates and deletes, the performance patterns can change as more
updates become in-place updates, which might favour WAL mode more (there is
a rough sketch of the two modes in the P.S. below).

Either way, the IO requirements are not likely to reflect record size and
processing rate per se. You're more likely to be limited by transaction
rate, with the transaction rate itself limited by the cache-flush (sync)
rate of your IO stack and, in the spinning-platter case, bound by the write
latency of the disk. Such limits are on the order of tens of transactions
per second at most for spinning disks without an NVRAM cache.

Of course the first rule of programming is "make it work", followed by
"make it work fast". And well-designed schemas and indexes are likely to
have more impact on performance (orders of magnitude, potentially) than the
micro-benchmark-based optimizations you're currently looking at.

Hope that helps,

Christian
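P.S. If you want to see the journal-mode difference for yourself, something
like the following is one way to compare the two modes and the effect of
the sync rate on transaction throughput. (Again purely illustrative,
reusing the made-up table from the sketch above.)

    -- Rollback-journal (default) mode: changed pages go to the main
    -- database file; only before-images go to the -journal file.
    PRAGMA journal_mode=DELETE;

    -- WAL mode: every change is appended to the -wal file first and is
    -- copied into the main database file at checkpoint time, so the data
    -- is written twice.
    PRAGMA journal_mode=WAL;

    -- Each COMMIT normally forces at least one sync of the IO stack; on a
    -- spinning disk that sync latency is what caps the transaction rate,
    -- not the record size.
    PRAGMA synchronous=FULL;     -- default, most durable
    PRAGMA synchronous=NORMAL;   -- fewer syncs, commonly paired with WAL

    -- Rough comparison: time the same 20,000-row batch under each mode
    -- and look at the sizes of the -wal / -journal files afterwards.
    BEGIN;
    -- ... the same batch of INSERTs as above ...
    COMMIT;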