On Mon, Sep 19, 2011 at 02:42:42PM +0100, Jaco Breitenbach wrote:
> Hi Simon,
> 
> Thanks for the reply.
> 
> On 19 September 2011 13:23, Simon Slavin <slav...@bigfraud.org> wrote:
> 
> > > I run the database (3.7.7.1) in WAL mode, with checkpointing performed
> > > at 10 minute intervals.
> >
> > You turned 'PRAGMA wal_autocheckpoint' off, right ?
> >
> Correct.  The frequency of checkpointing seems to have a significant impact
> on overall performance.  I've disabled auto-checkpointing and introduced a
> time-based manual checkpoint frequency similar to that of TimesTen.  In my
> tests I've set the interval to 10 minutes.
> 
> > If you know you are going to do lots of INSERTs in a row you can put
> > BEGIN/END around them.  This will dramatically speed up the operation.  On
> > the other hand, if your benchmark is simulating lots of separate logging
> > entries you will not want to do this.
> >
> That is indeed how my application works.  Records are processed in
> transactions of approximately 20,000 records each.
> 
> > <http://www.sqlite.org/draft/pragma.html#pragma_journal_mode>
> >
> > If, on the other hand you aren't concerned about using too much space, and
> > are simply exploring to find out how things are likely to work in real life,
> > carry on.
> >
> My application is rather simple.  It simply inserts random 22-character
> strings into a single-column, unique-indexed table as fast as possible.  The
> main concern is application performance and I've spent many hours optimising
> as much as possible.  In terms of hardware, the resource I am most concerned
> about is I/O bandwidth and I am trying to estimate the I/O requirement as a
> function of record size and record processing rate.


Is this the actual application, or just some test application that is
representative of the final application?

I ask because I can't see random 22-character string inserts as representative
of anything other than worst-case performance, so the test might not actually
be valid.
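
For concreteness, my reading of the test you describe is roughly the sketch
below (Python, using the standard sqlite3 module; the file name, table name
and column name are my own invention, not your code):

  import random, string, time, sqlite3

  db = sqlite3.connect("bench.db", isolation_level=None)   # explicit BEGIN/COMMIT below
  db.execute("PRAGMA journal_mode=WAL")
  db.execute("PRAGMA wal_autocheckpoint=0")                 # checkpoint manually instead
  db.execute("CREATE TABLE IF NOT EXISTS t (key TEXT UNIQUE)")

  alphabet = string.ascii_letters + string.digits
  last_ckpt = time.time()
  while True:
      db.execute("BEGIN")
      for _ in range(20000):                                # ~20,000 rows per transaction
          key = "".join(random.choices(alphabet, k=22))
          db.execute("INSERT OR IGNORE INTO t VALUES (?)", (key,))
      db.execute("COMMIT")
      if time.time() - last_ckpt >= 600:                    # 10-minute checkpoint interval
          db.execute("PRAGMA wal_checkpoint(PASSIVE)")
          last_ckpt = time.time()

Every random key lands at an effectively random spot in the UNIQUE index's
b-tree, which is exactly the random-IO pattern I mean below.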

You'd be better off testing with real data and real insert patterns, then
evaluating the IO demands of that. Your test above will result in lots of
random IO, which is the worst case for spinning disk media.

For insert-mostly workloads committed in big batches, WAL mode might not be
optimal. The data will be written twice, once to the WAL and once to the
database file, whereas in rollback journal mode most data is written just
once (with smaller amounts of rollback data written to the journal).
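
One way to see the difference on your own hardware is to run the same
batched insert load once per journal mode and compare elapsed time (and the
sizes of the -wal/-journal files). A rough sketch, in the same spirit as the
one above and again with invented names:

  import sqlite3, time

  def batched_inserts(path, mode, rows=200000):
      db = sqlite3.connect(path, isolation_level=None)
      db.execute("PRAGMA journal_mode=%s" % mode)        # 'WAL' or 'DELETE' (rollback journal)
      db.execute("CREATE TABLE t (key TEXT UNIQUE)")
      start = time.time()
      db.execute("BEGIN")
      db.executemany("INSERT INTO t VALUES (?)",
                     (("key%08d" % i,) for i in range(rows)))
      db.execute("COMMIT")
      db.close()
      return time.time() - start

  for mode in ("DELETE", "WAL"):
      print(mode, "%.2fs" % batched_inserts("bench_%s.db" % mode, mode))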

Once you introduce updates and deletes, the performance pattern can change
as more writes become in-place updates, which might favour WAL mode more.

Either way, the IO requirements are not likely to reflect record size and
processing rate per se. You're more likely to be limited by transaction
rate, which in turn is limited by the cache flush rate of your IO stack
and, in the spinning-platter case, bound by the write latency of the disk.
Such limits are on the order of tens of transactions per second at most
for spinning disks without an NVRAM cache.
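
If you want to see that ceiling directly, commit one row per transaction
with durable syncs and count commits per second; on an uncached spinning
disk the figure typically comes out in the tens. Again only a sketch, with
invented names:

  import sqlite3, time

  db = sqlite3.connect("txrate.db", isolation_level=None)   # autocommit: one transaction per INSERT
  db.execute("PRAGMA synchronous=FULL")                      # make each commit a durable flush
  db.execute("CREATE TABLE IF NOT EXISTS t (key TEXT)")

  count, start = 0, time.time()
  while time.time() - start < 10:                            # measure for ten seconds
      db.execute("INSERT INTO t VALUES (?)", ("x%d" % count,))
      count += 1
  print("%.1f transactions/sec" % (count / (time.time() - start)))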

Of course, the first rule of programming is "make it work", followed by
"make it work fast". And a well-designed schema and indexes are likely to
have more impact on performance (potentially orders of magnitude) than the
micro-benchmark-based optimizations you're currently looking at.

Hope that helps,
Christian
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
