Hi Christian,

Thanks for your response.

On 19 September 2011 16:45, Christian Smith <csm...@thewrongchristian.org.uk
> wrote:

> On Mon, Sep 19, 2011 at 02:42:42PM +0100, Jaco Breitenbach wrote:
> > Hi Simon,
> >
> > Thanks for the reply.
> >
> > On 19 September 2011 13:23, Simon Slavin <slav...@bigfraud.org> wrote:
> >
> > > > I run the database (3.7.7.1) in WAL mode, with checkpointing
> performed at
> > > 10
> > > > minute intervals.
> > >
> > > You turned 'PRAGMA wal_autocheckpoint' off, right ?
> > >
> > Correct.  The frequency of checkpointing seems to have a significant
> impact
> > on overall performance.  I've disabled auto-checkpointing and introduced
> a
> > time-based manual checkpoint frequency similar to that of TimesTen.  In
> my
> > tests I've set the interval to 10 minutes.
> >
> > If you know you are going to do lots of INSERTs in a row you can put
> > > BEGIN/END around them.  This will dramatically speed up the operation.
>  On
> > > the other hand, if your benchmark is simulating lots of separate
> logging
> > > entries you will not want to do this.
> > >
> > That is indeed how my application works.  Records are processed in
> > transactions of approximately 20,000 records each.
> >
> > > <http://www.sqlite.org/draft/pragma.html#pragma_journal_mode>
> > >
> > > If, on the other hand you aren't concerned about using too much space,
> and
> > > are simply exploring to find out how things are likely to work in real
> life,
> > > carry on.
> > >
> > My application is rather simple.  It simply inserts random 22-character
> > strings into a single-column, unique-indexed table as fast as possible.
>  The
> > main concern is application performance and I've spent many hours
> optimising
> > as much as possible.  In terms of hardware, the resource I am most
> concerned
> > about is I/O bandwidth and I am trying to estimate the I/O requirement as
> a
> > function of record size and record processing rate.
>
>
> Is this the actual application, or just some test application that is
> representative of the final application?


> I ask because I can't see random 22-character string inserts as
> representative
> of anything other than worst case performance, so the test might not
> actually
> be valid.
>
The test was performed with the actual application.  The processing rates I
mentioned initially weren't that important at that point.  Actual
performance varies greatly depending on the hardware platform.

The purpose of my application is to check input records for duplicates.
This is done by calculating a 22-character string that uniquely identifies
an input record.  This key is then inserted into the unique-indexed table.
If the insert fails because of a unique constraint violation, the record is
flagged as a duplicate and filtered out; otherwise the insert succeeds and
the input record is processed further.  My statement about inserting random
strings was therefore not entirely correct, but it is a fair approximation
of reality.
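
The duplicate-filter logic described above can be sketched with Python's
built-in sqlite3 module.  This is only an illustration of the technique, not
the actual application; the table and column names are invented for the
example:

```python
import sqlite3

# In-memory database for illustration; the real application uses a
# file-based database in WAL mode.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keys (k TEXT NOT NULL UNIQUE)")

def is_duplicate(conn, key):
    """Insert the 22-character key; a unique-constraint
    violation marks the record as a duplicate."""
    try:
        conn.execute("INSERT INTO keys (k) VALUES (?)", (key,))
        return False   # insert succeeded: first time this key is seen
    except sqlite3.IntegrityError:
        return True    # UNIQUE constraint failed: duplicate record

print(is_duplicate(conn, "ABCDEFGHIJKLMNOPQRSTUV"))  # False (first occurrence)
print(is_duplicate(conn, "ABCDEFGHIJKLMNOPQRSTUV"))  # True  (duplicate)
```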

> You'd be better off testing with real data with real insert patterns, then
> evaluating the IO demands of that. Your test above will result in lots of
> random IO, which is the worst case for spinning disk media.
>
> For insert mostly updates in big batches, WAL mode might not be optimal.
> The data will be written twice, once to the WAL, once to the database file
> whereas in rollback journal mode, most data will be written just once (with
> smaller amounts of rollback data written to the journal.)
>
> Once you introduce updates and deletes, the performance patterns can change
> as more updates become in place updates, which might favour WAL mode more.
>
There will never be any updates or deletes; only inserts.


> Either way, the IO requirements are not likely to reflect record size and
> processing rate per se. You're more likely to be limited to transaction
> rate, transaction rate being limited by the cache flush rate of your
> IO stack and bound in the spinning platter case to the write latency
> of the disk. Such limits are in the order of 10s of transactions per second
> max for non-NVRAM cached spinning disks.
>
> Of course the first rule of programming is "make it work", followed
> afterwards by "make it work fast". And well designed schemas and indexes
> are likely to have more impact on performance (orders of magnitude
> potentially) than the micro-benchmark based optimizations you're
> looking at at the moment.
>
Some of the things I have experimented with in order to optimise
performance are the following:

   - Relying on the unique constraint violation to detect the presence of a
   key turns out to be faster than manually searching the table and inserting
   the key if it was not found.
   - I've also manually 'partitioned' the table into a configurable number
   of similar smaller tables in order to limit the search space.
   - With manual checkpointing, it appears the fixed time cost of a
   checkpoint is significantly higher than the cost that depends on the
   checkpoint interval, i.e. on the amount of data in the WAL file to be
   consolidated.  In other words, with the processing (insert) rate held
   constant, significantly more total processing time is lost to
   checkpointing if a checkpoint is performed after every transaction
   (20,000 inserts; +/- 2 s per transaction) rather than every 5 to 10
   minutes.
   - The database now operates in WAL mode; and
   - I pre-allocate a contiguous memory buffer at start-up for the database
   cache.  I also constantly monitor the cache usage to make sure the cache
   size is sufficient.
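
The WAL and manual-checkpoint configuration from the list above might look
roughly like the following sketch.  The 10-minute interval comes from the
discussion; the file name and the timer helper are illustrative assumptions:

```python
import sqlite3
import time

# Illustrative file name; WAL mode requires a file-based database.
conn = sqlite3.connect("dedup.db")
conn.execute("PRAGMA journal_mode=WAL")      # write-ahead logging
conn.execute("PRAGMA wal_autocheckpoint=0")  # disable automatic checkpointing

CHECKPOINT_INTERVAL = 600  # seconds (10 minutes)
last_checkpoint = time.monotonic()

def maybe_checkpoint(conn):
    """Run a manual checkpoint once the configured interval has elapsed."""
    global last_checkpoint
    if time.monotonic() - last_checkpoint >= CHECKPOINT_INTERVAL:
        conn.execute("PRAGMA wal_checkpoint(PASSIVE)")
        last_checkpoint = time.monotonic()
```

The application would call maybe_checkpoint() between transactions, so that
checkpoints happen on the time-based schedule rather than after every batch.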

While on the topic of performance: I performed a test in which new tables
were created within an active transaction (manual 'partitioning') and
populated within the same transaction.  What I found was that as the
number of tables created within the transaction increased, the insert
performance within that transaction decreased significantly.  I haven't
yet spent any time trying to analyse the cause of this behaviour.
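
For reference, a minimal sketch of the hash-based 'partitioning' idea, with
the partition tables created up front rather than inside the hot
transaction.  The partition count, table names, and hash choice are all
illustrative assumptions, not details from the original application:

```python
import sqlite3
import zlib

N_PARTITIONS = 8  # illustrative; the real count is configurable

conn = sqlite3.connect(":memory:")
for i in range(N_PARTITIONS):
    # Pre-create all partition tables before any insert transaction starts.
    conn.execute(f"CREATE TABLE keys_{i} (k TEXT NOT NULL UNIQUE)")

def insert_key(conn, key):
    """Route the key to one of the smaller tables to limit the search space."""
    part = zlib.crc32(key.encode()) % N_PARTITIONS
    try:
        conn.execute(f"INSERT INTO keys_{part} (k) VALUES (?)", (key,))
        return True    # new key
    except sqlite3.IntegrityError:
        return False   # duplicate
```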

There may be 'better' alternatives than SQLite to implement the application,
but for a number of reasons I've decided to stick with SQLite and it seems
quite capable.  It is now just a matter of squeezing as much performance out
of it as possible.

Best regards,
Jaco
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users