On Fri, 10 Mar 2006 [EMAIL PROTECTED] wrote:

>----- Original Message ----
>> From: Christian Smith <[EMAIL PROTECTED]>
>
>> But SQLite depends on the OS caching abilities for much of its
>> performance. Removing it is like saying SQLite is rubbish on Intel
>> processors after testing on an i486.
>
>Yes and no: while it's nice to be able to rely somewhat on the OS cache,
>the OS cannot guess what the usage pattern of the app is going to be.
>Most of the time, the OS will guess wrong too: a typical case is random
>access of a file, where the OS does readahead that is not beneficial
>(and can actually slow things down). I always try to give the best hints
>possible to the underlying subsystem (just as you should never rely 100%
>on the optimizer to know what to do with a bunch of code, because it
>won't).
>
>When dealing with big databases (that don't typically fit in memory), the
>OS will most likely cache the wrong kind of data, whereas the lib could
>cache important information such as some key offsets in the file (the
>cache would contain some elements from the btree so that we can resume
>directly from there).


When your database does not fit in memory, yes, you're right, the OS may
well get caching wrong, and in the worst way possible. Two things though:
- SQLite does have known scaling issues when using multi-gigabyte
  databases.
- Memory is cheap. If it doesn't fit, spend the few hundred dollars that a
  few days of your time is worth and buy another few gigs of RAM.
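
If more RAM isn't an option, you can at least let SQLite keep more of the
btree pages in its own cache rather than leaning entirely on the OS. A
minimal sketch (the number is illustrative; tune it to your page size and
available memory):

  -- With the default 1KB page size, 100000 pages is roughly 100MB of
  -- cache. Issue this on each connection after opening the database.
  PRAGMA cache_size = 100000;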


[snip]
>> General observations/questions:
>> - What did you expect from SQLite? Can you give an indication of performance
>>   expectations?
>> - What did you use previously? How does SQLite compare on performance?
>I was expecting much better performance than what I am getting right now:
>before, I was using the filesystem as a DB (basically, folders +
>filenames->data). Namely, reiserfs performs pretty well for doing this
>kind of thing (millions of records). But that was wasting a lot of disk
>space and hitting all sorts of limits in the various OSes.
>
>Tried BDB a few years ago as a replacement, but "wedged" databases and
>licensing issues kept us away from it. Then I read about SQLite giving
>performance on the order of 25000 inserts/second. I thought that I could
>probably get around 10000 inserts/second on bigger databases.


Do you need the high insert rate sustained?


>
>Right now, SQLite shows performance that is on par with a simple
>filesystem structure (except it's much faster to back up, because
>traversing a multimillion-file structure takes several hours). I was
>expecting a few things by moving to SQLite:
>* getting better ramp-up (cold cache) performance than a dumb
>  filesystem structure.


Nothing will speed up the physical disks.


>* having a [much] higher throughput (so that I can combine multiple
>  servers into one), as the records are small and there is no
>  fragmentation of the folders here.
>* having consistent latencies (filesystems tend to vary a lot).
>
>> - Have you tested other databases? What sort of performance did you get
>>   from those?
>I am in the process of setting up mysql with our app, I will keep you
>posted on the result.


Prepare to be disappointed, IMO. The most important factor when dealing
with big, fast databases is disk IO, and anything (MySQL, PGSQL, Oracle,
SQLite) will find the disks a bottleneck.


>
>> - If you're not batching, or can't batch updates, then SQLite may very
>>   well not be performant enough due to inherent restrictions of its
>>   architecture.
>You are not the first person to mention batching; what does that mean
>exactly? I have been using inserts in transactions, isn't that it?


Yes. Transactions have a fixed synchronous IO overhead that can be
amortized over a large number of inserts/updates.
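
For the record, that is all batching means; a sketch (the table and
column names are made up):

  BEGIN;
  INSERT INTO profiles(uid, data) VALUES(1, '...');
  INSERT INTO profiles(uid, data) VALUES(2, '...');
  -- ...thousands more rows...
  COMMIT;

One BEGIN/COMMIT pair around thousands of inserts means the synchronous
flush at COMMIT is paid once for the whole batch, instead of once per row.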


>
>> - An idea of what your app does in general would be of assistance in
>>   helping you find your bottleneck. A specific test case is good if this
>>   is indeed the issue, but it doesn't sound like it is.
>I am focusing right now on the time it takes to run a simple
>import-from-foreign-format kind of thing: one thread only, doing simple
>selects (to retrieve user profiles) and replaces (to write back the same
>profiles).


Again, are imports the mainstay of the application? If not, then is it
worth optimising for? What does your application generally do most of the
time?


> To accommodate a lot of users, I have one main DB that holds all the
> users -> unique ID mappings. The other DBs are really one partitioned
> DB, so that later, threads conflict only when accessing users within the
> same range and the DBs don't get out of proportion (because blobs are
> used, I thought a lot of records could become a performance bottleneck).
> If I precache the big name -> uid table, the import of 650,000 records
> takes about 2 minutes. If I don't precache it, it takes about 8 minutes.
> I start/stop transactions every X seconds (right now, I have it set at
> 10 seconds).


Are these name->uid lookups used in a read-only fashion after import? If
so, then multiple threads can indeed read the table in parallel, and you
should have good performance.
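
Also, if those lookups are currently served by a table scan, an index on
the lookup column may buy you more than precaching the whole table; a
sketch (the schema is hypothetical, adjust to your actual tables):

  -- One-off: index the name column so name -> uid lookups don't scan.
  CREATE INDEX idx_users_name ON users(name);

  -- Each reader thread then just does:
  SELECT uid FROM users WHERE name = 'some-user';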

For now, I'd say:
- If the import/update case is predominant, look at another DB. I'd
  suggest PostgreSQL, as its MVCC lockless row protocol allows concurrent
  updates to disjoint sets of rows in the same table.
- If read-only access is the predominant use, SQLite may well fit the
  bill; take the performance hit when importing data.
- I'd personally avoid MySQL. Its performance benefits over PostgreSQL
  are mostly overstated, and PGSQL has a more robust architecture IMO.


>
>Cheers
>
>Nicolas
>

Christian


-- 
    /"\
    \ /    ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
     X                           - AGAINST MS ATTACHMENTS
    / \
