On Tue, 7 Mar 2006 [EMAIL PROTECTED] wrote:

>Hi all
>
>it seems that I am running into a problem with the way sqlite accesses the
>disk when inserting rows of data into databases that are large in number of
>records but not necessarily big on disk (I am talking millions of records
>in files that are on the order of a few hundred MBytes).
>
>I reduced to this test case:
> [snip test case]
>
>IMPORTANT: I umount/mount the partition between tests to clear the disk
>caches from the OS.



Is this a likely usage scenario? Will your application regularly
umount/mount the filesystem between transactions? That may sound
facetious, but it isn't meant to be: your otherwise excellent example is
let down by a probably unreasonable usage pattern.



>
>Test 1:
>From an empty DB.
>I loop so that the uids are consecutive numbers from 1 to 1 million. At
>this point, each transaction takes less than 1 second to execute. The
>whole 1M inserts (100 transactions) take 74 seconds, i.e. 13000
>inserts/second.
>
>Test 2:
>With the DB from Test 1, I run the exact same sequence of inserts.
>The 1M inserts take 103 seconds to execute; that's still 9700 inserts/second.
>The first transaction takes 8 seconds, then the following ones take about 1
>second each.
>At this point everything is OK.
>
>Test 3:
>Starting from the DB from Test 2, I run 1 million inserts where the uid is
>selected randomly between 1 and 1 million. At this point, the performance
>is pretty bad: the first 10000-insert transaction takes 31 seconds to
>run, and the next ones take over 5 seconds each, for a total run time of
>623 seconds. That's 1600 inserts/second (6 times slower than the ordered
>case).


To be expected with no OS cache.


>
>It seems that the performance degrades pretty badly with the number of
>records: this is still a relatively small dataset (especially given the
>simplicity of the table). To me, it looks like there is a problem in the
>way the files are accessed.
>
>I tried to partition the dataset by creating separate databases or
>tables, but creating separate databases makes things slower (expected, as
>the problem is probably disk I/O seeks), and partitioning tables gives me
>only a 20% speed gain on those 1M insert tests.
>
>Things get really slow afterwards, for example moving to 2M records (137 MB on 
>disk):
>Test 1 w/ 2M is 138 seconds, about 2 times slower than the 1M case (normal)
>Test 2 w/ 2M is 192 seconds, 2 times slower (normal).
>Test 3 w/ 2M is 2390 seconds, 4 times slower than in the 1M case (12 times
>slower than the ordered case).
>
>I didn't try these tests with bigger sets (my original app was on a DB of
>about 8M records, and things were very very slow).
>
>Given the size of the DB on disk, I would think that the speed could be
>much more consistent than that (especially considering that it should
>cache a lot of the information quickly).


The pages containing nodes closer to the root of the affected btrees will
be cached pretty quickly, but the random access pattern and constant
random updates will make for very non-localised updates to the DB file,
which would require lots of seeks on sync.

You might be well advised to look at how long it takes to do the updates,
and then how long the COMMIT takes, separately. I'd hazard a guess that
you'll be spending the majority of the time in the COMMIT commands.
Something like:

do_updates()
{
        start_timer
        BEGIN                   // each batch of 10000 REPLACEs is one transaction
        for (i = 0; i < 10000; i++) {
                REPLACE ...
        }
        check_timer             // time spent issuing the REPLACE statements
        COMMIT                  // I reckon majority of time spent here
        stop_timer
}
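
In case a concrete version helps: below is a rough, self-contained sketch
of that measurement against the SQLite C API. It is not your original test
case (that got snipped above), so the database file (test.db), the table
t(uid INTEGER PRIMARY KEY, val INTEGER) and the values being inserted are
all made up; the loop just mimics the random-uid pattern of your Test 3.
Treat it as an illustration of where to put the timers, nothing more.

/*
 * Hypothetical sketch only -- not the original test case.  It times the
 * REPLACE loop and the COMMIT separately, using a made-up database file
 * (test.db) and table t(uid INTEGER PRIMARY KEY, val INTEGER).
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sqlite3.h>

static double now(void)
{
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
}

static void exec_or_die(sqlite3 *db, const char *sql)
{
        char *err = NULL;
        if (sqlite3_exec(db, sql, NULL, NULL, &err) != SQLITE_OK) {
                fprintf(stderr, "SQL error: %s\n", err ? err : "?");
                sqlite3_free(err);
                exit(1);
        }
}

int main(void)
{
        sqlite3 *db;
        char sql[128];
        double t0, t1, t2;
        int i;

        if (sqlite3_open("test.db", &db) != SQLITE_OK) {
                fprintf(stderr, "cannot open test.db\n");
                return 1;
        }
        exec_or_die(db, "CREATE TABLE IF NOT EXISTS t"
                        "(uid INTEGER PRIMARY KEY, val INTEGER)");

        t0 = now();
        exec_or_die(db, "BEGIN");
        for (i = 0; i < 10000; i++) {
                /* random uid in [1, 1000000], like Test 3 above */
                sprintf(sql, "REPLACE INTO t VALUES(%d, %d)",
                        (rand() % 1000000) + 1, i);
                exec_or_die(db, sql);
        }
        t1 = now();                /* the REPLACE loop ends here      */
        exec_or_die(db, "COMMIT"); /* sync/seek cost should land here */
        t2 = now();

        printf("inserts: %.2f s   commit: %.2f s\n", t1 - t0, t2 - t1);
        sqlite3_close(db);
        return 0;
}

If it behaves the way I expect, the commit figure (the fsync and all the
seeks it triggers) will dwarf the inserts figure in the random-uid case.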

>
>Any idea?
>
>Nicolas
>


Christian


-- 
    /"\
    \ /    ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
     X                           - AGAINST MS ATTACHMENTS
    / \
