Re: [sqlite] Disk caching impacts performance.

John Stanton Fri, 09 Nov 2007 05:37:10 -0800

Renaud HUILLET wrote:


Thanks for your reply,

Indeed, the windows API is not the same at the Unix one (mmap), but I think I 
have a wrapper somewhere that can handle both.

Anyone has been trying the mmap for SQLite ?

Renaud

Date: Thu, 8 Nov 2007 10:15:24 -0500
From: [EMAIL PROTECTED]
To: [email protected]
Subject: Re[2]: [sqlite] Disk caching impacts performance.

Hello Renaud,

Thursday, November 8, 2007, 9:11:41 AM, you wrote:

RH> Is the mmap option part of sqlite or do we need to change the sqlite code ?

RH> If so, has somebody here already tried it ? Any source available ?

RH> What 's the order of improvement ?

RH> Renaud

Date: Thu, 8 Nov 2007 15:54:35 +0500
From: [EMAIL PROTECTED]
To: [email protected]
Subject: Re: [sqlite] Disk caching impacts performance.

Try this, use mmap
(i assume u'r db is readonly)this is much faster and
better than ifstream read. Also ifstream read can keep the data in cache as 
long as no other serious
i/o occurs.

U need to accept it as we work with Virtual Mem or
write your own FileSystem which is mem based and short circuits os calls. 
Sqlite 3.5x has good support for such ext.

regrads
ragha

******************************************************************************************
This email and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!
*****************************************************************************************

----- Original Message -----
From: Julien Renggli
Date: Thursday, November 8, 2007 4:15 pm
Subject: [sqlite] Disk caching impacts performance.

Hello,



I'm currently working on a project which is very likely to use
SQLite as
data storage. Since performance is important for us, I already
found on
the SQLite website many ways to optimise the code (always working in
transactions where possible, using a page size of 4096 since it's
running on the Windows platform, using integers primary keys, ...).
ButI have one problem that I "solved" in an unorthodox way; it
works, but
maybe you have a better solution than mine?



I've been doing some test with a ~100 MB database, in which I have
three
tables: one for structured data (Objects, 2000 entries), one for the
blobs we have to store (ObjectBlobs ID INTEGER PRIMARY KEY, Data BLOB)
(8000 entries), and one which binds the structured data and the blobs
(8000 entries). As you can imagine, each Object has 4 blobs linked to
it; the blobs can be quite large (let's say up to 1 MB).

My (C++) application just has to read the table "Objects", and one of
the Blobs for each Object.



Now the first time I run my application, it takes quite a long time

(30s) to load the blobs. But if I re-run the app, it only takes 1s to
load them. It's clearly a disk caching issue: if I copy huge files to
the disk between two runs, it takes again 30s to load the blobs (i.e.

the DB is no more in the disk cache). Profiling the application
indicates sqlite::winRead() is the bottleneck.



I then had the following idea: SQLite is probably reading the file
randomly, depending on where the data lies. If I can force the DB
to be
cached, everything should be fine. So before connecting the
database, I
first read it sequentially (using a C++ ifstream) until the end of
file.
It perfectly solves the database problem, even though I still
notice a
difference (3s to read the file on 1st run, 0.2s later). But 3s is OK
where 30s was worrying me.



I hope I explain the situation clear enough, and ask you now: is it
theonly way to do it? I find the trick a bit nasty and don't like
it; maybe
I missed something? Before you ask: I tried to VACUUM the DB, it only
reduced the timings to 16s, which was still bad for our requirements.

Tests with a larger DB (it can get much bigger than my example) and on
different machines tend to confirm my theory.



Thanks in advance (and a big thank for SQLite which is really nice and
easy to use !),



Julien Renggli



P.S.: Some technical informations:

- sqlite v.3.3.16 (we will upgrade to the latest version later), C API

- Windows XP SP2

- Timings on Pentium 4 3.4GHz, 2GB RAM

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

RH> _________________________________________________________________
RH> Explore the seven wonders of the world
RH> http://search.msn.com/results.aspx?q=7+wonders+world&mkt=en-US&form=QBRE
RH> 
-----------------------------------------------------------------------------
RH> To unsubscribe, send email to [EMAIL PROTECTED]
RH> 
-----------------------------------------------------------------------------

"CreateFileMapping" is I believe the function in windows. You have to
"CreateFile" to open the file, then map it. Then you can read forward
in it like a pointer to a block of memory. I actually have a "Preload"
function I use sometimes which reads the first 5-50 megs of the file
to pre-cache it.

--
Best regards,
Teg mailto:[EMAIL PROTECTED]


-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

For a number of reasons Sqlite does not mmap files. The suggestion ofusing mmap was as a lower overhead method of priming the OS cache. Itremoves a level of buffer shadowing.



-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Re: [sqlite] Disk caching impacts performance.

Reply via email to