Re[2]: [sqlite] Disk caching impacts performance.

Teg Thu, 08 Nov 2007 07:17:30 -0800

Hello Renaud,

Thursday, November 8, 2007, 9:11:41 AM, you wrote:



RH> Is the mmap option part of sqlite or do we need to change the sqlite code ?


RH> If so, has somebody here already tried it ? Any source available ?


RH> What 's the order of improvement ?





RH> Renaud

>> Date: Thu, 8 Nov 2007 15:54:35 +0500
>> From: [EMAIL PROTECTED]
>> To: [email protected]
>> Subject: Re: [sqlite] Disk caching impacts performance.
>>
>> Try this, use mmap
>> (i assume u'r db is readonly)this is much faster and
>> better than ifstream read. Also ifstream read can keep the data in cache as 
>> long as no other serious
>> i/o occurs.
>>
>> U need to accept it as we work with Virtual Mem or
>> write your own FileSystem which is mem based and short circuits os calls. 
>> Sqlite 3.5x has good support for such ext.
>>
>> regrads
>> ragha
>>
>> ******************************************************************************************
>> This email and its attachments contain confidential information from HUAWEI, 
>> which is intended only for the person or entity whose address is listed 
>> above. Any use of the information contained herein in any way (including, 
>> but not limited to, total or partial disclosure, reproduction, or 
>> dissemination) by persons other than the intended recipient(s) is 
>> prohibited. If you receive this e-mail in error, please notify the sender by 
>> phone or email immediately and delete it!
>> *****************************************************************************************
>>
>> ----- Original Message -----
>> From: Julien Renggli 
>> Date: Thursday, November 8, 2007 4:15 pm
>> Subject: [sqlite] Disk caching impacts performance.
>>
>>> Hello,
>>>
>>>
>>>
>>> I'm currently working on a project which is very likely to use
>>> SQLite as
>>> data storage. Since performance is important for us, I already
>>> found on
>>> the SQLite website many ways to optimise the code (always working in
>>> transactions where possible, using a page size of 4096 since it's
>>> running on the Windows platform, using integers primary keys, ...).
>>> ButI have one problem that I "solved" in an unorthodox way; it
>>> works, but
>>> maybe you have a better solution than mine?
>>>
>>>
>>>
>>> I've been doing some test with a ~100 MB database, in which I have
>>> three
>>> tables: one for structured data (Objects, 2000 entries), one for the
>>> blobs we have to store (ObjectBlobs ID INTEGER PRIMARY KEY, Data BLOB)
>>> (8000 entries), and one which binds the structured data and the blobs
>>> (8000 entries). As you can imagine, each Object has 4 blobs linked to
>>> it; the blobs can be quite large (let's say up to 1 MB).
>>>
>>> My (C++) application just has to read the table "Objects", and one of
>>> the Blobs for each Object.
>>>
>>>
>>>
>>> Now the first time I run my application, it takes quite a long time
>>>
>>> (30s) to load the blobs. But if I re-run the app, it only takes 1s to
>>> load them. It's clearly a disk caching issue: if I copy huge files to
>>> the disk between two runs, it takes again 30s to load the blobs (i.e.
>>>
>>> the DB is no more in the disk cache). Profiling the application
>>> indicates sqlite::winRead() is the bottleneck.
>>>
>>>
>>>
>>> I then had the following idea: SQLite is probably reading the file
>>> randomly, depending on where the data lies. If I can force the DB
>>> to be
>>> cached, everything should be fine. So before connecting the
>>> database, I
>>> first read it sequentially (using a C++ ifstream) until the end of
>>> file.
>>> It perfectly solves the database problem, even though I still
>>> notice a
>>> difference (3s to read the file on 1st run, 0.2s later). But 3s is OK
>>> where 30s was worrying me.
>>>
>>>
>>>
>>> I hope I explain the situation clear enough, and ask you now: is it
>>> theonly way to do it? I find the trick a bit nasty and don't like
>>> it; maybe
>>> I missed something? Before you ask: I tried to VACUUM the DB, it only
>>> reduced the timings to 16s, which was still bad for our requirements.
>>>
>>> Tests with a larger DB (it can get much bigger than my example) and on
>>> different machines tend to confirm my theory.
>>>
>>>
>>>
>>> Thanks in advance (and a big thank for SQLite which is really nice and
>>> easy to use !),
>>>
>>>
>>>
>>> Julien Renggli
>>>
>>>
>>>
>>> P.S.: Some technical informations:
>>>
>>> - sqlite v.3.3.16 (we will upgrade to the latest version later), C API
>>>
>>> - Windows XP SP2
>>>
>>> - Timings on Pentium 4 3.4GHz, 2GB RAM
>>>
>>>
>>>
>>>
>>
>> -----------------------------------------------------------------------------
>> To unsubscribe, send email to [EMAIL PROTECTED]
>> -----------------------------------------------------------------------------
>>

RH> _________________________________________________________________
RH> Explore the seven wonders of the world
RH> http://search.msn.com/results.aspx?q=7+wonders+world&mkt=en-US&form=QBRE
RH> 
-----------------------------------------------------------------------------
RH> To unsubscribe, send email to [EMAIL PROTECTED]
RH> 
-----------------------------------------------------------------------------

"CreateFileMapping" is I believe the function in windows. You have to
"CreateFile" to open the file, then map it. Then you can read forward
in it like a pointer to a block of memory. I actually have a "Preload"
function I use sometimes which reads the first 5-50 megs of the file
to pre-cache it.

-- 
Best regards,
 Teg                            mailto:[EMAIL PROTECTED]


-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Re[2]: [sqlite] Disk caching impacts performance.

Reply via email to