See below for a counter-perspective.

On 2/22/07, John Stanton <[EMAIL PROTECTED]> wrote:
Thomas Fjellstrom wrote:
> On February 21, 2007, [EMAIL PROTECTED] wrote:
>
>>"Brett Keating" <[EMAIL PROTECTED]> wrote:
>>
>>>Hi,
>>>
>>>I'm curious about what the effect of having a blob in the database may
>>>be on performance. I have two design options: 1) put a small image file
>>>(15-30kbyte) into the database as a blob, and 2) store the image in a
>>>separate file on disk and hold the filename in the database. My table
>>>has around 20 rows in it, about half are strings/smaller blobs and half
>>>are integers.
>>>
>>>Option number one, for various reasons, is far more elegant and simple
>>>in terms of its impact on the rest of the code. However, I am concerned
>>>that holding such large amounts of data per record might impact
>>>performance. I could be worried about nothing though, which is why I'm
>>>writing to this list :).
>>
>>When I was designing the SQLite file format, I made the assumption
>>that BLOBs would be used infrequently and would not be very big.
>>The file format design is not optimized for storing BLOBs.  Indeed,
>>BLOBs are stored as a singly-linked list of database pages.  It
>>is hard to imagine a more inefficient design.
>>
>>Much to my surprise, people began putting multi-megabyte BLOBs
>>in SQLite databases and reporting that performance really was not
>>an issue.  I have lately taken up this practice myself and routinely
>>use SQLite databases with BLOBs that are over 10MiB in size.  And
>>it all seems to work pretty well here on my Linux workstation.  I
>>have no explanation for why it works so well, but it does, so I'm
>>not going to complain.
>>
>>If your images are only 30KiB, you should have no problems.
>>
>>Here's a hint though - make the BLOB columns the last column in
>>your tables.  Or even store the BLOBs in a separate table which
>>only has two columns: an integer primary key and the blob itself,
>>and then access the BLOB content using a join if you need to.
>>If you put various small integer fields after the BLOB, then
>>SQLite has to scan through the entire BLOB content (following
>>the linked list of disk pages) to get to the integer fields at
>>the end, and that definitely can slow you down.
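The separate-table layout suggested above can be sketched with Python's sqlite3 module. This is a minimal illustration, not code from the thread; the table and column names are made up:

```python
import sqlite3

# In-memory database for illustration; use a file path in practice.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE photo (
        id    INTEGER PRIMARY KEY,
        title TEXT
    );
    CREATE TABLE photo_blob (
        photo_id INTEGER PRIMARY KEY REFERENCES photo(id),
        data     BLOB
    );
""")

con.execute("INSERT INTO photo (id, title) VALUES (1, 'sunset')")
con.execute("INSERT INTO photo_blob (photo_id, data) VALUES (1, ?)",
            (b"\x89PNG...pretend image bytes...",))

# Metadata queries never have to walk the BLOB's page chain...
title, = con.execute("SELECT title FROM photo WHERE id = 1").fetchone()

# ...and the image itself is fetched via a join only when needed.
data, = con.execute("""
    SELECT b.data
    FROM photo JOIN photo_blob AS b ON b.photo_id = photo.id
    WHERE photo.id = 1
""").fetchone()
```

The point of the split is that scans and metadata lookups on the main table stay cheap regardless of BLOB size.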
>
>
> I still wonder about the utility of storing binary data in the db itself.
> Maybe it makes distribution easier, but how often does one
> distribute an entire database in a "vendor specific" format?
>
> I'm quite interested in hearing people's reasoning for going the blob route,
> when you have a perfectly good "database" format for "blobs" already (various
> filesystems).

The BLOB method has two major advantages.  The first is that all the data
reside in one file.  The second is that it avoids directory size
limits, a curse when storing many files, and one which requires extra
logic to build a tree of directories or some other way of
limiting directory size.

The downside is the relative slowness of retrieving large BLOBs,
although the anecdotal evidence is that this is not as much of a problem
as one would expect.

On balance I would use BLOBs for binary storage except in the case where
there is a predominance of very large files.

2 examples --

I had my website as a SQLite db. Since I am using a cheapo hosting
site, I was at the mercy of what they had. At some point, I updated my
local version (on my laptop) to the then-latest version of SQLite.
That was one of those inflection points at which the SQLite db format
was backward incompatible. The web host was not going to upgrade their
DBD::SQLite, and it was a lot of extra work for me to build my own
libs and use my own DBD installs. I changed the backend to plain
text files stored in a folder hierarchy like so:
?/??/???/file (where the ?s are the first, the first two, and the first
three letters of the file name). The beauty of this approach is that I
can log in to my website remotely and just use vim to change
individual pages without having to go through SQLite. I still use
SQLite for metadata.
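The fan-out scheme described above can be sketched as follows. This is my reconstruction of the layout; the actual site code may differ, and `page_path` is a hypothetical helper name:

```python
import os

def page_path(root, name):
    """Map a page name to root/<first 1>/<first 2>/<first 3>/<name>,
    e.g. 'sqlite' -> root/s/sq/sql/sqlite."""
    return os.path.join(root, name[:1], name[:2], name[:3], name)

path = page_path("pages", "sqlite")
```

Fanning files out by name prefix keeps any single directory small, which is the "extra logic" mentioned earlier for working around directory size limits.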

The second example is a desktop application -- I was looking for an
application in which I could write down disjointed thoughts and research
findings. One of the applications (I forget its name), a really nice
app, used Core Data on Mac OS X Tiger as its storage. Yes, I could get
at the data, but if I ever stopped using that app, it would be a pain
to get my data out. I changed to another program (Scrivener), which
stores my documents as separate RTF files inside an OS X package. From
the outside, the package looks like a single file, but right-click on it
and you can peek inside to find all the separate RTF documents. I can do
whatever I want with those documents in TextEdit.

The above examples are not arguments against the use of BLOBs. They are
just examples of where NOT storing BLOBs in a db is a strength. I know
that SQLite is very stable and a lot of care has been taken to make it
corruption-proof; nevertheless, putting all the eggs in one basket is
a scary thought. I'd much rather take my chances with possibly
corrupting a single file than corrupting the entire house.

As far as I understand, filesystems have evolved over a long period of
time, and are very quick at retrieving a file. I can use SQLite for
all the upstream processing up until the point where SQLite tells me
the name and the path to the file. Then I simply ask the filesystem to
give that file to me.
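That division of labor looks roughly like this in code. A minimal sketch, again with made-up table and file names:

```python
import os
import sqlite3
import tempfile

# The filesystem holds the bytes...
root = tempfile.mkdtemp()
path = os.path.join(root, "img_0001.jpg")
with open(path, "wb") as f:
    f.write(b"pretend image bytes")

# ...while SQLite holds the metadata, including the path.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT)")
con.execute("INSERT INTO images VALUES (1, ?)", (path,))

# Upstream processing ends with SQLite handing back a path...
stored_path, = con.execute(
    "SELECT path FROM images WHERE id = 1").fetchone()

# ...and the filesystem serves the file itself.
with open(stored_path, "rb") as f:
    data = f.read()
```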

Use the right tool.

--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation http://www.osgeo.org/education/
---------------------------------------------------------------------
collaborate, communicate, compete
=====================================================================
