Felix Schwarz <[EMAIL PROTECTED]> wrote:
> 
> I'm wondering whether there is a big  
> performance hit for a simple
> 
>       SELECT binarydata FROM entries WHERE somehash = 27817298;
> 
> when I use
> 
>       CREATE TABLE entries(
>               entry_id INTEGER PRIMARY KEY,
>               somehash  INTEGER,
>               property1 INTEGER,
>               property2  VARCHAR(255),
>               property3  VARCHAR(255),
>               binarydata BLOB
>       );
> 
> instead of splitting the binary data (around 40K each) into two  
> tables like this:
> 
>       CREATE TABLE entries(
>               entry_id INTEGER PRIMARY KEY,
>               somehash  INTEGER,
>               property1 INTEGER,
>               property2  VARCHAR(255),
>               property3  VARCHAR(255),
>               binary_id INTEGER
>       );
> 
>       CREATE TABLE binaries(
>               binary_id INTEGER PRIMARY KEY,
>               binarydata BLOB
>       );
> 

If you do things like this:

   UPDATE entries SET property1=property1+1;

In other words, if you make changes that do not
alter the blob, then the second form can be dramatically
faster since the blob never has to be copied.  The whole blob
has to be copied several times to do this update in the first
form.  In order to do an update, SQLite has to load the entire
row into memory, decode it, substitute the changed value(s),
reencode the row, then write the entire row back out to disk.
So the entire row has to be read and written in order to change
a single byte.  (This is because SQLite uses variable-length
encodings for everything, including integers, so changing a
single byte can change all subsequent bytes in the row.)  So
if you are going to be updating things, it is best to keep
the rows relatively small by moving large blobs out to a 
separate table.

If you do things like this:

   SELECT sum(property1) FROM entries;

Then the second form is slightly faster because the entries table
will be smaller and you will get better locality of reference.
SQLite does not read the blob if it is not used here, but it does
have to skip over it.
--
D. Richard Hipp   <[EMAIL PROTECTED]>

Reply via email to