Felix Schwarz <[EMAIL PROTECTED]> wrote:
> I'm wondering whether there is a big performance hit for a simple
>
>    SELECT binarydata FROM entries WHERE somehash = 27817298;
>
> when I use
>
>    CREATE TABLE entries(
>      entry_id INTEGER PRIMARY KEY,
>      somehash INTEGER,
>      property1 INTEGER,
>      property2 VARCHAR(255),
>      property3 VARCHAR(255),
>      binarydata BLOB
>    );
>
> instead of splitting the binary data (around 40K each) into two
> tables like this:
>
>    CREATE TABLE entries(
>      entry_id INTEGER PRIMARY KEY,
>      somehash INTEGER,
>      property1 INTEGER,
>      property2 VARCHAR(255),
>      property3 VARCHAR(255),
>      binary_id INTEGER
>    );
>
>    CREATE TABLE binaries(
>      binary_id INTEGER PRIMARY KEY,
>      binarydata BLOB
>    );
If you do things like this:

   UPDATE entries SET property1=property1+1;

In other words, if you make changes that do not alter the blob, then
the second form can be dramatically faster, since the blob never has
to be copied. In the first form, the whole blob has to be copied in
order to do this update.

To do an update, SQLite has to load the entire row into memory, decode
it, substitute the changed value(s), reencode the row, then write the
entire row back out to disk. So the entire row has to be read and
written in order to change a single byte. (This is because SQLite uses
variable-length encodings for everything, including integers, so
changing a single byte can change all subsequent bytes in the row.)
So if you are going to be updating things, it is best to keep the rows
relatively small by moving large blobs out to a separate table.

If you do things like this:

   SELECT sum(property1) FROM entries;

then the second form is slightly faster, because the entries table
will be smaller and you will get better locality of reference. SQLite
does not read the blob if it is not used here, but it does have to
skip over it.

--
D. Richard Hipp <[EMAIL PROTECTED]>
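To make the two-table layout concrete, here is a minimal sketch using
Python's built-in sqlite3 module (the module choice and the sample data
are mine, not from the thread; the schema and queries follow the ones
quoted above). It shows that with the split schema, the UPDATE rewrites
only the small entries row, and fetching the blob becomes a join on
binary_id:

```python
import sqlite3

db = sqlite3.connect(":memory:")
blob = bytes(40 * 1024)  # ~40K of binary data, as in the question

# Second form: the blob lives in its own table.
db.executescript("""
    CREATE TABLE entries(
      entry_id INTEGER PRIMARY KEY,
      somehash INTEGER,
      property1 INTEGER,
      property2 VARCHAR(255),
      property3 VARCHAR(255),
      binary_id INTEGER
    );
    CREATE TABLE binaries(
      binary_id INTEGER PRIMARY KEY,
      binarydata BLOB
    );
""")
db.execute("INSERT INTO binaries(binary_id, binarydata) VALUES (1, ?)",
           (blob,))
db.execute("INSERT INTO entries(entry_id, somehash, property1, binary_id) "
           "VALUES (1, 27817298, 0, 1)")

# The update that motivates the split: only the small entries row is
# decoded and rewritten; the 40K row in binaries is never touched.
db.execute("UPDATE entries SET property1 = property1 + 1")

# Fetching the blob now takes a join (or a second lookup by binary_id).
row = db.execute(
    "SELECT b.binarydata FROM entries e "
    "JOIN binaries b ON b.binary_id = e.binary_id "
    "WHERE e.somehash = 27817298"
).fetchone()
print(len(row[0]))  # 40960
```

The trade-off is the extra join (or second lookup) on every read of the
blob, which is usually cheap next to repeatedly rewriting 40K rows.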