Felix Schwarz wrote:
Hi,
I have to decide on a table layout for storing the data of my
upcoming project.
Yesterday I have made my way through the excellent presentation at
http://www.sqlite.org/php2004/page-036.html and read the sentence
"Put small and frequently used columns early in the table to minimize
the need to follow the overflow chain."
Now, that's interesting! And I'm wondering whether there is a big
performance hit for a simple
SELECT binarydata FROM entries WHERE somehash = 27817298;
when I use
CREATE TABLE entries(
entry_id INTEGER PRIMARY KEY,
somehash INTEGER,
property1 INTEGER,
property2 VARCHAR(255),
property3 VARCHAR(255),
binarydata BLOB
);
instead of splitting the binary data (around 40K each) into two
tables like this:
CREATE TABLE entries(
entry_id INTEGER PRIMARY KEY,
somehash INTEGER,
property1 INTEGER,
property2 VARCHAR(255),
property3 VARCHAR(255),
binary_id INTEGER
);
CREATE TABLE binaries(
binary_id INTEGER PRIMARY KEY,
binarydata BLOB
);
and then use a select of this form:
SELECT binarydata FROM binaries WHERE binary_id = (SELECT
binary_id FROM entries WHERE somehash = 27817298);
Also, could the usage of VIEWs speed up the SELECTing of data in the
second example? Or does it just use SELECTs under the hood itself,
i.e. without any caching of data?
Thanks in advance for any feedback.
Felix
Felix,
I don't think you will see much difference between the two layouts. You
have your search field, somehash, located early in your records so
SQLite does not need to follow the overflow chain to read its value.
The split tables should actually take a little longer because after
SQLite has found the correct hash value, it must locate the binary in
the binaries table using the binary_id from the entries table. This
operation will be relatively fast, but is not gaining you anything.
The hint from the web page is pointing out the added time it takes to
scan the overflow chain to find the value of a field located after a
large field in a record. Your 40K binary fields will be split into
multiple overflow blocks in either table.
To speed this query you would be much better off adding an index on the
somehash field. Then SQLite could use the index to quickly locate the
correct record in your entries table. It will only scan the overflow
chain for queries that return the value of the binarydata field.
HTH
Dennis Cote